High-Performance Data Processing for Large-Scale Scientific Experiments
- Research topic/area
- Big Data and Data Engineering
- Type of thesis
- Bachelor / Master
- Start time
- 01.07.2025
- Application deadline
- 31.05.2025
- Duration of the thesis
- 6 months
Description
Modern scientific experiments, such as the KARA synchrotron radiation facility and the KATRIN experiment, generate massive, high-dimensional datasets that challenge current data processing frameworks. Efficient storage, retrieval, and analysis of this data are crucial for scientific discovery but remain a significant bottleneck. This project aims to evaluate and develop novel approaches for high-performance data processing by leveraging advanced compression techniques, optimized database architectures, and system-level enhancements. The focus is on improving scalability and efficiency for multi-dimensional time-series data and 3D volumetric datasets, addressing key challenges such as reducing storage overhead, accelerating retrieval times, and enhancing performance in large-scale scientific workflows.
Depending on the student's background and interests, the project may focus on one or more of the following:
● Database Optimization: Investigating high-performance storage solutions such as TileDB and ClickHouse to improve query execution and scalability.
● High-Performance Computing (HPC): Exploring parallel processing techniques, pipeline-based workflows, and optimizations for Linux-based clusters to enhance computational efficiency.
● Performance Benchmarking: Extending and refining our in-house tool (SciTS) to systematically assess ingestion throughput, query latency, and scalability in large-scale scientific datasets.
● AI-Based Techniques: Investigating machine learning approaches for adaptive compression and intelligent query optimization.
Requirements
- Requirements for students
- Experience with C, Python, or C# (modular software development).
- Familiarity with database architectures (relational or novel systems like columnar/array-based storage).
- Experience with Linux-based environments and system-level optimization. Cloud (Kubernetes) experience is a plus.
- Faculty departments
- Engineering sciences
- Electrical engineering & information technologies
- Informatics
- Information System Engineering and Management
Supervision
- Title, first name, last name
- Dr.-Ing. Nicholas Tan Jerome
- Organizational unit
- Institute for Data Processing and Electronics
- Email address
- nicholas.tanjerome@kit.edu
Application via email
- Application documents
- Cover letter
- Curriculum vitae
- Grade transcript
- Certificate of enrollment
Email address for application
Please send the application documents listed above by email to nicholas.tanjerome@kit.edu