High-Performance Data Processing for Large-Scale Scientific Experiments
- Research topic/area
- Big Data and Data Engineering
- Type of thesis
- Bachelor / Master
- Start date
- 01.07.2025
- Application deadline
- 31.05.2025
- Duration of thesis
- 6 months
Description
Modern scientific experiments, such as the KARA synchrotron radiation facility and the KATRIN experiment, generate massive, high-dimensional datasets that challenge current data processing frameworks. Efficient storage, retrieval, and analysis of this data are crucial for scientific discovery but remain a significant bottleneck. This project aims to evaluate and develop novel approaches for high-performance data processing by leveraging advanced compression techniques, optimized database architectures, and system-level enhancements. The focus is on improving scalability and efficiency for multi-dimensional time-series data and 3D volumetric datasets, addressing key challenges such as reducing storage overhead, accelerating retrieval times, and enhancing performance in large-scale scientific workflows.
Depending on the student's background and interests, the project may focus on one or more of the following:
● Database Optimization: Investigating high-performance storage solutions such as TileDB and ClickHouse to improve query execution and scalability.
● High-Performance Computing (HPC): Exploring parallel processing techniques, pipeline-based workflows, and optimizations for Linux-based clusters to enhance computational efficiency.
● Performance Benchmarking: Extending and refining our in-house tool (SciTS) to systematically assess ingestion throughput, query latency, and scalability in large-scale scientific datasets.
● AI-Based Techniques: Investigating machine learning approaches for adaptive compression and intelligent query optimization.
Requirements
- Requirements for students
● Experience with C, Python, or C# (modular software development).
● Familiarity with database architectures (relational or novel systems like columnar/array-based storage).
● Experience with Linux-based environments and system-level optimization. Cloud (Kubernetes) experience is a plus.
- Fields of study
- Engineering sciences
Electrical Engineering & Information Technology
Computer Science
Information System Engineering and Management
Supervision
- Title, first name, last name
- Dr.-Ing. Nicholas Tan Jerome
- Organizational unit
- Institute for Data Processing and Electronics
- Email address
- nicholas.tanjerome@kit.edu
- Link to personal homepage/profile page
- Website
Application by email
- Application documents
- Cover letter
- CV
- Transcript of records
- Certificate of enrollment
Email address for applications
Please send the application documents listed above by email to nicholas.tanjerome@kit.edu