High-Performance Data Processing for Large-Scale Scientific Experiments

Research topic/area
Big Data and Data Engineering
Type of thesis
Bachelor / Master
Start time
01.07.2025
Application deadline
31.05.2025
Duration of the thesis
6 months

Description

Modern scientific facilities and experiments, such as the KARA synchrotron radiation facility and the KATRIN experiment, generate massive, high-dimensional datasets that strain current data processing frameworks. Efficient storage, retrieval, and analysis of this data are crucial for scientific discovery, yet they remain a significant bottleneck. This project aims to evaluate and develop novel approaches to high-performance data processing by leveraging advanced compression techniques, optimized database architectures, and system-level enhancements. The focus is on improving the scalability and efficiency of processing multi-dimensional time-series data and 3D volumetric datasets. Key challenges include reducing storage overhead, accelerating retrieval, and improving performance in large-scale scientific workflows.

Depending on the student's background and interests, the project may focus on one or more of the following:
● Database Optimization: Investigating high-performance storage solutions such as TileDB and ClickHouse to improve query execution and scalability.
● High-Performance Computing (HPC): Exploring parallel processing techniques, pipeline-based workflows, and optimizations for Linux-based clusters to enhance computational efficiency.
● Performance Benchmarking: Extending and refining our in-house tool (SciTS) to systematically assess ingestion throughput, query latency, and scalability in large-scale scientific datasets.
● AI-Based Techniques: Investigating machine learning approaches for adaptive compression and intelligent query optimization.

Requirements

Requirements for students
  • Experience with C, Python, or C# (modular software development).
  • Familiarity with database architectures (relational, or newer systems such as columnar or array-based storage).
  • Experience with Linux-based environments and system-level optimization. Cloud (Kubernetes) experience is a plus.

Faculty departments
  • Engineering sciences
    Electrical engineering & information technologies
    Informatics
    Information System Engineering and Management


Supervision

Title, first name, last name
Dr.-Ing. Nicholas Tan Jerome
Organizational unit
Institute for Data Processing and Electronics
Email address
nicholas.tanjerome@kit.edu

Application via email

Application documents
  • Cover letter
  • Curriculum vitae
  • Grade transcript
  • Certificate of enrollment

Email address for application
Please send the application documents listed above by email to nicholas.tanjerome@kit.edu

