Vision-based Long Document Information Retrieval
- Research topic/area
- Document Analysis, Information Retrieval, Artificial Intelligence, Computer Vision, Computer Science, Deep Learning, Large Language Models
- Thesis type
- Master
- Start date
- -
- Application deadline
- 31.05.2026
- Duration
- -
Description
Long Document Information Retrieval (LDIR) refers to the task of finding relevant information within lengthy documents that contain rich visual and textual content. Unlike traditional IR on plain text, vision-based LDIR must take the original appearance of a document, i.e. its layout and visual elements, into account to retrieve meaningful results. In this thesis, we will research advanced retrieval techniques that integrate OCR-enhanced text extraction, multimodal embeddings, and hierarchical document retrieval. We aim to bridge the gap between textual and visual information by leveraging state-of-the-art models that understand and align content across modalities for Long Document Information Retrieval.
What you do:
● Literature research on vision-based LDIR.
● Implementation of state-of-the-art methods for vision-based LDIR tasks.
● (Optional) Integration of multimodal retrieval methods to improve document understanding.
What we offer:
● Getting started quickly with our open-source code
● Compute resources for model training and deployment
● Experienced guidance and open discussions with other team members
● Support in publishing your work at top conferences (including attending conferences in person)
Further Information:
We also offer further topics in areas such as computer vision, large language models (LLMs), generative models, retrieval-augmented generation (RAG), and document analysis and understanding. Please feel free to contact me (yufan.chen@kit.edu) with your CV and transcript of records.
Requirements
- Requirements for students
-
- Interest in computer vision and in task-oriented research
- Python programming skills; knowledge of PyTorch/TensorFlow is desirable
- Fields of study
-
- Engineering
Electrical Engineering & Information Technology
Geodesy & Geoinformatics
Computer Science
Mechatronics & Information Technology
Other fields of study
Remote Sensing and Geoinformatics
Information System Engineering and Management
Supervision
- Title, first name, last name
- M.Sc., Yufan, Chen
- Organizational unit
- Computer Vision for Human-Computer Interaction Lab, Institute for Anthropomatics and Robotics (IAR)
- Email address
- yufan.chen@kit.edu
- Link to personal homepage
- Website
Application by email
- Application documents
-
- CV
- Transcript of records
Email address for applications
Please send the application documents listed above by email to yufan.chen@kit.edu.