Organization (SoSe 25)
- Announcements
  - Moodle
- Format
  - Master seminar, English
- Lecturer
  - Prof. Dr. Thomas Seidl
- Assistants
  - Tanveer Hannan
- Timeline
  - 14.04.2025: Introduction, Team Formation, Topic Distribution
  - 05.05.2025: Intermediate Progress and Q/A
  - 19.05.2025: Final Presentation
  - 26.05.2025: Report Submission
Only SoSe 2025
Overview
This seminar explores the intersection of computer vision and natural language processing (NLP), covering AI agents, video captioning, retrieval, question answering (QA), and query-based object/action localization. It also examines large reasoning language models for video understanding.
Goal
Students will gain expertise in Vision-Language Modeling (VLM) research, covering problem formulation, literature review, model development, experimental design, and evaluation. The course provides insights into emerging research areas and thesis opportunities.
Format
- Block seminar: Mandatory attendance at all sessions.
- Key meetings: Kick-off, final presentation, and two additional sessions.
Prerequisites
- Machine Learning / Deep Learning
- Computer Vision and/or NLP with Deep Learning
- Python (PyTorch) & Linux
Students not meeting these requirements must receive explicit permission from the instructor to remain in this course.