Connecting Language to Vision

Organization (SoSe 25)

Announcements: Moodle
Format: Master seminar, English
Lecturer: Prof. Dr. Thomas Seidl
Assistant: Tanveer Hannan
Timeline:
14.04.2025: Introduction, Team Formation, Topic Distribution
05.05.2025: Intermediate Progress and Q&A
19.05.2025: Final Presentation
26.05.2025: Report Submission

Only SoSe 2025

Overview

This seminar explores the intersection of computer vision and natural language processing (NLP), covering AI agents, video captioning, retrieval, question answering (QA), and query-based object and action localization. It also examines large reasoning language models for video understanding.

Goal

Students will gain expertise in Vision-Language Modeling (VLM) research, covering problem formulation, literature review, model development, experimental design, and evaluation. The course provides insights into emerging research areas and thesis opportunities.

Format

Block seminar: Mandatory attendance at all sessions.
Key meetings: Kick-off, final presentation, and two additional sessions.

Prerequisites

Machine Learning / Deep Learning
Computer Vision and/or NLP with Deep Learning
Python (PyTorch) & Linux

Students not meeting these requirements must receive explicit permission from the instructor to remain in this course.