Connecting Language to Vision

Organization (SoSe 25)

Announcements: Moodle
Format: Master seminar, English
Lecturer: Prof. Dr. Thomas Seidl
Assistants: Tanveer Hannan
Timeline
  • 14.04.2025: Introduction, Team Formation, Topic Distribution
  • 05.05.2025: Intermediate Progress and Q/A
  • 19.05.2025: Final Presentation
  • 26.05.2025: Report Submission

Only SoSe 2025

Overview

This seminar explores the intersection of computer vision and natural language processing (NLP). Topics include AI agents, video captioning, video retrieval, question answering, and query-based object and action localization. It also examines large reasoning language models for video understanding.

Goal

Students will gain expertise in Vision-Language Modeling (VLM) research, covering problem formulation, literature review, model development, experimental design, and evaluation. The course provides insights into emerging research areas and thesis opportunities.

Format

  • Block seminar: Attendance at all sessions is mandatory.
  • Key meetings: Kick-off, final presentation, and two additional sessions.

Prerequisites

  • Machine Learning / Deep Learning
  • Computer Vision and/or NLP with Deep Learning
  • Python (PyTorch) & Linux

Students who do not meet these requirements need explicit permission from the instructor to remain in the course.