3D Scene Understanding at CVPR 2025

Advances in AI have spurred calls for next-generation AI, e.g., Embodied AI and General AI, that enables systems to physically interact with their environments and carry out comprehensive tasks in a human-like manner. Toward this goal, researchers from diverse fields, e.g., computer vision, computer graphics, and robotics, have worked separately and made progress across various topics, including 3D representations (e.g., NeRF, Gaussian Splatting), foundation models (e.g., SAM(2), Stable (Video) Diffusion), datasets (e.g., Objaverse (XL), Open X-Embodiment), and end-to-end vision-language-action (VLA) models (e.g., RT-X).

However, new fundamental questions arise about how to sustain a more comprehensive understanding of the environment, unite these efforts, and facilitate the future development of next-generation AI. For example, what is the role of traditional scene parsing, detection, and localization in today’s development? How can scene understanding techniques be leveraged to improve physical interaction? Can pure end-to-end models and scaled-up datasets suffice, or are intermediate representations, even symbolic ones, more suitable for certain tasks?

This year’s focus will be on exploring the fundamental aspects of enhancing interaction between agents and 3D scenes in the new era of AI, encouraging directions and ideas that could emerge within the next two to five years.

08:15 am - 08:30 am Opening Remarks and Introduction
08:30 am - 09:00 am Invited Talk: Guanya Shi (CMU) [video]
09:00 am - 09:30 am Invited Talk: Deva Ramanan (CMU)
09:30 am - 10:00 am Invited Talk: Angel X. Chang (SFU) [video]
10:00 am - 10:30 am Invited Talk: Carl Vondrick (Columbia) [video]
10:30 am - 11:00 am Invited Talk: Daniel Cremers (TUM) [video]
11:00 am - 11:15 am Coffee Break
11:15 am - 11:45 am Invited Talk: Iro Armeni (Stanford) [video]
11:45 am - 12:15 pm Invited Talk: Kiana Ehsani (Vercept)


Yixin Chen (BIGAI)	Baoxiong Jia (BIGAI)	Yao Feng (Stanford)	Songyou Peng (DeepMind)

Chuhang Zou (Reality Lab)	Sai Kumar Dwivedi (MPI)	Yixin Zhu (PKU)	Siyuan Huang (BIGAI)


Baoxiong Jia (BIGAI)	Xiongkun Linghu (BIGAI)	Tai Wang (Shanghai AI Lab)	Jingli Lin (SJTU)	Xiaojian Ma (BIGAI)

5th 3D Scene Understanding for Vision, Graphics, and Robotics

CVPR 2025 Workshop, Nashville, TN, morning of June 11, 2025

Watch the video recordings on virtual CVPR or YouTube

Overview

Invited Speakers

Schedule

Organizers

Challenge Organizers

Senior Organizers


Deva Ramanan (CMU)	Angel X. Chang (SFU)	Carl Vondrick (Columbia)	Daniel Cremers (TUM)

Iro Armeni (Stanford)	Kiana Ehsani (Vercept)	Guanya Shi (CMU)


Marc Pollefeys (ETH Zurich)	Derek Hoiem (UIUC)	Song-Chun Zhu (BIGAI, PKU, THU)