3D Scene Understanding for Vision, Graphics, and Robotics

CVPR 2020 Workshop, Virtual, June 15th, 2020

Watch the recorded workshop videos on YouTube


Due to the pandemic, our workshop will be held virtually this year. We will host an online chat room for communication with the speakers and Q&A. We look forward to meeting you online!

Invited talks and oral presentations will be given live or via recorded video in the same Zoom room. All talks will include a live Q&A session; please refer to the Talks section for recorded videos and more details.

All events are hosted on Zoom; click the "Raise Hand" button if you have a question during a talk. The speaker will either pause to answer your question or address it during the Q&A.

Invited Speakers

Kristen Grauman (UT Austin) Sergey Levine (UC Berkeley) Andreas Geiger (University of Tübingen)
Yasutaka Furukawa (Simon Fraser University) Daniel Ritchie (Brown University) Jeannette Bohg (Stanford University)
Shuran Song (Columbia University) Andrea Tagliasacchi (Google Brain) Katerina Fragkiadaki (Carnegie Mellon University)

Opening Remarks

David Forsyth (University of Illinois Urbana-Champaign)


Oral Presentations


The goal of this workshop is to foster interdisciplinary communication among researchers working on 3D scene understanding in computer vision, computer graphics, and robotics, and to draw the broader community's attention to this field. Through this workshop, we will discuss current progress and future directions, and we expect new ideas and discoveries to emerge across the related fields.

Specifically, we are interested in the following problems:

  • Datasets: What is a desired yet manageable breadth for a dataset to serve various tasks at the same time and provide ample opportunities to combine problems?
  • Representations: Which representations are most suitable for a particular task, such as reconstruction or physical reasoning? Can a single representation serve all purposes of 3D scene understanding?
  • Reconstruction: How to build efficient models that parse and reconstruct observations from different data modalities (RGB, RGB-D, physical sensors)?
  • Reasoning: How to formulate reasoning about affordances and physical properties? How to encode, represent, and learn common sense?
  • Interaction: How to model and learn the physical interaction with objects within the scene?
  • Bridging the three fields: How to facilitate research that connects vision, graphics, and robotics via 3D scene understanding?


Organizers

Siyuan Huang* (UCLA) Chuhang Zou* (UIUC) Hao Su (UCSD) Alexander Schwing (UIUC)
Shuran Song (Columbia) Jiajun Wu (Stanford) Siyuan Qi (UCLA) Yixin Zhu (UCLA)

Senior Organizers

David Forsyth (UIUC) Derek Hoiem (UIUC) Leonidas Guibas (Stanford) Song-Chun Zhu (UCLA)