Athena Seminar Series: Zhiwen (“Aaron”) Fan

Wednesday, November 20, 2024

1:00 pm – 2:00 pm

Presenter: Zhiwen (“Aaron”) Fan is a PhD candidate at UT Austin, advised by Prof. Zhangyang (“Atlas”) Wang.

Recording:

https://duke.zoom.us/rec/share/Oo1HUrA5EoLeR1mOSiGXUibYh3RQjChemLSp35W3hWCDcLfO0frGY5UV1MJu_0mx.525gAGxgt0h-y3co

Title:
Empowering Machines to Understand 3D: Scalable Solutions for Accurate Reconstruction and Reliable Interaction 

Abstract:
Large AI models trained on vast datasets can analyze complex information and recognize patterns, helping us solve problems that were previously difficult or impossible to tackle. However, while Large Language Models (LLMs) and Vision-Language Models (VLMs) excel at interpreting text and single images, they lack the spatial understanding needed to bridge 2D data and the 3D physical world. My research addresses this by first investigating the geometric principles that enable high-quality multi-view perception, followed by developing a self-supervised 3D reconstruction pipeline that directly links the 3D visual world with 2D images. Finally, I will introduce the Large Spatial Model, a large-scale 3D model capable of interpreting geometry, appearance, and semantics in a single forward pass and operating in real time. By integrating the spatial awareness of the Large Spatial Model with the general knowledge of LLMs, my ongoing and future work aims to establish robust spatial reasoning and enable applications that require an understanding of 3D environments.
Applications in 3D tissue visualization and editing, multi-modal simultaneous localization and mapping, human avatar creation, and controllable scene simulation demonstrate how these emerging 3D foundation models, combined with domain-specific innovations, can drive breakthroughs in next-generation computing systems.