About Me
I am a Ph.D. student at UC Berkeley, affiliated with Berkeley AI Research (BAIR) Lab and Berkeley DeepDrive (BDD).
My research focuses on 3D computer vision and its applications in Robotics.
Research Interests
Generation
3D/4D Generation
World Models
Reconstruction
Gaussian Splatting
NeRF-SLAM
Perception
Sensor Fusion
Multi-modal Learning
Publications
Selected research papers and preprints. * denotes equal contribution.
UniQueR: Unified Query-based Feedforward 3D Reconstruction
We present UniQueR, a unified query-based feedforward framework for 3D reconstruction from unposed images. A compact set of 3D anchor queries spawns Gaussians for differentiable rendering, enabling inference of scene structure—including occluded geometry—in a single forward pass.
S2GO: Streaming Sparse Gaussian Occupancy
We formulate the scene into a compact set of 3D queries which are propagated through time in an online, streaming fashion. These queries are then decoded into semantic Gaussians at each timestep. We couple our framework with a denoising rendering objective to guide the queries and their constituent Gaussians in effectively capturing scene geometry.
RAYNOVA: Scale-Temporal Autoregressive World Modeling in Ray Space
We introduce RayNova a unified 4D latent world model that jointly learns Continuous Spatio-Temporal representations with minimal inductive bias, enabling seamless, spatio-temporally coherent video generation.
A Lesson in Splats: Teacher-Guided Diffusion for 3D Gaussian Splats Generation
We introduce a diffusion model for Gaussian Splats, SplatDiffusion, to enable generation of three-dimensional structures from single images. Our approach further incorporates a guidance mechanism to aggregate information from multiple views.
DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition
We present DeSiRe-GS, a self-supervised gaussian splatting representation, enabling effective static-dynamic decomposition and high-fidelity surface reconstruction in complex driving scenarios.
Q-SLAM: Quadric Representations for Monocular SLAM
In this study, we propose a novel approach that reimagines volumetric representations through the lens of quadric forms. We posit that most scene components can be effectively represented as quadric planes. Leveraging this assumption, we reshape the volumetric representations with million of cubes by several quadric planes, which leads to more accurate and efficient modeling of 3D scenes in SLAM contexts.
Interactive multi-scale fusion of 2D and 3D features for multi-object vehicle tracking
In this paper, we propose multi-scale interactive query and fusion between pixel-wise and point-wise features to obtain more discriminative features. In addition, an attention mechanism is utilized to conduct soft feature fusion between multiple pixels and points to avoid inaccurate match problems.
PNAS-MOT: Multi-Modal Object Tracking with Pareto Neural Architecture Search
We explore the use of the neural architecture search (NAS) methods to search for efficient architectures for tracking, aiming for low real-time latency while maintaining relatively high accuracy. We also propose a multi-modal framework to improve the robustness.
Get In Touch
Always open to discussing new ideas, collaborations, or research opportunities.