I am a Research Scientist at Meta Reality Labs. My main research interests include dynamic scene reconstruction, generative models, and photorealistic rendering.
My research interests are in 3D reconstruction of objects and scenes, video understanding, realistic rendering, object detection, tracking, and 6D pose estimation. In particular, I focus on test-time optimization methods to solve these problems.
We introduce an approach that takes advantage of generative models to correct errors and complete missing information in a Gaussian-based dynamic scene representation.
We introduce a recipe for generating immersive 3D worlds from a single image by framing the task as an in-context learning problem for 2D inpainting models.
Deblur-SLAM successfully tracks the camera and reconstructs sharp maps for highly motion-blurred sequences. By directly modeling motion blur, we achieve high-quality reconstructions on both challenging synthetic and real data.
We asked 3D modelers to rank wireframe reconstructions and compared their ranking to the one produced by metrics. After I exploited issues in the CVPR'24 S23DR challenge and won 1st place, we came up with a better solution for the next challenge.
We leverage the available depth sensing signal on XR devices combined with self-supervision to learn a multi-modal pose estimation model capable of tracking full body motions in real time.
We propose a novel method to produce generic 3D room layouts from 2D segmentation masks alone, with which we annotate and publicly release 2246 3D room layouts on the RealEstate10K dataset.
We propose DeFMO, which, given a single image and its estimated background, outputs the object's appearance and position in a series of sub-frames, as if captured by a high-speed camera (i.e., temporal super-resolution). This is the first deep-learning-based approach to FMO deblurring.
We extend the TbD pipeline to track fast-moving objects in full 6 DoF, simultaneously estimating their 3D motion trajectory, 3D pose, and appearance changes at a time step that is a fraction of the video-frame exposure time.
Our method learns sensor or algorithm properties jointly with semantic depth fusion and scene completion, and can also be used as an expert system, e.g., to unify the strengths of various photometric stereo algorithms.
We introduce fast-moving objects for the first time, defined as objects that move over distances larger than their size within one video frame: a new problem, a new dataset, new metrics, and a new baseline.
Computer Vision: Teaching Assistant for Autumn Semester 2019, 2020.
Mixed Reality Lab: Teaching Assistant for Autumn Semester 2021.
3D Vision: Teaching Assistant for Spring Semester 2020, 2021, 2022.
Deep Learning Seminar: Teaching Assistant for Spring Semester 2020–2024.