
Uni3D: Single Video-based Dynamic 3D Modeling for Humans and Animals with a Unified Template

Xiangyue Liu¹, Qingtian Zhu², Weixin Xu², Shuchang Zhou², Li Yi¹, Yang Gao¹
¹Tsinghua University, ²MEGVII Research



Uni3D robustly reconstructs dynamic 3D models of humans and animals from a single short, casually captured video. Left: the input video of a single object. Middle: a unified coarse shape and the 3D shape in canonical space, with skinning weights shown as surface color. Right: unconstrained posed reconstruction at each time instance, rendered from different views.

Abstract

Prior work on recovering realistic 3D models of humans and animals from a single video requires either a category-specific shape template or a shape initialized as a unit sphere. Such methods are either not applicable to diverse categories or cannot handle fast object motion well. Our method, called Uni3D, goes beyond current work in several important ways. First, we introduce a unified shape that is more appropriate for representing humans and animals. However, even with a better shape model, reconstructing dynamic models from a single video remains challenging because fast object motion is common in casually captured videos. To address this, we propose a novel feature blend skinning deformation model that leverages long-range dense correspondence feature matching information. Evaluations on real and synthetic datasets validate that Uni3D achieves state-of-the-art 3D shape reconstruction performance in both geometry and texture quality. Notably, Uni3D takes a step toward general high-fidelity human and animal model reconstruction from a single casual video.
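For readers unfamiliar with skinning-based deformation, the following is a minimal sketch of standard linear blend skinning, the classical technique that blend skinning models build on. It is not the feature blend skinning model proposed in the paper, whose details are not given on this page. All names (`rotation_z`, `apply_rigid`, `blend_skinning`), the two bones, and the weights are hypothetical and chosen only for illustration.

```python
import math

def rotation_z(angle):
    """3x3 rotation matrix about the z-axis (row-major nested lists)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply_rigid(R, t, p):
    """Apply the rigid transform x -> R x + t to a 3D point p."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]

def blend_skinning(points, weights, bones):
    """Standard linear blend skinning (illustrative, not the paper's model).

    points  : list of [x, y, z] points in canonical space
    weights : weights[n][b] is the skinning weight of bone b for point n;
              each row is assumed to sum to 1
    bones   : list of (R, t) rigid transforms, one per bone
    """
    posed = []
    for p, w in zip(points, weights):
        out = [0.0, 0.0, 0.0]
        for w_b, (R, t) in zip(w, bones):
            q = apply_rigid(R, t, p)  # point as moved by this bone alone
            for i in range(3):
                out[i] += w_b * q[i]  # weighted average over bones
        posed.append(out)
    return posed

# Two hypothetical bones: identity, and a 90-degree turn about z.
identity = ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
            [0.0, 0.0, 0.0])
quarter_turn = (rotation_z(math.pi / 2), [0.0, 0.0, 0.0])

# The same canonical point with three different weight assignments.
canonical = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
skin_weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
posed = blend_skinning(canonical, skin_weights, [identity, quarter_turn])
```

A point fully bound to one bone follows that bone exactly, while a point with mixed weights lands between the two bone-transformed positions. The per-point weights are the quantity that the teaser figure visualizes as surface color.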

[Code] [Paper]

Results

Swan-snow (10s)


Left: reference video (10s, 117 frames). Middle: reconstructed nonrigid 3D shape. Right: reconstructed textured mesh.

Dog-pidun (7s)


Left: reference video (7s, 35 frames). Middle: reconstructed nonrigid 3D shape. Right: reconstructed textured mesh.

Human-cap (9s)


Left: reference video (9s, 100 frames). Middle: reconstructed nonrigid 3D shape. Right: reconstructed textured mesh.


Results on Synthetic Datasets

Eagle (30 frames)


Left: reference video. Middle: reconstructed nonrigid 3D shape. Right: reconstructed textured mesh.

Q-bouncing (35 frames)


Left: reference video. Middle: reconstructed nonrigid 3D shape. Right: reconstructed textured mesh.

Dog (19 frames)


Left: reference video. Middle: reconstructed nonrigid 3D shape. Right: reconstructed textured mesh.

Comparisons

Dog-pidun (7s)


Left to right: reference video, Uni3D, BANMo, ViSER.

Q-bouncing (35 frames)


Left to right: reference video, Uni3D, BANMo, ViSER.
