Uni3D: Single Video-based Dynamic 3D Modeling for Humans and Animals with a Unified Template

Xiangyue Liu¹ Qingtian Zhu² Weixin Xu² Shuchang Zhou²

Li Yi¹ Yang Gao¹

¹Tsinghua University ²MEGVII Research

**Uni3D** robustly reconstructs dynamic 3D models of humans and animals from a single short, casually captured video. **Left**: The input video of a single object. **Middle**: The unified coarse shape and the 3D shape in canonical space, with skinning weights shown as surface color. **Right**: Unconstrained posed reconstructions at each time instance, shown from different views.

Abstract

Prior work on recovering realistic 3D models of humans and animals from a single video requires either a category-specific shape template or a shape initialized as a unit sphere. Such methods are not applicable to diverse categories or cannot handle fast object motion well. Our method, called Uni3D, goes beyond current work in several important ways. First, we introduce a unified shape that is better suited to representing both humans and animals. Even with a better shape model, however, reconstructing dynamic models from a single video remains challenging because fast object motion is common in casually captured videos. To address this, we propose a novel feature blend skinning deformation model that leverages long-range dense correspondence feature matching. Evaluations on real and synthetic datasets validate that Uni3D achieves state-of-the-art 3D shape reconstruction performance in both geometry and texture quality. Notably, Uni3D takes a step toward general high-fidelity human and animal model reconstruction from a single casual video.
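For readers unfamiliar with skinning-based deformation, the following is a minimal sketch of standard linear blend skinning, the classical technique that blend skinning models build on. It is not the feature blend skinning model proposed in the paper, whose details are not given on this page. The function name, argument names, and toy values are all illustrative assumptions.

```python
import numpy as np

def linear_blend_skinning(points, weights, rotations, translations):
    """Standard linear blend skinning (illustrative, not Uni3D's model).

    points:       (N, 3) canonical-space vertices
    weights:      (N, B) skinning weights, each row summing to 1
    rotations:    (B, 3, 3) per-bone rotation matrices
    translations: (B, 3) per-bone translations
    returns:      (N, 3) posed vertices
    """
    # Apply every bone's rigid transform to every point: shape (B, N, 3).
    per_bone = np.einsum("bij,nj->bni", rotations, points)
    per_bone = per_bone + translations[:, None, :]
    # Blend the B candidate positions of each point by its skinning weights.
    return np.einsum("nb,bni->ni", weights, per_bone)

# Toy example (hypothetical values): two bones, two points.
points = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0]])
weights = np.array([[1.0, 0.0],    # point 0 follows bone 0 only
                    [0.5, 0.5]])   # point 1 is shared equally
rotations = np.stack([np.eye(3), np.eye(3)])
translations = np.array([[0.0, 0.0, 0.0],
                         [0.0, 0.0, 2.0]])  # bone 1 moves up by 2

posed = linear_blend_skinning(points, weights, rotations, translations)
```

In this toy case the first point stays at the origin, while the second point, shared equally between a static bone and one translated by 2 along z, moves up by 1. The per-point weights used here correspond to the kind of quantity visualized as surface color in the teaser figure.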

[Code] [Paper]

Results

Swan-snow (10s)

Left: reference video (10s, 117 frames). Middle: reconstructed nonrigid 3D shape. Right: reconstructed textured mesh.

Dog-pidun (7s)

Left: reference video (7s, 35 frames). Middle: reconstructed nonrigid 3D shape. Right: reconstructed textured mesh.

Human-cap (9s)

Left: reference video (9s, 100 frames). Middle: reconstructed nonrigid 3D shape. Right: reconstructed textured mesh.

Results on Synthetic Datasets

Eagle (30 frames)

Left: reference video. Middle: reconstructed nonrigid 3D shape. Right: reconstructed textured mesh.

Q-bouncing (35 frames)

Left: reference video. Middle: reconstructed nonrigid 3D shape. Right: reconstructed textured mesh.

Dog (19 frames)

Left: reference video. Middle: reconstructed nonrigid 3D shape. Right: reconstructed textured mesh.

Comparisons

Dog-pidun (7s)

Q-bouncing (35 frames)