Abstract
Prior work on recovering realistic 3D models of humans and animals from a single video requires either a category-specific shape template or a shape initialized as a unit sphere.
Such methods are not applicable to diverse categories or cannot handle fast object motion well. Our method, called Uni3D, goes beyond current work in several important ways.
First, we introduce a unified shape model that is better suited to representing both humans and animals.
However, even with a better shape model, reconstructing dynamic models from a single video remains challenging because fast object motion is common in casually captured videos.
To address this, we propose a novel feature blend skinning deformation model that leverages long-range dense correspondence feature matching information.
Evaluations on real and synthetic datasets validate that Uni3D achieves state-of-the-art 3D shape reconstruction performance regarding both geometry and texture quality.
Notably, Uni3D takes a step toward general high-fidelity human and animal model reconstruction from a single casual video.
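
For readers unfamiliar with blend skinning, the following is a minimal sketch of standard linear blend skinning, the classical technique that the name "feature blend skinning" alludes to. It is not Uni3D's deformation model, whose details the abstract does not give. The function name, argument names, and toy values are all illustrative assumptions: each point is moved by a weighted average of per-bone rigid transforms.

```python
import numpy as np

def linear_blend_skinning(points, weights, rotations, translations):
    """Deform points with standard linear blend skinning (illustrative only).

    points:       (N, 3) rest-pose positions
    weights:      (N, B) skinning weights, each row summing to 1
    rotations:    (B, 3, 3) per-bone rotation matrices
    translations: (B, 3) per-bone translations
    """
    # Apply every bone's rigid transform to every point: shape (B, N, 3).
    per_bone = np.einsum("bij,nj->bni", rotations, points) + translations[:, None, :]
    # Blend the B candidate positions of each point with its weights: shape (N, 3).
    return np.einsum("nb,bni->ni", weights, per_bone)

# Toy setup (hypothetical values): bone 0 is the identity,
# bone 1 translates by 2 units along x.
points = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
weights = np.array([[0.5, 0.5], [1.0, 0.0]])
rotations = np.stack([np.eye(3), np.eye(3)])
translations = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])

deformed = linear_blend_skinning(points, weights, rotations, translations)
```

In this toy case the first point is influenced equally by both bones and so moves half of the second bone's translation, while the second point follows only the identity bone and stays in place.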