Real3DPortrait
Is there a way to improve inference speed and quality using depth information and data from previous frames?
Thank you for your outstanding work!
I understand that the current solution operates on a single-frame basis with 2D input, similar to GeneFace++. While there is a video-driven solution, inference still appears to operate on a single-frame basis.
I am exploring how to apply these audio-to-face solutions in a 3D video streaming system that captures data with depth sensors. With depth information and data from previous frames, I believe it should be possible to accelerate inference and improve reconstruction quality.
I would appreciate any insights or advice on this approach. Thank you!