Is there a way to improve inference speed and quality using depth information and data from previous frames?

Open RayShing opened this issue 1 year ago • 0 comments

Thank you for your outstanding work!

I understand that the current solution operates on a single-frame basis with 2D input, similar to GeneFace++. While a video-driven solution exists, it appears that inference still runs on a single-frame basis.

I am exploring the application of these audio-to-face solutions within a 3D video streaming system that uses depth sensors to capture data. With depth information and data from previous frames available, I believe it should be possible to accelerate inference and enhance reconstruction quality.

I would appreciate any insights or advice on this approach. Thank you!

Jul 08 '24 04:07 RayShing