youngstu
Using 3D joints for human-computer interaction control, such as with Nreal AR glasses: https://www.youtube.com/watch?v=9LxOlsHu3r8&ab_channel=UploadVR. Without absolute depth, it is impossible to judge whether a button has been touched.
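For illustration, a minimal sketch of why a press test needs metric depth (the function name, threshold values, and coordinates below are my own assumptions, not from any SDK): the fingertip joint must reach the button's absolute z, not just overlap it in the image plane.

```python
import numpy as np

def is_button_pressed(fingertip_xyz, button_center_xyz, radius=0.02, depth_tol=0.01):
    """Hypothetical press test: the fingertip must overlap the button in the
    image plane AND reach the button's absolute depth (all units in meters)."""
    dx = fingertip_xyz[0] - button_center_xyz[0]
    dy = fingertip_xyz[1] - button_center_xyz[1]
    in_plane = np.hypot(dx, dy) < radius
    # Without metric depth, this comparison is meaningless: a root-relative
    # or scale-ambiguous z cannot distinguish a touch from a hover.
    at_depth = fingertip_xyz[2] >= button_center_xyz[2] - depth_tol
    return bool(in_plane and at_depth)

# Fingertip 2 cm in front of the button plane -> prints False (hover, not press).
print(is_button_pressed(np.array([0.0, 0.0, 0.48]), np.array([0.0, 0.0, 0.50])))
```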
If FaceFormer supports ARKit blendshape-coefficient output, it can drive other 3D templates.
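As a hedged sketch of what such an adapter could look like (the basis `B`, the vertex count, and the frame data below are placeholders, not FaceFormer's actual API): per-frame vertex offsets can be projected onto an ARKit-52 blendshape basis with non-negative least squares.

```python
import numpy as np
from scipy.optimize import nnls

V = 5023                               # placeholder vertex count
B = np.random.randn(3 * V, 52)         # columns: (blendshape_i - neutral), flattened
frame_offset = B @ np.random.rand(52)  # stand-in for one predicted frame of offsets

# ARKit coefficients live in [0, 1]; NNLS enforces the lower bound, and we
# clip the upper bound as a crude approximation.
coeffs, residual = nnls(B, frame_offset)
coeffs = np.clip(coeffs, 0.0, 1.0)
print(coeffs.shape)  # (52,) -> one coefficient per ARKit blendshape
```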
DanceRevolution is superior to AI Choreographer on Chinese songs. I tested AI Choreographer, and the results are poor on my collected audio, such as Chinese songs. Maybe the AIST++ data may...
There are some tools for video inference in human body pose and shape estimation: https://github.com/mkocabas/VIBE/tree/master and https://github.com/Arthur151/ROMP
> @youngstu Thank you for your interest in our work. Yes, we have the original video data but there might be some legal or copyright issues if we publicly release...
Do long sentences need to be split and synthesized separately?
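If it helps, here is a naive splitting strategy I would try (purely an assumption; `split_long_sentence` is a hypothetical helper, not part of this repo): break on punctuation first, hard-cut anything still over the limit, and synthesize each chunk separately.

```python
import re

def split_long_sentence(text, max_len=80):
    """Hypothetical splitter: break on punctuation first, then hard-cut
    anything still longer than max_len characters."""
    out = []
    for chunk in re.split(r'(?<=[,;.!?])\s*', text):
        while len(chunk) > max_len:
            out.append(chunk[:max_len])
            chunk = chunk[max_len:]
        if chunk:
            out.append(chunk)
    return out

# Each piece would be synthesized separately and the audio concatenated.
for piece in split_long_sentence("A very long input sentence, with several clauses; it may overflow the model."):
    print(piece)
```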
I have the same problem. Maybe joint modeling of Chinese and English is needed.
Have you verified other ARKit-52-specification models, such as the ARKit and MetaHuman 3D face models?
1. I render with PyTorch3D: the blendshapes of my ARKit standard model, taken in the order of exp_name_list, are weighted and summed.
2. The mouth effect is that the mouth is very...
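For reference, a minimal sketch of the linear blendshape sum I would expect at step 1 (tensor names and sizes are placeholders, not your actual code). One thing to check: the weighted deltas should be summed, since averaging them instead scales every expression down, which could mute the mouth motion.

```python
import torch

V = 1220                               # placeholder vertex count
neutral = torch.zeros(V, 3)            # neutral ARKit-style face mesh
deltas = torch.randn(52, V, 3) * 0.01  # per-blendshape offsets, in exp_name_list order
coeffs = torch.rand(52)                # predicted coefficients for one frame

# Linear blendshape model: verts = neutral + sum_i coeffs[i] * deltas[i].
# Summing the weighted deltas is the standard formulation; averaging them
# instead scales every expression by 1/52 and would shrink mouth motion.
verts = neutral + torch.einsum('i,ivc->vc', coeffs, deltas)
print(verts.shape)  # (V, 3), ready to wrap in a pytorch3d Meshes object
```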