audio2face_mm2023
Does the 0.007 s inference time reported in the paper refer to the total time for HuBERT and ResNet1D?
From the paper: "Our backbone is built on a pretrained HuBERT model and a ResNet1D network, which preserves high-frequency details of facial movements. During implementation, our backbone synthesizes one second of facial animations with 30 fps in only 0.007 seconds."
- Is this the combined time for HuBERT and ResNet1D?
- What hardware was this speed measured on?
Thanks!
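
For context, here is a quick back-of-the-envelope calculation of what the quoted figure implies. It uses only the numbers from the quoted passage (0.007 s, one second of animation, 30 fps). The variable names are my own, and I am assuming the 0.007 s covers the whole one-second clip. It works out to roughly 0.23 ms per frame, or about 143x faster than real time, which is why I would like to confirm what it includes:

```python
# Figures taken from the quoted passage in the paper.
synthesis_time_s = 0.007  # reported time to synthesize the clip
clip_duration_s = 1.0     # length of the synthesized animation
fps = 30                  # animation frame rate

# Derived quantities (my own arithmetic, not stated in the paper).
frames = int(clip_duration_s * fps)
real_time_factor = synthesis_time_s / clip_duration_s
per_frame_ms = synthesis_time_s / frames * 1000
speedup = clip_duration_s / synthesis_time_s

print(f"frames: {frames}")
print(f"real-time factor: {real_time_factor:.3f}")
print(f"time per frame: {per_frame_ms:.3f} ms")
print(f"speedup over real time: {speedup:.1f}x")
```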