audio2face_mm2023

Does the 0.007 s inference time reported in the paper cover HuBERT and ResNet1D combined?

Open zgyh001 opened this issue 1 year ago • 1 comments

From the paper: "Our backbone is built on a pretrained HuBERT model and a ResNet1D network, which preserves high-frequency details of facial movements. During implementation, our backbone synthesizes one second of facial animations with 30 fps in only 0.007 seconds."
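
For reference, here is a quick sketch of what the quoted figure would imply. It uses only the three numbers in the sentence above (1 second, 30 fps, 0.007 seconds); the variable names are my own and nothing here comes from the repo's code.

```python
# Figures taken from the quoted sentence in the paper.
clip_seconds = 1.0         # length of animation synthesized
fps = 30                   # output frame rate
inference_seconds = 0.007  # reported time to synthesize the clip

# Derived quantities (plain arithmetic, no measurement involved).
frames = int(clip_seconds * fps)
per_frame_ms = inference_seconds / frames * 1000
realtime_factor = clip_seconds / inference_seconds

print(f"{frames} frames, {per_frame_ms:.3f} ms per frame, "
      f"{realtime_factor:.1f}x faster than real time")
```

That works out to roughly 0.23 ms per frame, which is why it matters whether the HuBERT forward pass is included and which hardware was used.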

  1. Is this the combined time for HuBERT and ResNet1D?
  2. What hardware was this speed measured on?

Thanks!

zgyh001 • Jan 10 '24 07:01