Hello, I came across your project and I'm interested to know if it has the capability to run in real-time and drive images to speak?
Thanks for your interest. Currently, XTalker can drive an image to speak at roughly 10x the speed of SadTalker, but I have not integrated it into any real-time streaming system. In some rough tests it produced a 20-second video in about 30 seconds, sometimes faster. As explained in the README, this speedup comes from low-precision inference via IPEX combined with my parallel inference implementation. With further simplification, or some trade-offs specific to your application, real-time performance should be reachable. Note that results may vary across hardware. I am not an expert in using ffmpeg to push real-time video streams, but I will look into it. In the meantime, I suggest you first test it on your own system.
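For reference, a quick back-of-the-envelope check of the real-time gap, using only the figures quoted above (a 20-second video produced in about 30 seconds; actual numbers will vary with hardware):

```python
# Rough real-time feasibility check based on the figures quoted above:
# ~20 seconds of output video produced in ~30 seconds of wall-clock time.
video_seconds = 20.0  # duration of the generated video
wall_seconds = 30.0   # measured processing time (hardware-dependent estimate)

# A factor >= 1.0 means generation keeps up with playback (real-time).
real_time_factor = video_seconds / wall_seconds

# Extra end-to-end speedup still needed to reach real-time.
speedup_needed = wall_seconds / video_seconds

print(f"real-time factor: {real_time_factor:.2f}x")       # 0.67x
print(f"additional speedup needed: {speedup_needed:.2f}x")  # 1.50x
```

So the current pipeline runs at about 0.67x real-time, meaning roughly another 1.5x speedup (from simplification or application-specific trade-offs) would close the gap.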