Kun Cheng
Thanks for your attention. We did try that. Detecting keypoints separately on each frame and then cropping around them results in _jitter_ in the cropped video. And this cropping strategy is...
What is the output of `torch.cuda.is_available()` in your environment?
You should run `torch.cuda.is_available()` in Python, not in the terminal:

```
import torch
print(torch.cuda.is_available())
```
That looks fine. What is the full name of your GPU? And what is the GPU utilization when executing Step 6?
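If it's easier, you can query the device name and total memory directly from Python with standard `torch.cuda` calls (GPU utilization itself is simplest to watch with `nvidia-smi` in a separate terminal):

```
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(props.name)
    print(f"{props.total_memory / 1024**3:.1f} GB total memory")
```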
You can try reducing these two parameters: `--face_det_batch_size` and `--LNet_batch_size`. Or you can run the code in Colab: [https://colab.research.google.com/github/vinthony/video-retalking/blob/main/quick_demo.ipynb](https://colab.research.google.com/github/vinthony/video-retalking/blob/main/quick_demo.ipynb)
> I met the same error, how to resolve it?

You can try a smaller batch size:

```
python3 inference.py \
  --face examples/face/1.mp4 \
  --audio examples/audio/1.wav \
  --outfile results/1_1.mp4 \
  --face_det_batch_size 2 ...
```
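For reference, a full invocation with both batch-size flags from above reduced might look like this (the values are illustrative starting points, not tested defaults; lower them further if you still run out of memory):

```
python3 inference.py \
  --face examples/face/1.mp4 \
  --audio examples/audio/1.wav \
  --outfile results/1_1.mp4 \
  --face_det_batch_size 2 \
  --LNet_batch_size 2
```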
The training dataset is in English. The model can generalize to other languages, but performance degrades to some extent. Retraining LNet on a suitable large-scale Chinese video dataset might improve the results.
For LNet's training procedure you can currently refer to [Wav2Lip](https://github.com/Rudrabha/Wav2Lip#train); like Wav2Lip, we train with self-reconstruction on the LRS2 dataset. Transferring the training to a different dataset takes some work: for data collected from the web you first need to align the audio and video, then train a lip-sync discriminator, and finally train the lip-sync network. See [here](https://github.com/Rudrabha/Wav2Lip#training-on-datasets-other-than-lrs2) for details.
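As a rough illustration of the self-reconstruction setup described above, the sketch below shows the shape of a Wav2Lip-style training step: a generator reconstructs the mouth region from an audio window, supervised by an L1 reconstruction loss plus a frozen, pretrained lip-sync expert. All modules here are tiny hypothetical placeholders, not the actual LNet or SyncNet code.

```
import torch
import torch.nn as nn

# Hypothetical stand-ins for the real networks; the actual LNet and the
# Wav2Lip lip-sync expert are convolutional and far larger.
class TinyLipGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.audio_enc = nn.Linear(80 * 16, 128)   # mel window -> embedding
        self.decode = nn.Linear(128, 3 * 96 * 96)  # embedding -> mouth crop

    def forward(self, mel):
        h = self.audio_enc(mel.flatten(1))
        return self.decode(h).view(-1, 3, 96, 96)

class TinySyncExpert(nn.Module):
    """Scores audio/lip agreement; in practice pretrained and frozen."""
    def __init__(self):
        super().__init__()
        self.face_enc = nn.Linear(3 * 96 * 96, 64)
        self.audio_enc = nn.Linear(80 * 16, 64)

    def forward(self, frames, mel):
        f = nn.functional.normalize(self.face_enc(frames.flatten(1)), dim=1)
        a = nn.functional.normalize(self.audio_enc(mel.flatten(1)), dim=1)
        return (f * a).sum(dim=1)  # cosine similarity per sample

gen, expert = TinyLipGenerator(), TinySyncExpert()
for p in expert.parameters():  # the sync expert stays frozen
    p.requires_grad_(False)
opt = torch.optim.Adam(gen.parameters(), lr=1e-4)

# One self-reconstruction step on dummy data: the target frames come from
# the same clip as the audio, so ground truth needs no extra labels.
mel = torch.randn(8, 80, 16)       # batch of mel-spectrogram windows
target = torch.rand(8, 3, 96, 96)  # matching ground-truth mouth crops

pred = gen(mel)
recon_loss = nn.functional.l1_loss(pred, target)
sync_loss = (1 - expert(pred, mel)).mean()  # push similarity toward 1
loss = recon_loss + 0.03 * sync_loss        # small sync weight, as in Wav2Lip
loss.backward()
opt.step()
print(float(loss))
```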
> @kunncheng The SadTalker project says it was trained on the VoxCeleb1 dataset, and its Chinese lip motion seems somewhat better than video-retalking's. Are there any plans to release models trained on other datasets?

SadTalker animates a single image, whereas this project edits a video; the multi-frame and single-frame tasks differ in difficulty. That gap is exactly what DNet is meant to close: it reduces multi-frame driving to the single-frame case by normalizing the mouth shape. We have also tried training on other datasets, but training either failed to converge or brought no clear performance gain, so there are no such plans for now.
You can try running GFPGAN/GPEN again, or other super-resolution / face restoration methods, to enhance the generated video. It's worth noting that GFPGAN slightly changes the identity.
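As a minimal sketch, here is one way to post-process a generated video frame by frame with GFPGAN's Python API, assuming `gfpgan` and `opencv-python` are installed and the `GFPGANv1.3.pth` weights have been downloaded (paths and settings below are illustrative):

```
import cv2
from gfpgan import GFPGANer

# Illustrative paths; adjust to your environment.
restorer = GFPGANer(model_path='checkpoints/GFPGANv1.3.pth',
                    upscale=1, arch='clean', channel_multiplier=2,
                    bg_upsampler=None)

cap = cv2.VideoCapture('results/1_1.mp4')
fps = cap.get(cv2.CAP_PROP_FPS)
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter('results/1_1_enhanced.mp4',
                      cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # enhance() detects faces, restores them, and pastes them back.
    _, _, restored = restorer.enhance(frame, has_aligned=False,
                                      only_center_face=False,
                                      paste_back=True)
    out.write(restored)

cap.release()
out.release()
```

Note that this drops the audio track; remux it afterwards with ffmpeg from the original output.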