
Training vox from scratch issues

boredaoao opened this issue 2 years ago · 14 comments

Everything seems fine except the eye movement: my model doesn't correctly capture eye movement from the driving videos. It's at 300k steps now; should I wait for more steps? Are there any other parameter-setting tricks?

boredaoao avatar Oct 24 '22 05:10 boredaoao

Hey, I'm about to start training LIA on the AVSpeech dataset, and I was trying to figure out whether the authors used VoxCeleb1 or VoxCeleb2 (I can't find this info in the paper, the code, or the preprocessing code). Which dataset do you use? Could it be that they trained on VoxCeleb2 while you are using VoxCeleb1 (about 5× less data)?

leg0m4n avatar Nov 24 '22 20:11 leg0m4n

Hello, sorry to bother you. When I train the vox model from scratch, the VGG loss is always high at the beginning; my learning rate is 0.002. Is this normal? [image]

Wangman1 avatar Oct 08 '23 11:10 Wangman1

@Wangman1 I have trained on HDTF and the trend is similar to yours: the VGG loss is high (~50) and the G loss (~7) is larger than the D loss (~0.0x). Although I think this pattern is unusual, the results look correct.

liutaocode avatar Oct 20 '23 06:10 liutaocode
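The asymmetry described above (tiny D loss, much larger G loss) is what you would expect whenever the discriminator stays ahead of the generator. As a rough illustration — assuming the training uses StyleGAN2-style non-saturating softplus GAN losses, which is an assumption about this repo, not something confirmed in the thread — a confident discriminator produces exactly this pattern:

```python
import math

def softplus(x: float) -> float:
    """Numerically stable softplus: log(1 + exp(x))."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)

# Hypothetical logits from a confident discriminator:
# it scores real images high and generated images low.
real_logit, fake_logit = 5.0, -5.0

# Non-saturating GAN losses (StyleGAN2-style, assumed here):
d_loss = softplus(-real_logit) + softplus(fake_logit)  # near zero when D is right
g_loss = softplus(-fake_logit)                         # large when D rejects fakes

print(f"d_loss={d_loss:.4f}  g_loss={g_loss:.4f}")
```

With these logits the D loss comes out around 0.01 while the G loss is around 5, matching the "d ~0.0x, g larger" shape reported above without the run being broken.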

@Wangman1 This is my training loss. I am still looking into it...

[image]

liutaocode avatar Oct 20 '23 06:10 liutaocode

> @Wangman1 This is my training loss. I am still looking into it...
>
> [image]

When you test with the model you trained yourself, does driving succeed?

Wangman1 avatar Oct 20 '23 06:10 Wangman1

@Wangman1 Yes, it does. That's the part I find strange too: the losses don't look right, but driving still succeeds.

liutaocode avatar Oct 20 '23 08:10 liutaocode

> @Wangman1 Yes, it does. That's the part I find strange too: the losses don't look right, but driving still succeeds.

I eventually reached a loss similar to yours, but driving did not succeed. I trained on only a small portion of the vox data, so I'm not sure whether it's a data problem. Did you use the HDTF data? My training command was: python train.py --dataset vox --exp_path vox_exp --exp_name 20231020 --batch_size 56 --lr 0.002. Is there anything wrong with it?

Wangman1 avatar Oct 20 '23 08:10 Wangman1
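One thing worth double-checking in the command above is the batch-size/learning-rate pairing. As a sketch only — the baseline values here (batch 32 at lr 0.002) are assumptions, not confirmed repo defaults — the linear scaling rule says the learning rate should grow in proportion to the batch size:

```python
# Hypothetical linear learning-rate scaling check. The reference setup
# (batch_size=32 at lr=0.002) is an assumption, not a confirmed default
# of this repo; substitute the values your baseline config actually uses.

def scaled_lr(base_lr: float, base_batch: int, batch: int) -> float:
    """Linear scaling rule: lr grows proportionally with batch size."""
    return base_lr * batch / base_batch

print(scaled_lr(0.002, 32, 56))  # 0.0035
```

Under that assumption, running with --batch_size 56 but the unscaled --lr 0.002 would undertrain slightly relative to the baseline schedule.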

> @Wangman1 Yes, it does. That's the part I find strange too: the losses don't look right, but driving still succeeds.

Also, training is very slow for me: about 100k iterations in a week. How fast is your training?

Wangman1 avatar Oct 20 '23 09:10 Wangman1

@Wangman1 I first loaded the author's released vox model and then continued training on HDTF. The author presumably used the VoxCeleb1 dataset, which contains roughly 4 million frames. How many frames are you using? If too few, training may indeed fail. I train on 8× A10 GPUs; 100k iterations took about one day. I'm on a mechanical hard drive and GPU utilization isn't saturated yet, so there's still room to speed up.

liutaocode avatar Oct 20 '23 09:10 liutaocode

> @Wangman1 I first loaded the author's released vox model and then continued training on HDTF. The author presumably used the VoxCeleb1 dataset, which contains roughly 4 million frames. How many frames are you using? If too few, training may indeed fail. I train on 8× A10 GPUs; 100k iterations took about one day. I'm on a mechanical hard drive and GPU utilization isn't saturated yet, so there's still room to speed up.

Got it. I only used about 20k frames; the dataset downloads very slowly, and I had very little data at the time. I was trying to train from scratch and haven't succeeded yet. I'll try again after downloading more data.

Wangman1 avatar Oct 20 '23 09:10 Wangman1
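The gap discussed above (~4M frames vs ~20k) is easy to measure before launching a run. A minimal sketch, assuming preprocessing extracts frames as image files under per-clip folders — the directory path and layout here are hypothetical, so adjust them to your own preprocessing output:

```python
# Quick sanity check of dataset size before training.
from pathlib import Path

def count_frames(root, exts=(".png", ".jpg", ".jpeg")):
    """Recursively count image frames under `root`; 0 if it doesn't exist."""
    root = Path(root)
    if not root.is_dir():
        return 0
    return sum(1 for p in root.rglob("*") if p.suffix.lower() in exts)

if __name__ == "__main__":
    n = count_frames("datasets/vox")  # hypothetical path
    print(f"{n} frames found")
    if n < 1_000_000:
        print("warning: far fewer frames than the ~4M in full VoxCeleb1")
```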

@Wangman1 Thanks for trying our code! Since you fine-tuned the pre-trained vox-based model, it is normal that the VGG loss does not decrease much further: the original vox-based model has already been well trained on facial data.

wyhsirius avatar Oct 20 '23 10:10 wyhsirius

@Wangman1 This is the loss from training on vox1 from scratch myself; maybe it will help. [image]

liutaocode avatar Oct 25 '23 13:10 liutaocode

@wyhsirius Hello, I would like to ask about the final L1 loss in your training runs. I trained on Vox1 with your training parameters, and the L1 loss reached 0.08, but it seems hard to reduce further.

liutaocode avatar Oct 30 '23 01:10 liutaocode

> @Wangman1 This is the loss from training on vox1 from scratch myself; maybe it will help. [image]

👍🏻👍🏻👍🏻 Thank you!

Wangman1 avatar Oct 30 '23 05:10 Wangman1