av_hubert
pretrain log issue
Hi, thank you for your great work first! I found a bug when I pretrained AV-HuBERT using LRS3 data. It's in hubert_criterion.py, line 112:
```python
with torch.no_grad():
    for i, logp_m in enumerate(logp_m_list):
        # corr_m, count_m = compute_correct(logp_m)
        if logp_m.numel() == 0:
            corr_m, count_m = 0
```
I think `corr_m, count_m = 0` should be `corr_m, count_m = 0, 0`, or it will lead to this error:
```
Traceback (most recent call last):
  File "/avhubert/hubert_criterion.py", line 112, in forward
    corr_m, count_m = 0
TypeError: cannot unpack non-iterable int object
```
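For reference, the corrected branch would look like this (a minimal sketch based on the snippet above, with the rest of the loop omitted):

```python
with torch.no_grad():
    for i, logp_m in enumerate(logp_m_list):
        if logp_m.numel() == 0:
            # assign a 2-tuple so the two-name unpacking succeeds
            corr_m, count_m = 0, 0
```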
Thank you for finding the bug; we will fix it in the code. By the way, which config file did you use for pretraining? This branch shouldn't be triggered when the masking probability is set to a non-zero value.
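To illustrate why that is (a hypothetical stand-alone example, not the actual criterion code): with a masking probability of zero, no frames are masked, so the tensor of logits over masked frames is empty and the buggy branch is the one that runs.

```python
import torch

# Hypothetical shapes: logits over masked frames only, 500 target classes.
# With zero masking probability there are no masked frames, so the tensor is empty.
logp_m = torch.empty(0, 500)

if logp_m.numel() == 0:       # reached only when nothing was masked
    corr_m, count_m = 0, 0    # the original `= 0` raises the TypeError above
```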
This is the config file I used for the 1st pretraining iteration. Is it the right file to use for the 1st iteration? https://github.com/facebookresearch/av_hubert/blob/main/avhubert/conf/pretrain/base_lrs3_iter1.yaml
I first used 10% of the trainset data for clustering. After 5 iterations I found the finetuning result was not as good as yours. Then I checked the log and found that the 1st iteration didn't finish its 400k steps because it hit this error: `TypeError: cannot unpack non-iterable int object`.
Now I have fixed this bug, used 100% of the trainset data for clustering, and am running the 1st iteration again. The loss is about 4.02 at 150k steps, which is lower than the loss of the last experiment (clustering on 10% of the trainset) at 300k steps (4.3).
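As a side note on the clustering step being discussed (a hedged sketch, not the repo's own clustering script; `features`, `cluster_subset`, and the 100-cluster default are illustrative assumptions): fitting k-means on a fraction of the training frames versus all of them would look roughly like this.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def cluster_subset(features: np.ndarray, percent: float,
                   n_clusters: int = 100, seed: int = 0) -> MiniBatchKMeans:
    """Fit k-means on a random fraction of the feature rows.

    features: (num_frames, feat_dim) array of acoustic features.
    percent:  fraction of rows to sample, in (0, 1].
    """
    rng = np.random.default_rng(seed)
    n = max(1, int(len(features) * percent))
    idx = rng.choice(len(features), size=n, replace=False)
    km = MiniBatchKMeans(n_clusters=n_clusters, batch_size=10_000,
                         n_init=20, random_state=seed)
    km.fit(features[idx])
    return km

# km_10  = cluster_subset(feats, 0.1)  # cluster on 10% of the trainset frames
# km_100 = cluster_subset(feats, 1.0)  # cluster on all frames
```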
Yes, it is the correct config file for the 1st iteration of LRS3 pretraining.
Hi, thanks for your reply. I have a question about pretraining. After I finish the 1st iteration, should I start the 2nd iteration from checkpoint_last.pt of the 1st iteration, or with randomly initialized parameters?
Hi,
The 2nd iteration is started with random initialization.