
pretrain log issue

Open li563042811 opened this issue 2 years ago • 5 comments

Hi, thank you for your great work first! I found a bug when I pretrained av-hubert on LRS3 data. It is in hubert_criterion.py, line 112:

with torch.no_grad():
    for i, logp_m in enumerate(logp_m_list):
        # corr_m, count_m = compute_correct(logp_m)
        if logp_m.numel() == 0:
            corr_m, count_m = 0

I think corr_m, count_m = 0 should be corr_m, count_m = 0, 0; otherwise it leads to this error:

Traceback (most recent call last):
  File "/avhubert/hubert_criterion.py", line 112, in forward
    corr_m, count_m = 0
TypeError: cannot unpack non-iterable int object
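For reference, a minimal, self-contained sketch of how the corrected accuracy-counting loop could look. The compute_correct stand-in, the dummy inputs, and the logging keys below are illustrative assumptions, not the exact upstream code:

    import torch

    def compute_correct(logits):
        # Illustrative stand-in for the criterion's accuracy helper:
        # counts frames whose highest-scoring class is index 0
        # (the target position in HuBERT-style logits).
        max_idx = logits.argmax(dim=-1)
        corr = (max_idx == 0).long().sum().item()
        count = max_idx.numel()
        return corr, count

    # One normal batch and one empty one, to exercise the buggy branch.
    logp_m_list = [torch.randn(8, 100), torch.empty(0, 100)]
    logging_output = {}

    with torch.no_grad():
        for i, logp_m in enumerate(logp_m_list):
            if logp_m.numel() == 0:
                # Fix: assign a pair, not a bare int, so unpacking works.
                corr_m, count_m = 0, 0
            else:
                corr_m, count_m = compute_correct(logp_m)
            logging_output[f"correct_m_{i}"] = corr_m
            logging_output[f"count_m_{i}"] = count_m

    print(logging_output)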

li563042811 avatar Aug 26 '22 02:08 li563042811

Thank you for finding the bug; we will fix it in the code. By the way, which config file did you use for pretraining? This branch shouldn't be triggered when the masking probability is set to a non-zero value.

chevalierNoir avatar Aug 26 '22 13:08 chevalierNoir

This is the config file I used in the 1st pretrain iteration. Is this the right file to use in the 1st iteration? https://github.com/facebookresearch/av_hubert/blob/main/avhubert/conf/pretrain/base_lrs3_iter1.yaml

I first used 10% of the trainset data for clustering. After 5 iterations I found the finetune result is not as good as yours. Then I checked the log and found that the 1st iteration did not finish the 400k steps because it hit this error: TypeError: cannot unpack non-iterable int object

Now I have fixed this bug, used 100% of the trainset data for clustering, and started the 1st iteration again. The loss is about 4.02 at 150k steps, which is lower than the loss of the last experiment (clustering on 10% of the trainset data) at 300k steps (4.3).

li563042811 avatar Aug 27 '22 04:08 li563042811

Yes. It is the correct config file for 1st iter in LRS3 pretraining.

chevalierNoir avatar Aug 28 '22 03:08 chevalierNoir

> Yes. It is the correct config file for 1st iter in LRS3 pretraining.

Hi, thanks for your reply. I have a question about pretraining. After I finish the 1st iteration, should I start the 2nd iteration from the checkpoint_last.pt of the 1st iteration, or with randomly initialized parameters?

li563042811 avatar Sep 07 '22 11:09 li563042811

Hi,

The 2nd iteration is started with random initialization.

chevalierNoir avatar Sep 07 '22 16:09 chevalierNoir