RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
Thanks for your great work! During replication I hit a stumbling block while running Eval. I encountered the following error; how should I solve it?
```
 55%|█████▌ | 11/20 [4:04:49<3:20:18, 1335.42s/it]
Traceback (most recent call last):
  File "GPT_eval_multi.py", line 115, in ...
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
```
Hi. Thank you for your interest. Are you evaluating on our pretrain model? Does the problem occur every time?
Thanks for your reply. Yes, when I run the evaluation command on the given pre-trained model, it occurs after 6 iterations. I used torch 1.8.1+cu111.
I found a potential problem in this line. I have already committed the fix, along with an updated pretrain model.

Can you try the following:
- Download the new pretrain model (Section 2.3 in the README):
```bash
bash dataset/prepare/download_model.sh
```
- Run eval (Eval section in the README):
```bash
python GPT_eval_multi.py --exp-name eval_name --resume-pth output/vq/2023-07-19-04-17-17_12_VQVAE_20batchResetNRandom_8192_32/net_last.pth --resume-trans output/t2m/2023-10-10-03-17-01_HML3D_44_crsAtt2lyr_mask0.5-1/net_last.pth --num-local-layer 2
```

`--num-local-layer 2`: the new model uses 2 local layers instead of 1.

If you still see the problem, please let me know. Thank you.
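For anyone else hitting this error before updating: it is typically raised by `torch.multinomial` when the sampling distribution contains non-finite values. The sketch below is hypothetical, not the repo's actual code; it assumes a GPT-style decoding loop that samples the next token from softmaxed logits, and shows one defensive workaround.

```python
import torch

# Hypothetical reproduction: a single NaN logit poisons the whole
# distribution, because NaN propagates through softmax.
logits = torch.tensor([1.0, float("nan"), 0.5])
probs = torch.softmax(logits, dim=-1)

try:
    torch.multinomial(probs, num_samples=1)
except RuntimeError as err:
    # multinomial rejects distributions with inf/nan/negative entries
    print("sampling failed:", err)

# Workaround (assumption, not the committed fix): map non-finite logits
# to -inf so they receive zero probability after softmax, then sample.
safe_probs = torch.softmax(torch.nan_to_num(logits, nan=-float("inf")), dim=-1)
sample = torch.multinomial(safe_probs, num_samples=1)
```

Note that masking NaN logits only hides the symptom; the committed fix above addresses the source of the non-finite values, which is the proper solution.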
Thank you very much! I have solved this problem, but my evaluation results do not seem to match the reported ones. I don't know which part caused the discrepancy.
This issue should come from the randomness from this line. I just commented it out in the previous commit. Can you double check that it is updated?
Thank you for your answer. That part of the code has been updated. After modifying it, I got an FID of 0.088. To reach the reported 0.080 with the predicted length, which parameters or pre-trained models should I replace?
We use the pretrained length estimator from Text-to-Motion (the model proposed along with the HumanML3D paper) here. It can be plugged into our existing model. I will let you know once this code is added to the repo.