ALBEF
RuntimeError: invalid multinomial distribution (sum of probabilities <= 0)
I still hit this error with batch_size set to 1 or 2, and batch_size=4 is too large for my GPU memory. How can I fix this? Thanks.
I have the same issue when I run 'Pretrain.py'.
Hi, you can try to add a small positive number to the weights as done here: https://github.com/salesforce/ALBEF/blob/fb384204472feab2a85bd4f5790d7889c31672c9/models/model_retrieval.py#L120
Batch_size=1 will not work because there needs to be at least 1 negative sample.
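A minimal sketch of why a batch of one cannot work, assuming the hard-negative weights are a softmax over in-batch similarities with the positive pair's own entry set to zero before sampling. That is an assumption about the sampling step, written in plain Python rather than PyTorch; `negative_weights` and `eps` are hypothetical names, not identifiers from the repository.

```python
import math

def negative_weights(sim_row, pos_index, eps=0.0):
    """Softmax over one row of similarities, then zero the positive pair.

    eps is added to every entry before the positive is masked, so the
    remaining (negative) entries always keep a nonzero weight.
    """
    m = max(sim_row)
    exps = [math.exp(s - m) for s in sim_row]
    total = sum(exps)
    weights = [e / total + eps for e in exps]
    weights[pos_index] = 0.0  # the matching pair must never be drawn as a negative
    return weights

# Batch of 1: the only entry is the positive, so the weights sum to zero
# and there is nothing to sample from, with or without eps.
print(sum(negative_weights([0.3], pos_index=0)))
print(sum(negative_weights([0.3], pos_index=0, eps=1e-4)))

# Batch of 2: one negative remains and carries all the sampling weight.
print(negative_weights([0.3, 0.1], pos_index=0))
```

Under this assumption the epsilon only helps when at least one negative exists in the batch, which is consistent with the advice that the batch size must be at least 2.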
Hi LiJunnan1992, thanks for answering. I tried that method, but it did not work for me: I added 1e-4 and 1e-8 and still get the same error. I reduced the image size to avoid OOM.
Hi LiJunnan1992, it works for me when I set 'batchsize' to 2. I had set 'batchsize' to 1 at first for fear of OOM. Thanks for your reply ; )
Hi, I'm facing the same issue with model_pretrain.py at batch size 512 on 8 GPUs. I added a small epsilon of 1e-4; the error became less frequent but still occurs. I'm wondering why it can happen at all, since softmax() should guarantee the probabilities sum to 1, right?
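One possible explanation, offered as a sketch rather than a confirmed diagnosis: softmax does sum to 1, but if the positive pair's entry is zeroed afterward (an assumption about the sampling step), the row sums to whatever mass was left on the negatives. When the positive dominates strongly enough, that leftover mass can underflow to exactly zero. The plain-Python example below uses an exaggerated similarity gap so that float64 underflows; lower-precision tensors would need a much smaller gap. The helper names are hypothetical.

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]

def mask_positive(weights, pos_index):
    out = list(weights)
    out[pos_index] = 0.0  # the positive pair is excluded from negative sampling
    return out

# Index 0 is the positive pair; the gap is exaggerated on purpose.
sims = [1000.0, 0.0, 0.0]

probs = softmax(sims)             # sums to 1, with all mass on index 0
masked = mask_positive(probs, 0)  # nothing left to sample from

# Epsilon added to the logits: softmax is shift-invariant, so nothing changes.
shifted = mask_positive(softmax([s + 1e-4 for s in sims]), 0)

# Epsilon added to the probabilities: every negative keeps a nonzero weight.
padded = mask_positive([p + 1e-4 for p in probs], 0)

print(sum(probs), sum(masked), sum(shifted), sum(padded))
```

If this is the cause, where the epsilon is applied matters: adding it to the softmax output before masking keeps the row sum positive, while adding it to the similarities before softmax leaves the weights unchanged.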