
RuntimeError: invalid multinomial distribution (sum of probabilities <= 0)

yirutsai opened this issue · 5 comments

I'm still facing the same issue when setting batch_size to 1 or 2. However, batch_size=4 is too big for my GPU memory. How can I fix this? Thanks.

yirutsai · Jul 27 '22

I have the same issue when I run `Pretrain.py`.

SilentMoebuta · Jul 29 '22

Hi, you can try adding a small positive number to the weights, as done here: https://github.com/salesforce/ALBEF/blob/fb384204472feab2a85bd4f5790d7889c31672c9/models/model_retrieval.py#L120

batch_size=1 will not work because there needs to be at least one negative sample.
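The two points above can be illustrated with a minimal sketch (plain Python, not ALBEF's actual code; `negative_weights` is a hypothetical helper standing in for the hard-negative sampling weights in `model_retrieval.py`):

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of similarity scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def negative_weights(sim_row, pos_idx, eps=0.0):
    # sampling weights over batch items for picking a hard negative;
    # the positive pair's own weight is zeroed out before sampling
    w = softmax(sim_row)
    w[pos_idx] = 0.0
    return [x + eps for x in w]

# batch_size=1: the only candidate is the positive itself, so after
# zeroing it all mass is gone and multinomial sampling must fail
w1 = negative_weights([0.9], pos_idx=0)
print(sum(w1))  # 0.0 -> torch.multinomial raises "invalid multinomial distribution"

# batch_size>=2 plus a small eps keeps the total weight positive
w2 = negative_weights([0.9, 0.1], pos_idx=0, eps=1e-4)
print(sum(w2) > 0)  # True
```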

LiJunnan1992 · Aug 01 '22


Hi LiJunnan1992, thanks for answering. I have tried that method, but it did not work for me: I tried adding 1e-4 and 1e-8 and still get the same error. I reduced the image size instead to avoid OOM.

yirutsai · Aug 01 '22


Hi LiJunnan1992, it works for me when I set the batch size to 2. I had set it to 1 at first for fear of OOM. Thanks for your reply ; )

SilentMoebuta · Aug 05 '22


Hi, I'm facing the same issue with model_pretrain.py, with batch size 512 on 8 GPUs. After I added a small epsilon (1e-4) the error became less frequent, but it can still happen. I'm wondering why the error occurs at all, since softmax() should guarantee the weights sum to 1, right?

zhihuacc · Aug 15 '22
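One plausible explanation for the question above (an assumption, not confirmed in this thread): softmax does sum to 1, but the code then zeroes the positive pair's weight before sampling, and under mixed precision the remaining weights can underflow to exactly 0 in float16, leaving nothing to sample. A minimal NumPy illustration, with `softmax16` as a hypothetical stand-in for a half-precision softmax:

```python
import numpy as np

def softmax16(x):
    # softmax computed and stored in float16, mimicking mixed-precision training
    x = x.astype(np.float16)
    e = np.exp(x - x.max())
    return e / e.sum()

# one similarity score dominates the row
logits = np.array([20.0, -20.0, -20.0, -20.0])
w = softmax16(logits)
# after shifting by the max, the other entries are exp(-40) ~ 4e-18,
# which underflows to 0.0 in float16
w[0] = 0.0  # zero out the positive pair, as the sampling code does
print(w.sum())  # 0.0 -> multinomial fails even though softmax "summed to 1"
```

Adding a small epsilon after the softmax restores a positive total, which is why the suggested fix reduces (but, with a small enough epsilon and unlucky rows, may not fully eliminate) the error.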