Potential Bug in Dropout Logic of forward Method
https://github.com/NeuralCarver/Michelangelo/blob/6d83b0bacef92715dd5179d45647ed9a3d39bc95/michelangelo/models/conditional_encoders/encoder_factory.py#L554
In the forward method of the MoECLIPImageEncoder class, I noticed the following code snippet:
if zero_embedding_radio > 0:
    mask = torch.rand((len(image), 1, 1), device=z.device, dtype=z.dtype) >= zero_embedding_radio
    z = z + mask.to(z)
The current implementation uses z = z + mask.to(z) to apply the mask. However, mask appears to be intended as a dropout-style mechanism that randomly zeroes out embeddings in z to improve the model's generalization (as the name zero_embedding_radio implies). Addition does not achieve this: mask.to(z) converts the boolean mask to 0.0/1.0, so the kept samples simply have 1.0 added to them and nothing is ever zeroed out. This is inconsistent with the purpose of dropout, where we expect some elements to be zeroed rather than incremented.
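A minimal standalone sketch (with assumed tensor shapes and values, not the encoder's actual inputs) illustrating the difference:

import torch

zero_embedding_radio = 0.5
z = torch.randn(4, 3, 8)  # hypothetical (batch, tokens, dim) embedding

# Same masking logic as in forward: True means "keep this sample"
mask = torch.rand((len(z), 1, 1), device=z.device, dtype=z.dtype) >= zero_embedding_radio

added = z + mask.to(z)       # current code: kept samples are shifted by +1.0, nothing becomes zero
multiplied = z * mask.to(z)  # proposed fix: dropped samples become all zeros, kept samples are unchanged

print((added == 0).any())                       # almost surely tensor(False)
print((multiplied.flatten(1) == 0).all(dim=1))  # True exactly for the dropped samples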
I think the correct operation should be multiplication rather than addition:
z = z * mask.to(z)
This aligns with the standard dropout technique, where a random subset of the embeddings is zeroed out (rather than shifted) to prevent overfitting.
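For reference, a sketch of how the corrected branch might look inside forward (variable names taken from the snippet above; the rest of the method is assumed unchanged):

if zero_embedding_radio > 0:
    # Keep each sample's embedding with probability (1 - zero_embedding_radio)
    mask = torch.rand((len(image), 1, 1), device=z.device, dtype=z.dtype) >= zero_embedding_radio
    # Multiplying (not adding) zeroes out the embeddings of the dropped samples
    z = z * mask.to(z)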