meshgpt-pytorch — Issue: Commit loss is negative


Open ZekaiGalaxy opened this issue 1 year ago • 8 comments

[image attached]

When I trained on several objects for several epochs, the commit loss started to become negative. The overall loss keeps going down, but neither the recon loss nor the reconstruction result gets any better.

Is a negative commit loss normal, and what does it imply?

ZekaiGalaxy avatar Dec 28 '23 09:12 ZekaiGalaxy

Try lowering the diversity_gamma from 1.0 to 0.1 - 0.3. The quantizer uses this variable to weight the diversity reward in the loss; at 1.0 it can "trick" the loss downward by being more diverse, since exploration is punished less. As you can see, your commit loss is near -1, which is due to the code below. I think high diversity is good at the start, but at the end it might do more harm than good.

Part of the commit loss calculation in LFQ: entropy_aux_loss = per_sample_entropy - self.diversity_gamma * codebook_entropy

autoencoder = MeshAutoencoder(
    num_discrete_coors = 128,
    rlfq_kwargs = {"diversity_gamma": 0.2}
)
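To see why gamma = 1.0 can push the auxiliary term below zero: entropy is concave, so the codebook entropy (entropy of the batch-averaged code distribution) is always at least the mean per-sample entropy, and with gamma = 1.0 the difference can never be positive. A minimal plain-Python sketch (the distributions are made up, and the real LFQ implementation differs in details):

```python
import math

def entropy(p):
    # Shannon entropy of a discrete distribution (natural log)
    return -sum(x * math.log(x) for x in p if x > 0)

# Made-up soft code-assignment distributions: 3 samples over 4 codes,
# each sample fairly confident but all using *different* codes (diverse usage).
probs = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
]

# per-sample entropy: average uncertainty of each assignment (low = confident)
per_sample_entropy = sum(entropy(p) for p in probs) / len(probs)

# codebook entropy: entropy of the batch-averaged distribution
# (high = many different codes are used across the batch)
mean_probs = [sum(col) / len(probs) for col in zip(*probs)]
codebook_entropy = entropy(mean_probs)

aux_gamma_1 = per_sample_entropy - 1.0 * codebook_entropy   # goes negative
aux_gamma_02 = per_sample_entropy - 0.2 * codebook_entropy  # diversity reward damped

print(f"diversity_gamma=1.0: entropy_aux_loss = {aux_gamma_1:.3f}")
print(f"diversity_gamma=0.2: entropy_aux_loss = {aux_gamma_02:.3f}")
```

So with gamma = 1.0 this term is never positive (Jensen's inequality), and once reconstruction plateaus, a strongly negative aux term can keep pulling the reported total loss down without the reconstruction actually improving.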

MarcusLoppe avatar Dec 28 '23 15:12 MarcusLoppe


Hi @MarcusLoppe

Happy New Year! 🎉🎉🎉

I attempted to train an autoencoder using 20 different chairs as training samples and encountered the same issue where the commit loss was negative.

This is the commit loss curve during my training process.

[W&B chart 2023_12_28 23_14_01]

I will reduce the diversity_gamma from 1.0 to between 0.1 and 0.3 to see what changes occur in the commit loss.

Best regards, Xueqi Ma

qixuema avatar Dec 28 '23 15:12 qixuema


Experiment a little, since I only discovered this yesterday and haven't tested it out fully :) Diversity is good at the start of training, but too much of it at the end isn't.

Please let me know what you find out.

MarcusLoppe avatar Dec 28 '23 15:12 MarcusLoppe

@MarcusLoppe

I tried reducing the diversity_gamma from 1.0 to 0.2 and retrained the model. The current commit loss curve is shown in the following image.

[W&B chart 2023_12_29 09_05_36]

qixuema avatar Dec 29 '23 01:12 qixuema

Thank you @MarcusLoppe. From my perspective, maybe we can use a "decaying" gamma, since we want exploration at the beginning but convergence at the end.
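As far as I know the library doesn't ship such a schedule, but since diversity_gamma is just a scalar, it could be annealed manually between training steps. A hypothetical linear schedule (the function and the attribute path in the comment are assumptions, not the library's API):

```python
def decayed_gamma(step, total_steps, start=1.0, end=0.1):
    # Linearly anneal diversity_gamma: explore (high gamma) early,
    # converge (low gamma) late in training.
    t = min(step / max(total_steps, 1), 1.0)
    return start + (end - start) * t

# e.g. in the training loop (attribute path is hypothetical):
# autoencoder.quantizer.diversity_gamma = decayed_gamma(step, total_steps)

print(decayed_gamma(0, 1000))     # 1.0 at the start
print(decayed_gamma(500, 1000))   # ~0.55 halfway
print(decayed_gamma(1000, 1000))  # ~0.1 at the end
```

A cosine or step decay would work just as well; the point is only that the diversity reward shrinks as training converges.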

I also notice that in @qixuema's experiment with gamma = 0.2, the commit loss does drop, but there are some extreme commit loss values. Does that mean the model overfits to one or a few types of shapes or codes and can't handle rare cases well?

ZekaiGalaxy avatar Dec 29 '23 05:12 ZekaiGalaxy

Also @qixuema, how is your recon loss going? I find that even though my recon loss is going down (~0.32), I still can't reconstruct the training data with the autoencoder when training on multiple objects.

ZekaiGalaxy avatar Dec 29 '23 05:12 ZekaiGalaxy

Hi, @ZekaiGalaxy

The following are my recon_loss and total_loss.

[W&B chart 2023_12_29 14_51_00]

[W&B chart 2023_12_29 14_50_48]

qixuema avatar Dec 29 '23 07:12 qixuema

@lucidrains

Hi all, this issue resolves itself when training on a large dataset. Using 300 x 50 augmentations, the commit loss was at 3-14 at the start and then settled, matching the recon loss at around 0.6.
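For anyone trying to reproduce this: I read "300 x 50" as a few hundred base meshes with ~50 augmented variants each (my interpretation, not stated above). A hypothetical vertex augmentation sketch (function name and ranges are made up) could look like:

```python
import random

def augment_vertices(vertices, scale_range=(0.75, 1.0), shift_range=(-0.1, 0.1)):
    # One augmented copy: uniform scale plus a per-axis translation.
    # Ranges are guesses; coordinates should stay inside the interval the
    # autoencoder discretizes (num_discrete_coors bins over a fixed range).
    s = random.uniform(*scale_range)
    shift = [random.uniform(*shift_range) for _ in range(3)]
    return [[c * s + d for c, d in zip(vertex, shift)] for vertex in vertices]

# two-vertex toy "mesh"; build 50 augmented variants of it
mesh = [[0.5, 0.5, 0.5], [-0.5, 0.0, 0.25]]
variants = [augment_vertices(mesh) for _ in range(50)]
print(len(variants), len(variants[0]))  # 50 variants, 2 vertices each
```

The augmented copies give the quantizer many more distinct coordinate patterns to commit to, which plausibly explains why the entropy term stops going negative on larger data.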

MarcusLoppe avatar Jan 30 '24 19:01 MarcusLoppe