
Log multiple losses

Open Landanjs opened this issue 2 years ago • 4 comments

Resolves CO-842. Adds logic to log multiple losses. Also adds a loss dictionary to DeepLabv3+ as an example. There are a few scenarios (a sketch of the dispatch logic follows the list):

1. Scalar tensor loss

This will be logged as 'loss/train/total'. (BERT example screenshot.)

2. Tuple of tensor losses

Individual losses will be logged as 'loss/train/loss{i}', where i is the index of the individual loss. There will also be 'loss/train/total', which is the sum of the individual losses.

3. Dictionary of losses without total key

Individual losses will be logged as 'loss/train/{loss_name}'. There will also be 'loss/train/total', which is the sum of the individual losses.

4. Dictionary of losses with total key

Individual losses will be logged as 'loss/train/{loss_name}'. The loss given at 'loss/train/total' will be used for backpropagation; individual losses are not summed when 'total' is present.
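
To make the four cases concrete, here is a minimal sketch of the dispatch. The helper name `total_and_logged_losses` is hypothetical and not Composer's actual API; it only illustrates the logging rules described above:

```python
from typing import Dict, Tuple, Union

import torch

def total_and_logged_losses(
    loss: Union[torch.Tensor, Tuple[torch.Tensor, ...], Dict[str, torch.Tensor]],
    prefix: str = 'loss/train',
) -> Tuple[torch.Tensor, Dict[str, torch.Tensor]]:
    """Return the loss to backpropagate and a dict of values to log."""
    logged: Dict[str, torch.Tensor] = {}
    if isinstance(loss, torch.Tensor):
        # Scenario 1: a scalar tensor is used directly as the total.
        total = loss
    elif isinstance(loss, tuple):
        # Scenario 2: log each element as loss{i}; the total is their sum.
        for i, value in enumerate(loss):
            logged[f'{prefix}/loss{i}'] = value
        total = sum(loss)
    elif isinstance(loss, dict):
        # Scenarios 3 and 4: log each named loss; use 'total' if provided,
        # otherwise sum the individual losses.
        for name, value in loss.items():
            logged[f'{prefix}/{name}'] = value
        total = loss['total'] if 'total' in loss else sum(loss.values())
    else:
        raise TypeError(f'Unsupported loss type: {type(loss)}')
    logged[f'{prefix}/total'] = total
    return total, logged
```

So with a dict like {'dice': ..., 'ce': ..., 'total': ...}, backpropagation uses the provided total while each entry is still logged individually.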

Questions

  • Assumes the returned loss is a torch.Tensor; is this fine?
  • Should these different scenarios be documented somewhere?

Landanjs avatar Aug 06 '22 01:08 Landanjs

Ah, I think I skipped over closures... Need some time

Edit: seems like it is working, but still debating with Abhi if it is okay to return a dict from the closure
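
For context, a rough sketch of the kind of closure in question, assuming a standard PyTorch-style closure (the function and signatures here are illustrative, not Composer's actual internals):

```python
import torch

def make_closure(model, optimizer, batch):
    """Build an optimizer closure for a model whose loss may be a dict."""
    inputs, targets = batch

    def closure():
        optimizer.zero_grad()
        losses = model.loss(model(inputs), targets)
        # Optimizers that invoke the closure themselves (e.g. torch.optim.LBFGS)
        # expect a scalar return, so extract (or sum) the total before backward.
        if isinstance(losses, dict):
            total = losses['total'] if 'total' in losses else sum(losses.values())
        else:
            total = losses
        total.backward()
        return total  # returning the full dict instead is the open question

    return closure
```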

Landanjs avatar Aug 08 '22 21:08 Landanjs

Is there any way we can get rid of the trailing / when there is only 1 loss value, like loss/train instead of loss/train/? I'm just wondering if there could be some parsing issues later on.

[EDIT] or wait maybe it has to be that way for WandB to put plots together in the same place? Hmmm I wonder if we can add a default like loss/train/total that always gets logged... would match the notation a bit better too.

Also, I wonder if we can drop the _loss suffix for things like loss/train/dice rather than loss/train/dice_loss, as it seems a bit repetitive.

abhi-mosaic avatar Aug 08 '22 21:08 abhi-mosaic

Thanks for the suggestion Abhi, good points! Yeah, that would be cleaner, let me try to set it up 🙂

Hmmm that might actually make the closure stuff easier as well!

Landanjs avatar Aug 08 '22 22:08 Landanjs

Same, I don't understand pyright :')

mvpatel2000 avatar Aug 16 '22 19:08 mvpatel2000

I added some convergence tests for tuple losses and dict losses with 'total'. TBH I do not know how to test the logging, but I'm running DeepLabv3 and ResNet experiments (wandb here, grouped by run). The results appear to align with dev.
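
For reference, a minimal sketch of the kind of toy model such a dict-loss convergence test could exercise (all names here are illustrative, not the actual test code):

```python
import torch
import torch.nn.functional as F

class ToyDictLossModel(torch.nn.Module):
    """Tiny classifier whose loss() returns a dict with an explicit 'total'."""

    def __init__(self, in_dim: int = 8, num_classes: int = 2):
        super().__init__()
        self.fc = torch.nn.Linear(in_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(x)

    def loss(self, outputs: torch.Tensor, targets: torch.Tensor) -> dict:
        ce = F.cross_entropy(outputs, targets)
        l2 = 1e-4 * sum(p.pow(2).sum() for p in self.parameters())
        # 'total' is what backpropagation uses; 'ce' and 'l2' are logged
        # individually as loss/train/ce and loss/train/l2.
        return {'ce': ce, 'l2': l2, 'total': ce + l2}
```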

Landanjs avatar Aug 22 '22 18:08 Landanjs

I mentioned this to Evan, but just for the record, it looks like I need someone from @mosaicml/composer-team-eng to review as well

Landanjs avatar Aug 23 '22 00:08 Landanjs