composer
composer copied to clipboard
Log multiple losses
Resolves CO-842. Adds logic to log multiple losses. Also, adds loss dictionary to DeepLabv3+ as an example. A couple of scenarios:
1. Scalar tensor loss
This will be logged as loss/train/total
. Bert example:
2. Tuple of tensor losses
Individual losses will be logged as 'loss/train/loss{i}'
where i is the index of the individual loss. There will also be 'loss/train/total'
which is the sum of the individual losses.
3. Dictionary of losses without total key
Individual losses will be logged as 'loss/train/{loss_name}'
. There will also be 'loss/train/total'
which is the sum of the individual losses.
4. Dictionary of losses with total key
Individual losses will be logged as 'loss/train/{loss_name}'
. Loss given at 'loss/train/total'
will be used as the loss for the backpropagation, individual losses are not summed when 'total'
is present.
Questions
- Assumes returned loss is a
torch.Tensor
, is this fine? - Should these different scenarios be documented somewhere?
Ah, I think I skipped over closures... Need some time
Edit: seems like it is working, but still debating with Abhi if it is okay to return a dict from the closure
Is there any way we can get rid of the trailing /
when there is only 1 loss value, like loss/train
instead of loss/train/
? I'm just wondering if there could be some parsing issues later on.
[EDIT] or wait maybe it has to be that way for WandB to put plots together in the same place? Hmmm
I wonder if we can add a default like loss/train/total
that always gets logged... would match the notation a bit better too.
Also, I wonder if we can drop the _loss
suffix for things like loss/train/dice
rather than loss/train/dice_loss
, as it seems a bit repetitive.
Thanks for the suggestion Abhi, good points! Yeah, that would be cleaner, let me try to set it up 🙂
Hmmm that might actually make the closure stuff easier as well!
I don't understand pyright
same :')
I added some convergence tests for tuple losses and dict losses with 'total'
. TBH I do not know how to make logging, but I'm running DeepLabv3 and ResNet experiments (wandb here group by group). The results appear to be align with dev.
I mentioned this to Evan, but just for the record, it looks like i need someone from @mosaicml/composer-team-eng to review as well