
Focal Loss, Taylor Cross Entropy Loss, SnapMix, Adaptive Gradient Clipping

Open alexriedel1 opened this issue 2 years ago • 2 comments

New Method

Focal Loss

Taylor CE

SnapMix

AGC

Motivation

All of the above-mentioned methods either improve the final model's accuracy or reduce the training time needed to reach that accuracy. I don't know which of these you already plan to implement, but they are the ones I use frequently for training models and that are not yet part of your library.

[Optional] Implementation

I just stumbled across this repo and I'm not 100% familiar with your implementation process. If loss functions only need to be added to https://github.com/mosaicml/composer/blob/dev/composer/loss/loss.py and algorithms to https://github.com/mosaicml/composer/tree/dev/composer/algorithms, I could just create PRs for the desired methods?

alexriedel1, Mar 29 '22 08:03

Thanks, @alexriedel1, for suggesting these. Yes, for the loss functions it's just a matter of adding them to loss.py. We can take care of algorithm-izing them a bit later.
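For reference, here is a minimal sketch of the focal loss formula, FL(p_t) = -(1 - p_t)^gamma * log(p_t), for a single example. It is written in plain Python purely to illustrate the computation. The functions in loss.py operate on PyTorch tensors, so the names `softmax` and `focal_loss` and the default `gamma=2.0` are assumptions for illustration, not Composer's API.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0):
    """Focal loss for one example.

    logits: list of raw class scores
    target: index of the true class
    gamma:  focusing parameter; gamma=0 reduces to plain cross entropy
    """
    p_t = softmax(logits)[target]  # probability assigned to the true class
    # The (1 - p_t)^gamma factor down-weights examples that are already easy.
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# An easy example (confident, correct) contributes far less than a hard one.
easy = focal_loss([4.0, 0.0], target=0)
hard = focal_loss([0.0, 4.0], target=0)
```

With `gamma=0` the modulating factor is 1, so the result matches ordinary cross entropy, which is a convenient sanity check when porting this to tensors.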

For SnapMix, see https://github.com/mosaicml/composer/blob/dev/composer/algorithms/cutout/cutout.py for an example implementation of a data augmentation technique. Following the pattern there should be doable.

For AGC, we have implemented basic gradient clipping here (https://github.com/mosaicml/composer/blob/dev/composer/trainer/trainer.py#L1215), and it could be enhanced with the AGC version!
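To make the difference from basic clipping concrete, here is a rough sketch of the unit-wise rule that AGC uses: each row of a gradient is rescaled whenever its norm exceeds a fixed fraction of the corresponding weight row's norm. It is plain Python over nested lists for illustration only. The function name `adaptive_clip_rows` and the defaults `clipping=0.01` and `eps=1e-3` are assumptions, not the trainer's actual interface.

```python
import math


def adaptive_clip_rows(weights, grads, clipping=0.01, eps=1e-3):
    """Clip each gradient row relative to the norm of its weight row.

    weights, grads: lists of rows (one row per output unit), same shape
    clipping:       maximum allowed ratio ||g_row|| / ||w_row||
    eps:            floor on the weight norm so zero-initialized rows can still update
    """
    clipped = []
    for w_row, g_row in zip(weights, grads):
        w_norm = max(math.sqrt(sum(w * w for w in w_row)), eps)
        g_norm = math.sqrt(sum(g * g for g in g_row))
        max_norm = clipping * w_norm
        if g_norm > max_norm:
            # Rescale so the row's gradient norm equals clipping * weight norm.
            scale = max_norm / g_norm
            g_row = [g * scale for g in g_row]
        clipped.append(list(g_row))
    return clipped


# The first row is clipped; the second is already small relative to its weights.
result = adaptive_clip_rows(
    weights=[[3.0, 4.0], [3.0, 4.0]],
    grads=[[30.0, 40.0], [0.03, 0.04]],
    clipping=0.1,
)
```

Unlike clipping by a single global threshold, the limit here scales with each unit's weight norm, so large and small layers are treated proportionally.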

hanlint, Mar 29 '22 17:03

@alexriedel1, just an update: we implemented adaptive gradient clipping in #924. Give it a try or a review and let us know if it helps!

hanlint, May 05 '22 14:05

Closing. We're tracking this elsewhere as low priority. We're open to community suggestions!

mvpatel2000, Jun 22 '23 21:06