pytorch_optimizer

optimizer & lr scheduler & loss function collections in PyTorch

16 pytorch_optimizer issues, sorted by most recently updated

Hi, thank you so much for your repo. I am using the SAM optimizer but I am facing this error; how can I fix it? `RuntimeError: [-] Sharpness Aware Minimization (SAM) requires...` (see the usage sketch after this entry)

question
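
The truncated RuntimeError above most likely points to SAM needing a closure, or explicit `first_step`/`second_step` calls, since SAM performs two forward/backward passes per update. Below is a minimal sketch of that two-pass pattern, assuming the constructor and step methods follow the reference SAM implementation; the exact signature in pytorch_optimizer may differ, so check its docs.

```python
import torch
from pytorch_optimizer import SAM  # assuming SAM is exported at the package root

model = torch.nn.Linear(10, 2)
criterion = torch.nn.CrossEntropyLoss()

# SAM wraps a base optimizer; rho controls the perturbation neighborhood.
# Constructor signature assumed from the reference implementation.
optimizer = SAM(model.parameters(), base_optimizer=torch.optim.SGD, lr=0.1, rho=0.05)

x = torch.randn(8, 10)
y = torch.randint(0, 2, (8,))

# first forward/backward: gradients at the current weights
criterion(model(x), y).backward()
optimizer.first_step(zero_grad=True)   # perturb weights toward the worst case

# second forward/backward: gradients at the perturbed weights
criterion(model(x), y).backward()
optimizer.second_step(zero_grad=True)  # undo the perturbation, take the real step
```

In the reference implementation, `step(closure)` runs both passes for you when a closure that re-computes the loss and calls `backward()` is provided; the wording of the error suggests the same contract here.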

Hi, I just discovered your repo and I would like to try it to fine-tune my ParlAI blenderbot2 model (see https://github.com/facebookresearch/ParlAI). However, I am running the model in FP16 precision... (a generic mixed-precision sketch follows this entry)

feature request
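
ParlAI manages FP16 through its own wrappers, so whether a third-party optimizer drops in cleanly depends on that integration. As a generic reference point, here is a minimal mixed-precision loop using PyTorch's native AMP with an optimizer from this collection; AdamP is only an example, and the model and data are placeholders.

```python
import torch
from torch.cuda.amp import autocast, GradScaler
from pytorch_optimizer import AdamP  # any optimizer from the collection

model = torch.nn.Linear(10, 2).cuda()
criterion = torch.nn.CrossEntropyLoss()
optimizer = AdamP(model.parameters(), lr=1e-3)

scaler = GradScaler()  # keeps FP16 gradients in a representable range

x = torch.randn(8, 10, device="cuda")
y = torch.randint(0, 2, (8,), device="cuda")

optimizer.zero_grad()
with autocast():                  # forward pass in mixed precision
    loss = criterion(model(x), y)
scaler.scale(loss).backward()     # backward on the scaled loss
scaler.step(optimizer)            # unscales grads, skips the step on inf/nan
scaler.update()
```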

Paper or Code: the REX LR scheduler, from https://arxiv.org/abs/2107.04197. Implementation is based on https://github.com/Nerogar/OneTrainer/blob/2c6f34ea0838e5a86774a1cf75093d7e97c70f03/modules/util/lr_scheduler_util.py#L66 (a LambdaLR prototype follows this entry)

feature request
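
Until a native scheduler lands in the package, a REX-style decay can be prototyped with `torch.optim.lr_scheduler.LambdaLR`. The multiplier below follows the schedule's shape as described in the paper (slower-than-linear decay early, faster near the end); treat it as a sketch, not a copy of the referenced OneTrainer code.

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

total_steps = 1000

def rex_lambda(step: int) -> float:
    # progress z in [0, 1]; REX multiplier (1 - z) / (0.5 + 0.5 * (1 - z))
    z = min(step / total_steps, 1.0)
    return (1.0 - z) / (0.5 + 0.5 * (1.0 - z))

scheduler = LambdaLR(optimizer, lr_lambda=rex_lambda)

for step in range(total_steps):
    # ... forward / backward ...
    optimizer.step()
    scheduler.step()
```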

SAM as an Optimal Relaxation of Bayes: "Sharpness-aware minimization (SAM) and related adversarial deep-learning methods can drastically improve generalization, but their underlying mechanisms are not yet fully understood...."

feature request

In `pytorch-optimizer v3`, loss functions will be added, so optimizers, lr schedulers, and loss functions will finally all be in one package. Feature checklist: [x] support at least... (a hedged usage sketch follows this entry)

feature
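
A hedged sketch of what the all-in-one usage could look like once v3 ships. `FocalLoss` is an assumed example name and `load_optimizer` is assumed to resolve optimizers by string; check the v3 release notes for the actual exports and signatures.

```python
import torch
from pytorch_optimizer import load_optimizer  # assumed: returns an optimizer class by name
from pytorch_optimizer import FocalLoss       # assumed example loss export in v3

model = torch.nn.Linear(10, 1)

optimizer = load_optimizer('adamp')(model.parameters(), lr=1e-3)
criterion = FocalLoss()  # default arguments assumed

x = torch.randn(8, 10)
y = torch.randint(0, 2, (8, 1)).float()

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```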

#params = 151111638 #non emb params = 41066400 | epoch 1 step 50 | 50 batches | lr 0.06 | ms/batch 1378.43 | loss 7.85 | ppl 2570.784 | epoch...

bug

Paper and Code. Paper: [Memory Efficient Optimizers with 4-bit States](https://arxiv.org/abs/2309.01507). Code: https://github.com/thu-ml/low-bit-optimizers/blob/main/lpmm/optim/optimizer.py (a toy quantization sketch follows this entry)

feature request
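
The core trick in the paper is to keep optimizer state (e.g. Adam's moments) in 4-bit blocks and dequantize on the fly inside `step()`. Below is a toy sketch of block-wise absmax quantization of a single state tensor; a real implementation packs two 4-bit codes per byte and uses the paper's non-linear quantization maps rather than this linear one.

```python
import torch

BLOCK = 128  # elements per quantization block

def quantize_4bit(state: torch.Tensor):
    """Block-wise absmax 4-bit quantization (toy version, linear code map)."""
    flat = state.flatten().float()
    pad = (-flat.numel()) % BLOCK
    flat = torch.cat([flat, flat.new_zeros(pad)])
    blocks = flat.view(-1, BLOCK)
    scales = blocks.abs().amax(dim=1, keepdim=True).clamp_min(1e-12)
    # map [-1, 1] onto the 15 signed levels {-7, ..., 7}
    # (stored as int8 here for simplicity; real 4-bit storage packs two codes per byte)
    q = torch.round(blocks / scales * 7).clamp(-7, 7).to(torch.int8)
    return q, scales, state.shape, pad

def dequantize_4bit(q, scales, shape, pad):
    flat = (q.float() / 7 * scales).flatten()
    if pad:
        flat = flat[:-pad]
    return flat.view(shape)

# round-trip example: pretend this tensor is Adam's exp_avg
m = torch.randn(300)
q, s, shape, pad = quantize_4bit(m)
m_hat = dequantize_4bit(q, s, shape, pad)
print((m - m_hat).abs().max())  # quantization error
```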

I just swapped the Nero optimizer out of my Lightning AI loop and gave the new Shampoo a try. There is something going on with it, as this card is typically...

performance

https://arxiv.org/abs/2211.09760: "While deep learning models have replaced hand-designed features across many domains, these models are still trained with hand-designed optimizers. In this work, we leverage the same scaling approach..."

feature request