Fizz~
lgtm! for what it's worth, when this PR was originally opened there was some question about whether schedulefree's optimizers were effective on transformers; i did some testing a while ago,...
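For reference, the raw usage pattern looks roughly like this (a minimal sketch assuming the pip-installed `schedulefree` package; the toy model, data, and hyperparameters are placeholders, not from my actual runs):

```python
import torch
import schedulefree  # pip install schedulefree

# Placeholder model/data purely for illustration.
model = torch.nn.Linear(16, 2)
loss_fn = torch.nn.CrossEntropyLoss()

# Schedule-free AdamW: no LR scheduler is used alongside it.
optimizer = schedulefree.AdamWScheduleFree(model.parameters(), lr=1e-3)

for step in range(100):
    # The optimizer must be in train mode during training steps.
    model.train()
    optimizer.train()

    x = torch.randn(8, 16)
    y = torch.randint(0, 2, (8,))
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Switch the optimizer to eval mode before evaluation or checkpointing,
# so the averaged weights are the ones used and saved.
model.eval()
optimizer.eval()
```

The train/eval switch on the optimizer itself is the main gotcha; skipping it is the usual reason results look off.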
> upstreaming this @ [huggingface/transformers#30079](https://github.com/huggingface/transformers/pull/30079)

Now that this is merged, is there anything required on axolotl's side to implement now?
Is there a way to add something to it without quantization? All the current ones in there have some random quant attached to them
Looks like it needs transformers>=4.47.0; am I good to bump the version in the PR?
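Once the bump lands, the non-quantized path should just be the plain optim name from the transformers PR (a hedged sketch; `schedule_free_adamw` is the name added in huggingface/transformers#30079, everything else here is a placeholder):

```python
from transformers import TrainingArguments

# Hedged sketch: assumes transformers>=4.47.0 with the schedule-free optimizers
# upstreamed in huggingface/transformers#30079. No quantization involved,
# unlike the bnb 8-bit optimizer variants.
args = TrainingArguments(
    output_dir="out",                 # placeholder path
    optim="schedule_free_adamw",      # plain schedule-free AdamW
    learning_rate=2e-5,               # placeholder hyperparameters
    lr_scheduler_type="constant",     # schedule-free replaces the LR schedule
    per_device_train_batch_size=8,
    num_train_epochs=1,
)
```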
 Other than a TF mismatch when installing Aphrodite, seems to work fine
Working on that now 🫡
... looks like this puppy has some fixing to do; that graph makes zero sense. Either the slight jank I did to get it working on unpinned modern...
Ohhh, it reports gradient-accumulation steps as individual steps 🤦♀️ that explains why the graph is funky! And thanks for the advice, I was trying out MLM initially...
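For anyone else confused by the x-axis, the mismatch is just gradient accumulation math (a hedged sketch with made-up numbers, not the actual run's config):

```python
# Made-up numbers purely to illustrate why the step count looks inflated when
# each gradient-accumulation micro-step gets logged as its own step.
micro_batch_size = 2
gradient_accumulation_steps = 8
dataset_size = 10_000
epochs = 1

effective_batch_size = micro_batch_size * gradient_accumulation_steps
optimizer_steps = (dataset_size // effective_batch_size) * epochs
logged_micro_steps = optimizer_steps * gradient_accumulation_steps

print(f"optimizer steps:    {optimizer_steps}")      # what the graph should show
print(f"logged micro steps: {logged_micro_steps}")   # what was actually plotted
```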
Any updates on this? It's likely required to get the proper performance out of the Gemma 2 models
FWIW, an MN lora trained fine for me on 1xGPU, but I'm still seeing people occasionally report this as a bug. Possibly a multi-GPU issue? Seems to persist...