10 comments by Fizz~

lgtm! For what it's worth, there were some questions when this PR was originally opened about whether schedulefree's optimizers were effective on transformers; I did some testing a while ago,...
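Since that comment is about trying schedule-free optimizers on transformer models, here is a minimal sketch of direct usage, assuming the `schedulefree` package from facebookresearch/schedule_free; the model, data, and hyperparameters below are toy placeholders, not anything from the thread:

```python
import torch
import schedulefree

# toy stand-ins for a real model and dataloader
model = torch.nn.Linear(16, 2)
loader = [(torch.randn(8, 16), torch.randint(0, 2, (8,))) for _ in range(10)]

optimizer = schedulefree.AdamWScheduleFree(model.parameters(), lr=1e-3, warmup_steps=100)

optimizer.train()  # schedule-free optimizers need explicit train/eval mode switches
for x, y in loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()

optimizer.eval()  # switch to the averaged weights before evaluating or saving
```

The `train()`/`eval()` calls are the main gotcha: schedule-free keeps an averaged copy of the weights, and evaluating or checkpointing without calling `optimizer.eval()` first uses the wrong ones.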

> upstreaming this @ [huggingface/transformers#30079](https://github.com/huggingface/transformers/pull/30079)

Now that this is merged, is there anything axolotl still needs to implement?
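If the upstreamed feature is the schedule-free support from that transformers PR, the Trainer-level usage would presumably reduce to a config switch. This is a hedged sketch; the exact `optim` string and scheduler pairing should be checked against the merged PR:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    optim="schedule_free_adamw",   # requires `pip install schedulefree`
    lr_scheduler_type="constant",  # schedule-free replaces the LR schedule itself,
                                   # so a constant scheduler is the usual pairing
    learning_rate=2e-5,
)
```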

Is there a way to add one without quantization? All the current ones in there have some form of quantization attached to them.

Looks like it needs `transformers>=4.47.0`; am I good to bump the version in the PR?

![image](https://github.com/user-attachments/assets/39577683-185a-4d44-9f48-d49b7ea85bb7)

Other than a TF version mismatch when installing Aphrodite, it seems to work fine.

...looks like this puppy has some fixing to do; that graph makes zero sense.

![image](https://github.com/user-attachments/assets/fbae377a-7356-40a3-83a4-a34ce9ea11ba)
![image](https://github.com/user-attachments/assets/1fd2c9b7-d876-4894-b316-818aa1c9d8cb)

Either the slight jank I did to get it working on unpinned modern...

Ohhh, it reports gradient-accumulation steps as individual steps 🤦‍♀️ That explains why the graph is funky! And thanks for the advice, I was trying out MLM initially...
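A toy illustration of why that miscount distorts the graph, with made-up numbers: the optimizer only advances once per accumulated batch, so counting micro-batches stretches the x-axis by the accumulation factor.

```python
# hypothetical run: the logger counted every forward/backward pass
micro_batch_steps = 1200
gradient_accumulation_steps = 4

# the optimizer (and the loss curve's x-axis) should only advance once
# per accumulated batch
optimizer_steps = micro_batch_steps // gradient_accumulation_steps
print(optimizer_steps)  # 300 real steps, so the graph looked 4x stretched
```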

Any updates on this? It's likely required to get proper performance out of the Gemma 2 models.

![image](https://github.com/user-attachments/assets/d45b38e5-f578-4120-a00c-eeb30d2cd53c)

FWIW, an MN LoRA trained fine for me on 1x GPU, but I'm still seeing people occasionally complain about this being a bug. Possibly a multi-GPU issue? Seems to persist...