NanoCode012
Thanks for the PR. I've started the tests running while we take some time to confirm the original issue. Good catch on `num_items_in_batch`; it's still haunting us till...
> @NanoCode012 All fixed! Used pop(None), removed the v2 override, and fixed the lint issue. Thanks, letting the test re-run
PR #3141 on hold atm
We currently use a custom implementation for Muon (https://github.com/axolotl-ai-cloud/axolotl-contribs-mit/blob/main/src/axolotl/contribs/mit/muon.py). We're open to a PR for this if you would like to give it a try.
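(For context, Muon's core update is a momentum step followed by an approximate orthogonalization of the 2D gradient via a Newton-Schulz iteration. The sketch below is only a minimal illustration of that idea in plain PyTorch; it is not the contrib implementation linked above, and the helper names `newton_schulz` / `muon_step` are made up for the example.)

```python
import torch


def newton_schulz(G: torch.Tensor, steps: int = 5, eps: float = 1e-7) -> torch.Tensor:
    # Approximately orthogonalize a 2D matrix with a quintic Newton-Schulz iteration.
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + eps)
    transposed = G.size(0) > G.size(1)
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X


def muon_step(param, grad, momentum_buf, lr=0.02, momentum=0.95):
    # One in-place Muon update for a single 2D weight matrix:
    # momentum accumulation, Nesterov-style lookahead, orthogonalized step.
    # (Scaling variants and non-2D params are omitted for brevity.)
    momentum_buf.mul_(momentum).add_(grad)
    update = newton_schulz(grad.add(momentum_buf, alpha=momentum))
    param.data.add_(update, alpha=-lr)


# Toy usage on a random 2D weight
w = torch.randn(64, 32)
g = torch.randn_like(w)
buf = torch.zeros_like(w)
muon_step(w, g, buf)
```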
I wonder if this is because you're using `sample_packing_eff_est = 1.0`? I'm not sure a packing efficiency of 1.0 is actually reachable in practice. If you try `0.9`, do the calculation and num steps match?
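(To make the arithmetic concrete: the efficiency estimate effectively discounts how many tokens are assumed to fit into each packed batch, so a 1.0 estimate yields fewer estimated steps than 0.9. The numbers and formula below are purely illustrative, not axolotl's actual calculation.)

```python
# Illustrative only -- hypothetical numbers, not axolotl's exact formula.
import math

total_tokens = 10_000_000  # hypothetical dataset size in tokens
sequence_len = 4096
micro_batch_size = 2
gradient_accumulation_steps = 4
world_size = 1


def est_steps(sample_packing_eff_est: float) -> int:
    # Tokens consumed per optimizer step, discounted by how "full"
    # each packed sequence is expected to be.
    tokens_per_step = (
        sequence_len
        * sample_packing_eff_est
        * micro_batch_size
        * gradient_accumulation_steps
        * world_size
    )
    return math.ceil(total_tokens / tokens_per_step)


print(est_steps(1.0))  # 306 -- assumes perfect packing, so underestimates steps
print(est_steps(0.9))  # 340 -- roughly 11% more steps than the 1.0 estimate
```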
Hey, thanks @aidando73. I whipped up a PR for this if you're interested in testing it. I did not do any testing on it, so it's very rough, especially...
@aidando73, just pushed a fix for this. I'll find some time later to iron out the bugs.
Hello, thanks for the report. It has been some time since we worked on causal LM eval, so there may have been conflicts in logging. If you have any bandwidth, would...
Yep, this is something I was just checking. I saw that the upstream transformers EP PR was merged: https://github.com/huggingface/transformers/pull/39501. It uses `kernels-community/megablocks` (not sure if it's the same as Databricks' one). I...
@zinccat, could you share how you got it working for qwen3, for reference purposes? This is currently a WIP for us.