NanoCode012
Sure, thanks for the follow-up.
Thanks for the report. I did not see where the Liger plugin calls `get_trainer_cls`; in fact, it shouldn't be doing so. Did you modify the plugin code? Could you...
Correction: this is likely an issue on our end. Could you please give the linked PR a try? This error sounds familiar, as I fixed it once before in another...
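(For context on why the plugin itself shouldn't be calling `get_trainer_cls`: in the usual plugin hook pattern, the framework is the sole caller of the hook, and a plugin only defines it. A minimal sketch, with all names assumed rather than taken from the actual plugin API:)

```python
# Minimal sketch (all names assumed) of the hook pattern described above:
# the plugin manager calls each plugin's get_trainer_cls; a plugin like
# Liger defines (or inherits) the hook but never calls it itself.

class BasePlugin:
    def get_trainer_cls(self, cfg):
        # Default: no trainer override.
        return None

class LigerPlugin(BasePlugin):
    # Only patches kernels; inherits the no-op hook and never calls it.
    pass

def resolve_trainer_cls(plugins, cfg, default_cls):
    # The framework is the only caller of the hook.
    for plugin in plugins:
        cls = plugin.get_trainer_cls(cfg)
        if cls is not None:
            return cls
    return default_cls
```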
Thanks for the PR and for figuring it out! I think a lot of people would prefer bf16 master weights to save VRAM. Could an alternative solution be creating...
Ok, after some internal discussion, I'm good with this PR now. My next thought is whether to convert the existing example YAMLs to use `bfloat16` for backward compatibility?
Still a todo: updating the tests and warning about this. I think this warrants some sort of warning cycle before we switch, as folks may be running from main. However,...
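(A minimal sketch of the trade-off discussed in the three comments above, in plain PyTorch rather than the PR's actual code: mixed precision keeps fp32 master weights, while pure bf16 stores the parameters themselves in bf16.)

```python
import torch

# fp32 master weights (mixed precision): compute may run in bf16, but the
# stored parameters stay fp32 -- 4 bytes/param.
fp32_layer = torch.nn.Linear(4096, 4096)

# bf16 master weights: the parameters themselves are bf16 -- 2 bytes/param,
# halving parameter (and typically optimizer-state) memory at some
# precision cost.
bf16_layer = torch.nn.Linear(4096, 4096).to(torch.bfloat16)

bytes_fp32 = fp32_layer.weight.numel() * fp32_layer.weight.element_size()
bytes_bf16 = bf16_layer.weight.numel() * bf16_layer.weight.element_size()
print(bytes_fp32 // 2**20, "MiB vs", bytes_bf16 // 2**20, "MiB")  # 64 MiB vs 32 MiB
```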
@winglian, weirdly, I'm not getting the VRAM savings seen in the benchmarks. Current **early** wandb results show it's about 20% faster with the same VRAM usage. However, kernel benchmarking showed it using less...
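(One possible reason kernel-level numbers and end-to-end wandb numbers disagree is that reported VRAM depends on what is measured: peak allocated vs reserved, allocator caching, or other buffers dominating the peak. A generic PyTorch sketch for checking peak allocation around a single step — not the benchmark code used here:)

```python
import torch

def peak_vram_mib(step_fn, *args, **kwargs):
    """Run one step and return (output, peak allocated CUDA memory in MiB)."""
    torch.cuda.synchronize()
    torch.cuda.reset_peak_memory_stats()
    out = step_fn(*args, **kwargs)
    torch.cuda.synchronize()
    return out, torch.cuda.max_memory_allocated() / 2**20
```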
Updated the PR from main and added more validation/docs on attention. It is a bit faster than FA in adapter mode. I added a warning that this is not recommended for FFT...
I wonder if this is something we can extend to cover that, or is it unrelated? @djsaunde In the meantime, have you tried any of our cross-entropy optimizations? CCE or...
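(For readers unfamiliar with CCE: Cut Cross Entropy fuses the final projection with the loss so the full logits tensor is never materialized. A hedged sketch, assuming the `cut_cross_entropy` package's `linear_cross_entropy` entry point; the import path and signature are assumptions, not taken from this thread:)

```python
import torch
from cut_cross_entropy import linear_cross_entropy  # import path is an assumption

B, T, D, V = 4, 2048, 4096, 128_256
hidden = torch.randn(B, T, D, device="cuda", dtype=torch.bfloat16)
lm_head = torch.randn(V, D, device="cuda", dtype=torch.bfloat16)
labels = torch.randint(0, V, (B, T), device="cuda")

# A naive loss would first materialize logits of shape (B, T, V) -- roughly
# 2 GiB in bf16 here -- before the cross entropy; the fused kernel skips
# that buffer entirely.
loss = linear_cross_entropy(hidden, lm_head, labels)
```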
@winglian, agreed that `embed_tokens` is not the expensive operation. For that linked feature, we can indeed add it; however, I'm not sure what's the most intuitive way for a...
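(For reference, one existing route to train `embed_tokens` alongside LoRA adapters goes through PEFT's `modules_to_save`, which axolotl surfaces as `lora_modules_to_save` in the YAML; this is a sketch of that route, not the linked feature itself:)

```python
from peft import LoraConfig

# `modules_to_save` fully trains (and saves) the listed modules alongside
# the LoRA adapters, instead of applying low-rank updates to them.
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    modules_to_save=["embed_tokens", "lm_head"],
)
```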