NanoCode012
Hey @Nero10578, just checking back: did you give the full state dict a try? Did you manage to solve this issue?
Rebased. Added the license, added the arg to the docs, and used TORCH_COMPILE_BACKEND for the backend if available.
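For context, a minimal sketch of what that backend selection could look like, assuming an env-var lookup with a fallback (illustrative only, not the actual implementation):

```python
import os

import torch
import torch.nn as nn

# Prefer the TORCH_COMPILE_BACKEND environment variable when it is set;
# otherwise fall back to a default backend ("inductor" here is an
# assumed default, not necessarily the project's choice).
backend = os.environ.get("TORCH_COMPILE_BACKEND", "inductor")

model = nn.Linear(4, 4)  # stand-in model for illustration
compiled = torch.compile(model, backend=backend)
print(compiled(torch.randn(2, 4)).shape)
```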
Thanks for the report; I think I recall seeing this before. Are you interested in making a PR to address this?
Hey, the former sounds like a weird bug. Regarding your double EOS issue: it happens because axolotl checks whether the last token is the EOS and appends one if not...
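For reference, a minimal sketch of the check-and-append behavior described above, assuming token-id lists (illustrative names, not axolotl's actual code):

```python
def ensure_eos(input_ids: list[int], eos_token_id: int) -> list[int]:
    # Append the EOS token only when the sequence doesn't already end
    # with it; a template that also adds its own EOS elsewhere is one
    # way a double EOS could still sneak in.
    if not input_ids or input_ids[-1] != eos_token_id:
        return input_ids + [eos_token_id]
    return input_ids

assert ensure_eos([1, 2, 3], 0) == [1, 2, 3, 0]
assert ensure_eos([1, 2, 0], 0) == [1, 2, 0]
```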
Sorry I missed this issue; I got it confused with another. Could I have a sample of your dataset to repro it? Feel free to replace it with dummy data...
Thanks, I can repro the issue with a regular multi-turn dataset (more than one assistant turn). The issue seems similar to, but slightly different from, gpt-oss's.
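For anyone else reproducing, a dummy sample in the shape I mean, assuming a sharegpt-style multi-turn layout (field names are illustrative, not a guaranteed schema):

```python
# One conversation with more than one assistant turn, dummy data only.
sample = {
    "conversations": [
        {"from": "human", "value": "Hi, what is 2 + 2?"},
        {"from": "gpt", "value": "2 + 2 is 4."},
        {"from": "human", "value": "And 3 + 3?"},
        {"from": "gpt", "value": "3 + 3 is 6."},
    ]
}
```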
I think I get what you mean, but I'm not sure we want to go into that much detail for thinking only, as we usually leave it...
Hey, thanks for the report.

> When setting this config as checkpoint, the model gets saved at each checkpoint but once the training ends it throws error:

Does it upload existing...
That's weird. I wonder if it makes more sense to report this upstream in transformers, regarding the wandb integration? All we do is pass the wandb configs along.
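To illustrate the pass-through, a hedged sketch of what transformers ends up seeing, assuming the standard wandb env vars and `report_to` (values are placeholders):

```python
import os

from transformers import TrainingArguments

# transformers' WandbCallback reads env vars such as WANDB_PROJECT;
# axolotl effectively just forwards settings like these rather than
# owning the integration itself.
os.environ["WANDB_PROJECT"] = "my-project"  # placeholder value

args = TrainingArguments(output_dir="out", report_to="wandb")
```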
Hey @HeenaRajan, just checking back in. Did you raise an issue about this upstream?