NanoCode012
Hey @Nero10578, just checking back: did you give the full state dict a try? Did you manage to solve this issue?
Rebased. Added the license, added the arg to the docs, and used TORCH_COMPILE_BACKEND for the backend if available.
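For context, a minimal sketch of what that backend selection could look like, assuming an env-var lookup with a fallback (illustrative only, not the actual implementation):

```python
import os

import torch
import torch.nn as nn

# Prefer the TORCH_COMPILE_BACKEND environment variable when it is set;
# otherwise fall back to a default backend ("inductor" here is an
# assumed default, not necessarily the project's choice).
backend = os.environ.get("TORCH_COMPILE_BACKEND", "inductor")

model = nn.Linear(4, 4)  # stand-in model for illustration
compiled = torch.compile(model, backend=backend)
print(compiled(torch.randn(2, 4)).shape)
```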
Thanks for the report; I think I recall seeing this before. Are you interested in making a PR to address this?
Hey, the former sounds like a weird bug. Regarding your double EOS issue: it happens because axolotl checks whether the last token is the EOS and appends one if not...
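For reference, a minimal sketch of the check-and-append behavior described above, assuming token-id lists (illustrative names, not axolotl's actual code):

```python
def ensure_eos(input_ids: list[int], eos_token_id: int) -> list[int]:
    # Append the EOS token only when the sequence doesn't already end
    # with it; a template that also adds its own EOS elsewhere is one
    # way a double EOS could still sneak in.
    if not input_ids or input_ids[-1] != eos_token_id:
        return input_ids + [eos_token_id]
    return input_ids

assert ensure_eos([1, 2, 3], 0) == [1, 2, 3, 0]
assert ensure_eos([1, 2, 0], 0) == [1, 2, 0]
```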
Sorry I missed this issue; I got it confused with another. Could I have a sample of your dataset to repro it? Feel free to replace it with dummy data...
Thanks, I can repro the issue with a regular multi-turn dataset (more than one assistant turn). The issue seems similar to, but slightly different from, gpt-oss's.
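For anyone else reproducing, a dummy sample in the shape I mean, assuming a sharegpt-style multi-turn layout (field names are illustrative, not a guaranteed schema):

```python
# One conversation with more than one assistant turn, dummy data only.
sample = {
    "conversations": [
        {"from": "human", "value": "Hi, what is 2 + 2?"},
        {"from": "gpt", "value": "2 + 2 is 4."},
        {"from": "human", "value": "And 3 + 3?"},
        {"from": "gpt", "value": "3 + 3 is 6."},
    ]
}
```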
I think I get what you mean, but I'm not sure we want to go into that much detail for thinking only, as we usually leave it...
Hey, thanks for the report.

> When setting this config as checkpoint, the model gets saved at each checkpoint but once the training ends it throws error:

Does it upload existing...
That's weird. I wonder if it makes more sense to report this upstream in transformers, regarding the wandb integration? All we do is pass the wandb configs along.
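To illustrate the pass-through, a hedged sketch of what transformers ends up seeing, assuming the standard wandb env vars and `report_to` (values are placeholders):

```python
import os

from transformers import TrainingArguments

# transformers' WandbCallback reads env vars such as WANDB_PROJECT;
# axolotl effectively just forwards settings like these rather than
# owning the integration itself.
os.environ["WANDB_PROJECT"] = "my-project"  # placeholder value

args = TrainingArguments(output_dir="out", report_to="wandb")
```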
Hey @HeenaRajan, just checking back in. Did you raise an issue about this upstream?