NanoCode012
> `vllm serve` works flawlessly

Same with `CUDA_VISIBLE_DEVICES=7` prepended?
If `vllm serve` works, can you just leave that up and run the axolotl train command?
Discord thread for reference: https://discord.com/channels/1104757954588196865/1426831119340273787/1427939353446977607
Thanks for the report. We're aware of this. As a workaround, we're currently considering a post-training script that rewrites the keys.
@zinccat, do you have a working script for the above that you can share? I don't want to duplicate the effort if you've already written one.
Yeah, leaving this gist here for others: https://gist.github.com/NanoCode012/0c971d00a32a7d691bd0c19fc3a6d6e1. @shang-zhu, please give this script a try while we debug the root cause.
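The key rewrite can be sketched roughly like this (a minimal sketch, not the gist itself; it assumes the extra nesting comes from PyTorch's activation-checkpointing wrapper, which prefixes module names with `_checkpoint_wrapped_module.`):

```python
# Minimal sketch of the key-rewriting workaround. Assumption: the offending
# prefix is "_checkpoint_wrapped_module." (PyTorch's CheckpointWrapper naming);
# adjust PREFIX if your checkpoint nests keys differently.
PREFIX = "_checkpoint_wrapped_module."

def rewrite_keys(state_dict: dict) -> dict:
    """Return a copy of state_dict with every wrapper prefix removed."""
    return {key.replace(PREFIX, ""): value for key, value in state_dict.items()}

# Toy example: one wrapped key, one clean key.
broken = {
    "model.layers.0._checkpoint_wrapped_module.self_attn.q_proj.weight": "w0",
    "model.embed_tokens.weight": "w1",
}
fixed = rewrite_keys(broken)
print(sorted(fixed))
# → ['model.embed_tokens.weight', 'model.layers.0.self_attn.q_proj.weight']
```

For a real checkpoint you would load the state dict from disk, rewrite it, and save it back; the gist above is the tested version.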
@NicholasGuerrero, hey, could you provide more of the trace? The issue above is about keys being nested under an additional `_checkpoint_wrapped...` key, which I don't see in yours.
@NicholasGuerrero, thanks for the detailed logs. Are you able to print out the model layers in your checkpoints? Can you see if the keys still contain "_checkpoint_wrapped"? To double...
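One quick way to check without loading any weights: a sharded safetensors checkpoint ships a `model.safetensors.index.json` whose `"weight_map"` maps every key to its shard, so scanning that file is enough (a sketch; the path and marker string are illustrative):

```python
import json

def wrapped_keys(index_path: str, marker: str = "_checkpoint_wrapped") -> list[str]:
    """List checkpoint keys from the index file that still carry the marker."""
    with open(index_path) as f:
        weight_map = json.load(f)["weight_map"]
    return [key for key in weight_map if marker in key]
```

An empty list means the keys were rewritten cleanly. For a single-file checkpoint you would read the key names from the safetensors header instead.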
Hi @glenn-jocher, I tested on my iPhone 11 Pro. Very nice frame rate (19-25 FPS) and accuracy (90+ on recognizable objects). It does get quite hot in a matter of minutes though...
Hey, I'm not sure if this is something we plan to support at the moment. Only `pretraining_datasets:` supports processing on the fly. For regular SFT, we pre-tokenize. There is...