
Results 342 comments of NanoCode012

@zinccat, correct me if I'm wrong, but is the shape for the router mixed up?

```
self.weight = nn.Parameter(torch.empty(config.num_experts, config.hidden_size, dtype=torch.bfloat16))
```

Should it be:

```
self.weight = nn.Parameter(torch.empty(...
```
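For reference, a minimal sketch of the two shape conventions (names and sizes here are illustrative, not from the linked code): `F.linear`/`nn.Linear` store the weight as `(out_features, in_features)`, i.e. `(num_experts, hidden_size)` for a router producing expert logits, while a plain `x @ weight` matmul needs the transposed layout, `(hidden_size, num_experts)`:

```
import torch
import torch.nn as nn
import torch.nn.functional as F

hidden_size, num_experts = 1024, 8
x = torch.randn(4, hidden_size, dtype=torch.bfloat16)  # (tokens, hidden)

# F.linear expects weight of shape (out_features, in_features),
# i.e. (num_experts, hidden_size) for a router producing expert logits.
w_linear = torch.empty(num_experts, hidden_size, dtype=torch.bfloat16)
nn.init.normal_(w_linear, std=0.02)
logits_a = F.linear(x, w_linear)   # (tokens, num_experts)

# A plain matmul without a transpose needs the opposite layout.
w_matmul = torch.empty(hidden_size, num_experts, dtype=torch.bfloat16)
nn.init.normal_(w_matmul, std=0.02)
logits_b = x @ w_matmul            # (tokens, num_experts)

assert logits_a.shape == logits_b.shape == (4, num_experts)
```

So whether the original shape is "mixed up" depends on how the forward pass consumes the weight.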

Hey, FSDP2 with `cpu_ram_efficient_loading` should work in Axolotl. Could you let me know if you've given it a try?
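In case it helps, here is a rough sketch of the config fragment I mean; I'm assuming the key names (`fsdp_version`, `fsdp_config`, and the fields under it) from recent Axolotl docs, so please double-check them against your installed version:

```
# Assumed key names -- verify against the Axolotl docs for your version.
fsdp_version: 2
fsdp_config:
  cpu_ram_efficient_loading: true  # the option mentioned above
  offload_params: false
  state_dict_type: FULL_STATE_DICT
```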

CI passes and the change is minimal, so nothing major should be affected.

Re: https://github.com/axolotl-ai-cloud/axolotl/issues/2878#issuecomment-3051834944

> Offering help with QA-LoRA adapter merge process! Since PEFT doesn't support adapter merging with quantized models yet, I've implemented a custom solution. Successfully replicated the QA-LoRA paper...
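For anyone following along, the usual shape of such a merge (a generic sketch, not the author's actual implementation; the helper name is hypothetical) is to dequantize the base weight, fold in the adapter delta, and requantize with the original scheme:

```
import torch

def merge_lora_into_dequantized(w_dequant: torch.Tensor,
                                lora_A: torch.Tensor,
                                lora_B: torch.Tensor,
                                scaling: float) -> torch.Tensor:
    """Standard LoRA merge applied to a dequantized base weight.

    w_dequant: (out_features, in_features), dequantized to fp16/fp32
    lora_A:    (r, in_features)
    lora_B:    (out_features, r)
    scaling:   lora_alpha / r
    """
    # delta_W = scaling * B @ A has the same shape as w_dequant.
    return w_dequant + scaling * (lora_B @ lora_A)

# The merged weight would then be re-quantized group-wise (qweight/qzeros/
# scales) -- exactly the step where the qzero handling discussed below matters.
```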

@gapsong

> I noticed the qzero values are currently being quantized during the save process.

Could you share where this is happening in PEFT?

Hey, thanks for the issue. One thing I noticed is the `type: chat_template`. In the linked example, we pointed to a new transform https://github.com/axolotl-ai-cloud/grpo_code/blob/148ea79321f34bbed79b3b55f04c0a7de002665d/grpo_code/transforms.py#L34, which properly loads the...

Which model is this? Does vLLM's `EngineArgs` support that param?
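One quick way to check, assuming a vLLM version where `EngineArgs` is a dataclass in `vllm.engine.arg_utils`:

```
# List the parameters vLLM's EngineArgs accepts, to see whether the
# flag in question is supported by the installed version.
import dataclasses
from vllm.engine.arg_utils import EngineArgs

supported = {f.name for f in dataclasses.fields(EngineArgs)}
print("tensor_parallel_size" in supported)  # example lookup
```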

Thanks, can you try setting the values below to `None`?

https://github.com/axolotl-ai-cloud/axolotl/blob/7026cd5e9e053d51aa271c1f57f62950bcdc599f/src/axolotl/cli/vllm_serve.py#L65-L67

Alternatively, just delete this line:

https://github.com/axolotl-ai-cloud/axolotl/blob/7026cd5e9e053d51aa271c1f57f62950bcdc599f/src/axolotl/cli/vllm_serve.py#L81

I haven't seen that CUDA graph log before; I'll ask the team. In the meantime, where are you running this? RunPod? Locally?

Just to verify, are you able to run `vllm serve ...` directly, to see whether it's a vLLM issue or an Axolotl issue?