Naveenraj Kamalakannan comments

Results: 7 comments by Naveenraj Kamalakannan

[BUG] DeepCompile: MemoryProfiling error /pytorch/build/aten/src/ATen/RegisterCUDA.cpp:7280: SymIntArrayRef expected to contain only concrete integers

Hi @unavailableun, can you share any code to reproduce this?

Prefill / Decode Split into Compiled Region

@LucasWilkinson I made changes to `tests/v1/attention/test_mla_backends.py` and it passes now.

Prefill / Decode Split into Compiled Region

@LucasWilkinson Yes, that's correct: I don't have the hardware for this. I can probably run a quantized version of this R1 model, maybe `unsloth/DeepSeek-R1-GGUF`?

Prefill / Decode Split into Compiled Region

Thanks @MatthewBonanni

[Question] How to Resume DeepSpeed ZeRO-2 Training with a Different Number of GPUs?

@xiao10ma @ToluClassics I think that when a different number of ranks is used for saving and for resuming, [Universal Checkpointing](https://www.deepspeed.ai/tutorials/universal-checkpointing/) would be the way to go. Did you get a chance to...
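As a rough sketch of the resume side: the saved ZeRO-2 checkpoint is first converted offline (the tutorial uses the `ds_to_universal.py` script, roughly `python deepspeed/checkpoint/ds_to_universal.py --input_folder <ckpt>/global_step<N> --output_folder <ckpt>/global_step<N>_universal`), and the new run is then told to load the universal format. The snippet below only patches the DeepSpeed config, assuming the `checkpoint.load_universal` key described in that tutorial; the helper name `with_universal_load` and the batch size are hypothetical placeholders.

```python
import copy
import json


def with_universal_load(ds_config: dict) -> dict:
    """Return a copy of a DeepSpeed config set up to load a universal checkpoint.

    Assumption: DeepSpeed reads ``checkpoint.load_universal`` as described in
    the Universal Checkpointing tutorial. The input dict is not modified.
    """
    cfg = copy.deepcopy(ds_config)
    # Create the "checkpoint" section if missing, then enable universal loading.
    cfg.setdefault("checkpoint", {})["load_universal"] = True
    return cfg


# Hypothetical config of the original ZeRO-2 run; values are placeholders.
base = {"train_micro_batch_size_per_gpu": 4, "zero_optimization": {"stage": 2}}
resume_cfg = with_universal_load(base)
print(json.dumps(resume_cfg, indent=2))
```

Copying instead of mutating keeps the original run's config reusable, so only the resumed job (on the new GPU count) picks up the universal-loading flag.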

[BUG]ZeRO-2 + CPU Offload + overlap_comm=true, the IPG (Independent Partition Gradient) buckets are never populated.

@XiDianZuoYun I'd like to work on this issue. Can you provide a repro script?
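A repro script would presumably start from a DeepSpeed config matching the setup in the issue title (ZeRO-2, CPU optimizer offload, `overlap_comm=true`). A minimal sketch of such a config follows; the helper name, batch size, learning rate and bucket size are hypothetical placeholders, not values from the report, and `overlap_comm` is a parameter so the same script can be run with and without it.

```python
import json


def zero2_cpu_offload_config(overlap_comm: bool = True) -> dict:
    """Build a DeepSpeed config resembling the reported setup.

    ZeRO stage 2 with the optimizer state offloaded to CPU. Batch, learning
    rate and bucket sizes are placeholders chosen for illustration only.
    """
    return {
        "train_micro_batch_size_per_gpu": 2,
        "gradient_accumulation_steps": 1,
        "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
        "zero_optimization": {
            "stage": 2,
            "offload_optimizer": {"device": "cpu", "pin_memory": True},
            # The flag under suspicion: toggled to compare the two runs.
            "overlap_comm": overlap_comm,
            "contiguous_gradients": True,
            "reduce_bucket_size": 50_000_000,
        },
    }


# Print both variants so the failing and the working configuration can be diffed.
for flag in (True, False):
    print(json.dumps(zero2_cpu_offload_config(flag), indent=2))
```

Keeping everything except `overlap_comm` fixed makes it easier to attribute the empty IPG buckets to that flag rather than to other settings.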

Apply Zero-3 and LoRA appears empty lora weight [0]

Hi, after applying ZeRO Stage 3, you have to gather all the sharded parameters back before you can analyze the weights. When you apply ZeRO Stage 3 and use `model.state_dict()`, you're...
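Under ZeRO-3 each rank only holds a shard, so a parameter's local tensor can show up with shape `[0]` until it is gathered. A minimal sketch of copying LoRA weights out while they are temporarily gathered is below. The helper `snapshot_lora_weights`, the `"lora_"` name filter and the injectable `gather`/`copy` hooks are hypothetical; with DeepSpeed, `gather` would typically wrap `deepspeed.zero.GatheredParameters`.

```python
import contextlib
from typing import Any, Callable, ContextManager, Dict, Iterable, List, Tuple


def snapshot_lora_weights(
    named_params: Iterable[Tuple[str, Any]],
    gather: Callable[[List[Any]], ContextManager[Any]] = lambda ps: contextlib.nullcontext(),
    copy: Callable[[Any], Any] = lambda p: p.detach().cpu().clone(),
    keyword: str = "lora_",
) -> Dict[str, Any]:
    """Copy parameters whose name contains ``keyword`` while they are gathered.

    ``gather`` should materialize the full parameters inside its context
    (for ZeRO-3, e.g. GatheredParameters); the default does nothing, which is
    fine when parameters are not partitioned.
    """
    selected = [(name, p) for name, p in named_params if keyword in name]
    params = [p for _, p in selected]
    # Copy inside the context: on exit, ZeRO-3 re-partitions the parameters.
    with gather(params):
        return {name: copy(p) for name, p in selected}
```

Hypothetical usage, called on every rank because the gather is a collective operation: `snapshot_lora_weights(model.named_parameters(), gather=lambda ps: deepspeed.zero.GatheredParameters(ps, modifier_rank=None))`. Passing `modifier_rank=None` requests a read-only gather, which is all that inspecting the weights needs.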