Naveenraj Kamalakannan

Results 7 comments of Naveenraj Kamalakannan

@LucasWilkinson made changes to tests/v1/attention/test_mla_backends.py and it passes now.

@LucasWilkinson yes, that's correct - I don't have the hardware for this. I can probably run a quantized version of R1 instead, maybe `unsloth/DeepSeek-R1-GGUF`?

@xiao10ma @ToluClassics I think for scenarios where different ranks would be used for saving and resuming, [Universal Checkpointing](https://www.deepspeed.ai/tutorials/universal-checkpointing/) would be the way to go. Did you get a chance to...

Hi, after applying ZeRO Stage 3, you have to gather all the sharded parameters back to analyze the weights. When you apply ZeRO Stage 3 and use `model.state_dict()`, you're...