Naveenraj Kamalakannan
Hi @unavailableun, can you share any code to repro?
@LucasWilkinson made changes to `tests/v1/attention/test_mla_backends.py` and it passes now.
@LucasWilkinson yes, that's correct. I don't have the hardware for this. I can probably run a quantized version of R1, maybe `unsloth/DeepSeek-R1-GGUF`?
Thanks @MatthewBonanni
@xiao10ma @ToluClassics I think for scenarios where different ranks would be used for saving and resuming, [Universal Checkpointing](https://www.deepspeed.ai/tutorials/universal-checkpointing/) would be the way to go. Did you get a chance to...
@XiDianZuoYun I'd like to work on this issue. Can you provide a repro script?
Hi, after applying ZeRO Stage 3, you have to gather all the sharded parameters back to analyze the weights. When you apply ZeRO Stage 3 and use `model.state_dict()`, you're...
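
To make the sharding point concrete, here is a toy, pure-Python sketch of why a single rank cannot inspect a full weight under ZeRO Stage 3. It is not DeepSpeed's implementation: the helpers `shard` and `gather` are hypothetical names, and the equal-size, zero-padded split is an assumption made for illustration. In real DeepSpeed code, the usual way to see full parameters is the `deepspeed.zero.GatheredParameters` context manager.

```python
from typing import List


def shard(flat: List[float], world_size: int) -> List[List[float]]:
    """Split a flattened parameter into equal per-rank shards (hypothetical helper).

    The tail is zero-padded so every rank holds the same number of elements.
    """
    shard_len = -(-len(flat) // world_size)  # ceiling division
    padded = flat + [0.0] * (shard_len * world_size - len(flat))
    return [padded[r * shard_len:(r + 1) * shard_len] for r in range(world_size)]


def gather(shards: List[List[float]], numel: int) -> List[float]:
    """Concatenate shards in rank order and drop the padding (hypothetical helper)."""
    full = [x for s in shards for x in s]
    return full[:numel]


weight = [0.1, 0.2, 0.3, 0.4, 0.5]      # a flattened 5-element parameter
shards = shard(weight, world_size=2)    # each simulated rank keeps one shard

rank0_view = shards[0]                  # what rank 0 alone can see: a partial weight
restored = gather(shards, numel=len(weight))  # the full weight only exists after a gather
```

In this sketch `rank0_view` holds only the first three values, so any statistic computed on it alone would describe a fragment of the weight rather than the whole tensor.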