Jeff Rasley
Stale, feel free to re-open if this is still required.
Pretty sure this is not needed anymore; the code around this spot has changed significantly since then. @tjruwase, do you know more here?
@hyhuang00 please re-open if you are still having an issue.
This ended up being added in a later PR.
Hi @siddharth9820, this is interesting. Can you double-check that the hostfiles with the Slurm IDs are being created and are accessible at the path you are launching from? https://github.com/microsoft/DeepSpeed/blob/bcc617a0009dd27b4e144de59979bd7770eaf57c/deepspeed/launcher/runner.py#L201 This is...
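If it helps, here is a minimal sketch for checking a hostfile from the directory you launch in. It assumes the usual `hostname slots=N` line format and skips blank and `#` comment lines; `parse_hostfile` and `check_hostfile` are hypothetical helpers for debugging, not DeepSpeed's own parser (that lives in the `runner.py` linked above).

```python
import os


def parse_hostfile(text):
    """Parse lines of the form 'hostname slots=N' into {hostname: N} (hypothetical helper)."""
    hosts = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) != 2 or not parts[1].startswith("slots="):
            raise ValueError(f"unexpected hostfile line: {raw!r}")
        hosts[parts[0]] = int(parts[1][len("slots="):])
    return hosts


def check_hostfile(path):
    """Resolve `path` against the current directory, confirm it is readable, and parse it."""
    abs_path = os.path.abspath(path)
    if not os.path.isfile(abs_path):
        raise FileNotFoundError(f"hostfile not found at {abs_path} (cwd: {os.getcwd()})")
    if not os.access(abs_path, os.R_OK):
        raise PermissionError(f"hostfile at {abs_path} is not readable")
    with open(abs_path) as f:
        return parse_hostfile(f.read())
```

Running `check_hostfile("<your hostfile path>")` on each node you launch from should quickly show whether the file is missing, unreadable, or has a host/slot layout you didn't expect.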
ZeRO-Offload isn’t supported yet, but we’re actively working on it. Will update when it’s released :) Feel free to also follow our Twitter account for updates like this: https://twitter.com/msftdeepspeed
Just to extend Mike's response above a bit: @wangshuo6699, it appears your `nvcc --version` reports 10.x, while the torch you are trying to use was compiled with 11.x. This will...
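A quick way to confirm this kind of mismatch is to compare the release in `nvcc --version` with `torch.version.cuda` (a real attribute; it is `None` for CPU-only builds). The sketch below assumes nvcc prints a line like `Cuda compilation tools, release 10.2, V10.2.89` and only compares major versions; the function names are hypothetical.

```python
import re
import subprocess


def parse_nvcc_release(nvcc_output):
    """Extract (major, minor) from `nvcc --version` output, e.g. 'release 10.2,'."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("could not find a CUDA release in nvcc output")
    return int(match.group(1)), int(match.group(2))


def cuda_major_matches(nvcc_output, torch_cuda):
    """True if nvcc and the CUDA version torch was built with share a major version."""
    nvcc_major, _ = parse_nvcc_release(nvcc_output)
    torch_major = int(torch_cuda.split(".")[0])
    return nvcc_major == torch_major


def report():
    """Run the live check (needs nvcc on PATH and torch installed)."""
    import torch

    out = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=True
    ).stdout
    if torch.version.cuda is None:
        print("torch was built without CUDA")
        return
    print("nvcc:", parse_nvcc_release(out), "torch:", torch.version.cuda,
          "major versions match:", cuda_major_matches(out, torch.version.cuda))
```

Calling `report()` in the failing environment should show 10.x vs 11.x here; installing a torch build that matches your local CUDA toolkit (or vice versa) is the usual way to line them up.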
Thanks for trying out MII! I see this model https://huggingface.co/yentinglin/Taiwan-LLM-13B-v2.0-chat/tree/main only has checkpoints in `safetensors` format. We currently don't support loading safetensors, but we plan to add this soon as it shouldn't...
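To tell ahead of time whether a repo ships only safetensors weights, a rough sketch is to classify its file names by extension. This assumes weights are named `*.safetensors` or `pytorch_model*.bin` (the common Hugging Face convention); the helper names are hypothetical.

```python
import posixpath


def checkpoint_formats(filenames):
    """Group weight files by checkpoint format, based on file name conventions."""
    formats = {"safetensors": [], "pytorch_bin": []}
    for name in filenames:
        base = posixpath.basename(name)
        if base.endswith(".safetensors"):
            formats["safetensors"].append(name)
        # Only count PyTorch weight shards, not other .bin files like training_args.bin.
        elif base.startswith("pytorch_model") and base.endswith(".bin"):
            formats["pytorch_bin"].append(name)
    return formats


def only_safetensors(filenames):
    """True if the repo has safetensors weights and no PyTorch .bin weights."""
    formats = checkpoint_formats(filenames)
    return bool(formats["safetensors"]) and not formats["pytorch_bin"]
```

The file list itself can come from `huggingface_hub.list_repo_files("yentinglin/Taiwan-LLM-13B-v2.0-chat")`.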
@FrankLeeeee and @ver217, I just finished debugging an issue on the DeepSpeed side that seems very related to this one. I can confirm that `op_builder` is being installed in site-packages....
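For anyone trying to reproduce, here's a small sketch that reports where a top-level module like `op_builder` would be imported from and whether that file sits under one of the interpreter's site-packages directories. It only uses stdlib calls (`importlib.util.find_spec`, `site.getsitepackages`, `site.getusersitepackages`); the helper names are hypothetical.

```python
import importlib.util
import os
import site


def module_origin(name):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


def installed_under_site_packages(name):
    """True if `name` resolves to a file inside a site-packages directory."""
    origin = module_origin(name)
    if origin is None:
        return False
    dirs = list(site.getsitepackages()) + [site.getusersitepackages()]
    origin = os.path.realpath(origin)
    # Compare with a trailing separator so '/x/site-packages2' doesn't match '/x/site-packages'.
    return any(origin.startswith(os.path.realpath(d) + os.sep) for d in dirs)
```

Running `module_origin("op_builder")` and `installed_under_site_packages("op_builder")` in the affected environment should show the exact path being picked up.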