echo.yi


Can you shard the model across 8 GPUs in a single node? I'm having trouble even with a single node.

@BenjaminBossan I tried both `"meta-llama/Meta-Llama-3.1-8B-Instruct"` and `"meta-llama/Meta-Llama-3.1-70B-Instruct"`, and neither worked.

@tjruwase from DeepSpeed shared [these lines](https://github.com/huggingface/transformers/blob/2a5a6ad18aa22e98429bb5ecb880660328030ea0/src/transformers/modeling_utils.py#L3796-L3800), which indicate that applying quantization and DeepSpeed ZeRO-3 together doesn't work.

@BenjaminBossan When I remove ZeRO-3 and use quantization with `device_map="auto"`, partitioning does appear to work.