echo.yi


Can you shard the model across 8 GPUs in a single node? I'm having trouble even with a single node.

@BenjaminBossan I tried both `"meta-llama/Meta-Llama-3.1-8B-Instruct"` and `"meta-llama/Meta-Llama-3.1-70B-Instruct"`, and neither worked.

@tjruwase from DeepSpeed shared [these lines](https://github.com/huggingface/transformers/blob/2a5a6ad18aa22e98429bb5ecb880660328030ea0/src/transformers/modeling_utils.py#L3796-L3800), which indicate that applying quantization and DeepSpeed ZeRO-3 together doesn't work.

@BenjaminBossan When I remove ZeRO-3 and use quantization with `device_map="auto"`, partitioning does appear to work.