Paul Richmond comments

Results: 2 comments

Paul Richmond

GPU Memory Imbalance and OOM Errors During Training

I am also encountering this behaviour whilst trying to fine-tune Llama3-8B using QLoRA. However, in my case I'm not using DeepSpeed (at least there's no `deepspeed_config` parameter in my accelerator...

GPU Memory Imbalance and OOM Errors During Training

Hi @SunMarc, thanks for the quick reply! I'm running my script on an HPC cluster where I only request 2 GPUs from a node comprising 4 GPUs in total....