KAKSIS


I encountered the same issue during 14B-MoE training. It appears that DeepSpeed versions 0.16.0 and above require more GPU memory.
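If the extra memory usage in newer releases is the blocker, one possible workaround (a sketch, not confirmed as a fix in this thread) is to pin DeepSpeed to a release below 0.16.0:

```shell
# Hypothetical workaround: pin DeepSpeed below 0.16.0, the version at which
# the increased GPU memory usage was observed.
pip install "deepspeed<0.16.0"

# Confirm which version ended up installed.
python -c "import deepspeed; print(deepspeed.__version__)"
```

Whether the older release is compatible with the rest of the training stack (e.g. the TRL/Transformers versions in use) would need to be checked separately.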

I encountered a similar issue while training a 72B model on an 8x H100 (80 GB) setup. I’m using the Hugging Face online DPO trainer scripts from [this link](https://huggingface.co/docs/trl/main/en/online_dpo_trainer). To reduce...