
Detailed training settings

Open xiuzbl opened this issue 1 year ago • 0 comments

Hi, could you provide the detailed hyper-parameters you used when training llama-13b? For example, how many and what kind of GPUs did you use, and what were the gradient accumulation steps and the batch size per GPU? Moreover, when I directly use your deepspeed config to deepspeed-initialize a llama-7b on an 80 GB A100, the server reports a CUDA OOM error.
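For context, a config along these lines usually lets a 7B model fit on a single 80 GB A100 (this is only a sketch I am experimenting with, not the repository's actual config; the keys are standard DeepSpeed options, and the micro batch size / accumulation steps are guesses):

```json
{
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 16,
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "offload_param": { "device": "cpu", "pin_memory": true },
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "gradient_clipping": 1.0
}
```

With ZeRO stage 3 plus CPU offload, optimizer and parameter state live in host memory, which trades throughput for a much smaller GPU footprint. It would help to know what stage and batch settings you actually trained with.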

Looking forward to your reply.

Thank you so much!

xiuzbl · Jul 11 '23, 11:07