
Detailed training settings

Open xiuzbl opened this issue 1 year ago • 0 comments

Hi, could you provide the detailed hyper-parameters you used when training llama-13b? For example, how many and what kind of GPUs did you use, and what were the gradient accumulation steps and the batch size per GPU? Moreover, when I directly use your deepspeed config to deepspeed-initialize a llama-7b on an 80 GB A100, the server reports a CUDA OOM error.
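For context, a config along these lines usually lets a 7B model fit on a single 80 GB A100 (this is only a sketch I am experimenting with, not the repository's actual config; the keys are standard DeepSpeed options, and the micro batch size / accumulation steps are guesses):

```json
{
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 16,
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "offload_param": { "device": "cpu", "pin_memory": true },
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "gradient_clipping": 1.0
}
```

With ZeRO stage 3 plus CPU offload, optimizer and parameter state live in host memory, which trades throughput for a much smaller GPU footprint. It would help to know what stage and batch settings you actually trained with.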

Looking forward to your reply.

Thank you so much!

xiuzbl · Jul 11 '23, 11:07