Detailed training settings
Hi, could you provide the detailed hyper-parameters you used when training LLaMA-13B? For example, how many and what kind of GPUs did you use, and what were the gradient accumulation steps and batch size per GPU? Moreover, when I directly use your DeepSpeed config to deepspeed-initialize a LLaMA-7B on an 80 GB A100, the server reports a CUDA OOM error.
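For reference, here is roughly how I initialize the model. This is only a minimal sketch, not your exact training script: the checkpoint path, config path, and dtype are placeholders I filled in myself.

```python
# Minimal sketch of the initialization that hits OOM on a single 80 GB A100.
# MODEL_PATH / DS_CONFIG are placeholders; bf16 is an assumption on my side.
import deepspeed
import torch
from transformers import AutoModelForCausalLM

MODEL_PATH = "path/to/llama-7b"              # local LLaMA-7B checkpoint (placeholder)
DS_CONFIG = "path/to/deepspeed_config.json"  # the DeepSpeed config from this repo

model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,
)

# deepspeed.initialize builds the engine and optimizer from DS_CONFIG;
# this is the step where the CUDA OOM is reported.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=DS_CONFIG,
)
```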
Looking forward to your reply.
Thank you so much!