WizardLM
DeepSpeed Configuration JSON
What GPUs did you use for training?
Can you please share your DeepSpeed config JSON?
I can't get fine-tuning to work with the command given in your README.md and the DeepSpeed config JSON in llamax.
I tried 8x 4090 and 4x A100; neither worked.
I will need the exact hardware, exact hyperparameters, and exact DeepSpeed config file you used.
Fixed the problem.
Thanks to Rohan, Caseus, and TheBloke for helping with troubleshooting.
Had to remove the following lines from the DeepSpeed config:
"offload_optimizer": {
"device": "cpu",
"pin_memory": true
},
"offload_param": {
"device": "cpu",
"pin_memory": true
},
Now training with my dataset on 8x A6000s will take 12 hours.
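For anyone hitting the same issue, here is a minimal sketch of what the trimmed config might look like once the two CPU offload sections are removed (the ZeRO stage, precision block, and "auto" values below are illustrative assumptions that rely on the HuggingFace Trainer integration, not the exact file from this repo):

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "bf16": {
    "enabled": "auto"
  },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
```

With `"offload_optimizer"` and `"offload_param"` gone, optimizer states and parameters stay in GPU memory, which avoids the CPU offload path but requires enough combined VRAM across the GPUs to hold them.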
Addressed in PR https://github.com/nlpxucan/WizardLM/pull/17
I was successful with 8x A6000.
I also needed the environment variable NCCL_P2P_DISABLE=1.
What is the size of the model that you are using with this training code, @ehartford?
This was 5 months ago - that's like 3 years in AI time
Haha, that is true. Just curious, what size of model could you fit onto 8x A6000? Was that a 70B model or a 30B one? Just wanted to get a sense of what is feasible with this repo.
I do see some DeepSpeed configuration files in the repository now, such as: https://github.com/search?q=repo%3Anlpxucan%2FWizardLM%20deepspeed&type=code I also saw you had a PR, but it was closed. Would you say this issue can be closed?