text-generation-webui
Add multi-GPU support to train
Hello
I have 4 GPUs, but it looks like only one GPU is being used during training.
If possible, I would appreciate a feature that allows training to use multiple GPUs.
This would be a killer feature... I agree
I suggest using the training script in https://github.com/tloen/alpaca-lora directly. Multi-GPU requires torchrun, which is a multiprocess setup that is too hard to manage in a webui. You should use a script instead.
I've been intending to figure out how to get this working in the webui, but the limitation is that I don't currently have a multi-GPU setup to test with.
Couldn’t we just make the webui manage a torchrun / DeepSpeed process? Or hell, just make the webui launch the script…
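For what it's worth, here is a minimal sketch of what "the webui launches the script" could look like: spawn torchrun as a subprocess and stream its output back to the UI. The script name, model, and data paths below are placeholders modeled loosely on the tloen/alpaca-lora setup, not the actual webui code; treat every flag as an assumption to be adjusted for whatever trainer you use.

```python
# Sketch: launch a distributed finetune from the webui as a torchrun subprocess
# instead of running the training loop in-process. All paths/flags are placeholders.
import subprocess

def launch_distributed_finetune(num_gpus: int = 2) -> int:
    cmd = [
        "torchrun",
        f"--nproc_per_node={num_gpus}",
        "finetune.py",                        # placeholder training script
        "--base_model", "decapoda-research/llama-7b-hf",   # placeholder model
        "--data_path", "data/train.json",     # placeholder dataset
        "--output_dir", "loras/my-lora",
    ]
    # Popen keeps the UI thread free; the webui can stream stdout for progress.
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    for line in proc.stdout:
        print(line, end="")                   # forward to the UI instead of printing
    return proc.wait()

if __name__ == "__main__":
    raise SystemExit(launch_distributed_finetune(num_gpus=2))
```

The same pattern would work with `accelerate launch` (or a DeepSpeed launcher) in place of torchrun; the webui only has to build the command line and watch the process.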
Just wanted to quickly update this... some fine colleagues and I have managed to get distributed data parallel working by using Accelerate to launch both the tloen alpaca-lora trainer and axolotl. Huge increases in performance and temps were observed from working both GPUs at once. In one instance we saw the ETA drop from 110 hours to 40 hours for a 2048-context llama-7b LoRA finetune split across two 3090s with NVLink.
It should be noted that this requires significantly more VRAM, since the micro_batch_size is loaded onto each device instead of being split across them... And while we tried to keep the hyperparameters consistent in our comparison, it is possible we missed something... Finally, this uplift in performance is likely multi-GPU exclusive, since Accelerate allows us to distribute the training...
Regardless, I just wanted to post my observations... feel free to view the runs on wandb:
Ooba's TextGen Trainer (Non-distributed parallel)
https://wandb.ai/vicunlocked/VicUnlocked-7b/runs/lv8xluf7?workspace=user-practicaldreamer
Axolotl launched through accelerate (Distributed parallel)
https://wandb.ai/vicunlocked/VicUnlocked-7b/runs/cms3bb81?workspace=user-practicaldreamer
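To illustrate the VRAM note above, here is a tiny sketch of how the numbers interact under distributed data parallel: each GPU holds its own micro batch, so per-device memory scales with micro_batch_size, while the effective (global) batch grows with the number of GPUs. The values below are hypothetical examples, not the hyperparameters from the linked runs.

```python
# Hypothetical numbers to show why multi-GPU DDP costs more total VRAM:
# every GPU keeps a full micro batch in memory, and the effective batch
# multiplies by the number of devices.
micro_batch_size = 4      # samples held in memory on *each* GPU per step
grad_accum_steps = 8      # gradient accumulation steps per optimizer step
num_gpus = 2

effective_batch = micro_batch_size * grad_accum_steps * num_gpus
print(f"per-GPU samples in memory per step: {micro_batch_size}")
print(f"effective (global) batch size:      {effective_batch}")  # 64 here
```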
super agree
This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
Has this been pushed to the repo yet? I would like to use multiple GPUs to train.