
Add multi-GPU support to train

Open stuxnet147 opened this issue 1 year ago • 4 comments

Hello

I'm using 4 GPUs, but it looks like only one GPU is actually being used during training.

If possible, I would appreciate it if you could add a feature that allows training on multiple GPUs.


stuxnet147 avatar Apr 12 '23 21:04 stuxnet147

This would be a killer feature... I agree

practical-dreamer avatar Apr 13 '23 02:04 practical-dreamer

I suggest using the training script in https://github.com/tloen/alpaca-lora directly. Multi-GPU requires torchrun, which is a multiprocess setup that is too hard to manage from a webui. You should use a script instead.
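For context, this is roughly what torchrun manages per process — a minimal sketch assuming plain PyTorch DDP (the actual alpaca-lora script wires this up through transformers/peft, so treat this as illustration only):

```python
# Each process spawned by torchrun gets its own rank and binds one GPU.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets LOCAL_RANK, RANK, WORLD_SIZE, MASTER_ADDR, etc. for every process
    local_rank = int(os.environ["LOCAL_RANK"])
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(10, 10).cuda(local_rank)  # stand-in for the real LoRA model
    model = DDP(model, device_ids=[local_rank])
    # ... training loop: each process trains on its own shard of the data ...

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
    # launch with e.g.: torchrun --nproc_per_node=4 this_script.py
```

Managing several of these processes (spawning, monitoring, killing them) from inside a single Gradio session is what makes it awkward.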

sgsdxzy avatar Apr 13 '23 04:04 sgsdxzy

I've been intending to figure out how to get this working in the webui, but the limitation is that I don't currently have a multi-GPU setup to test with.

mcmonkey4eva avatar Apr 14 '23 07:04 mcmonkey4eva

I suggest using the training script in https://github.com/tloen/alpaca-lora directly. Multi-GPU requires torchrun, which is a multiprocess setup that is too hard to manage from a webui. You should use a script instead.

Couldn't we just make the webui manage a torchrun / DeepSpeed process? Or, hell, just have the webui launch the script…
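Something like this could work — a rough, untested sketch of a UI handler handing training off to an external launcher (the script name and arguments below are placeholders, not the actual text-generation-webui or trainer API):

```python
# Hypothetical: spawn a multi-GPU training run from a webui callback instead
# of training in-process.
import subprocess

def start_multi_gpu_training(num_gpus: int, config_path: str) -> subprocess.Popen:
    cmd = [
        "accelerate", "launch",
        "--num_processes", str(num_gpus),
        "train_lora.py",          # placeholder training script
        "--config", config_path,  # placeholder argument
    ]
    # Run detached from the UI thread; progress could be read from the
    # process's stdout or a log file and streamed back into the interface.
    return subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
```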

practical-dreamer avatar May 03 '23 21:05 practical-dreamer

I suggest using the training script in https://github.com/tloen/alpaca-lora directly. Multi-GPU requires torchrun, which is a multiprocess setup that is too hard to manage from a webui. You should use a script instead.

Just wanted to quickly update this... some colleagues and I have managed to get distributed data parallel working by using accelerate to launch both the tloen alpaca trainer and axolotl. We observed a huge increase in performance (and temperatures) from working both GPUs at once. In one instance the ETA dropped from 110 hours to 40 hours for a 2048-context llama-7b LoRA finetune split across two 3090s with NVLink.

It should be noted that this requires significantly more VRAM, since the micro_batch_size is loaded onto each device instead of being split across them... Also, while we tried to keep the hyperparameters consistent in our comparison, it is possible we missed something... Finally, this performance uplift is likely multi-GPU exclusive, since accelerate lets us distribute the training...
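For anyone curious, a minimal sketch of the pattern accelerate uses when a trainer is launched with `accelerate launch` (toy model and data, not our actual runs) — it also shows why the full micro batch lands on every GPU:

```python
# Under `accelerate launch`, each GPU runs its own copy of this script on its
# own data shard; gradients are synchronized at backward time.
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()
model = torch.nn.Linear(16, 2)                       # stand-in for the LoRA model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
data = TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,)))
loader = DataLoader(data, batch_size=4)              # per-device micro batch, not split

model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for x, y in loader:                                  # each process sees a different shard
    loss = torch.nn.functional.cross_entropy(model(x), y)
    accelerator.backward(loss)                       # gradients all-reduced across GPUs here
    optimizer.step()
    optimizer.zero_grad()

# effective batch size = micro_batch_size * num_processes * gradient_accumulation_steps
```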

Regardless, I just wanted to post my observations... feel free to view the runs on wandb:

Ooba's TextGen Trainer (non-distributed parallel): https://wandb.ai/vicunlocked/VicUnlocked-7b/runs/lv8xluf7?workspace=user-practicaldreamer

Axolotl launched through accelerate (distributed parallel): https://wandb.ai/vicunlocked/VicUnlocked-7b/runs/cms3bb81?workspace=user-practicaldreamer

practical-dreamer avatar May 16 '23 10:05 practical-dreamer

super agree

choigawoon avatar Jun 02 '23 08:06 choigawoon

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

github-actions[bot] avatar Sep 03 '23 23:09 github-actions[bot]

Has this been pushed to the repo yet? I would like to use multiple GPUs to train.

norton-chris avatar Sep 25 '23 04:09 norton-chris