PhotoMaker

How does app.py perform parallel computing across multiple GPUs?

Open Alexkerl opened this issue 1 year ago • 4 comments

I set CUDA_VISIBLE_DEVICES=0,1,2,3, but it only computes on a single GPU.
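For context, `CUDA_VISIBLE_DEVICES` only controls which GPUs a process can see; it does not spread work across them. Below is a minimal check, assuming PyTorch may or may not be installed. The helper name `visible_gpu_ids` is made up for illustration.

```python
import os


def visible_gpu_ids(env=None):
    """Return the device ids listed in CUDA_VISIBLE_DEVICES as strings."""
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES", "")
    return [part.strip() for part in raw.split(",") if part.strip()]


if __name__ == "__main__":
    print("CUDA_VISIBLE_DEVICES lists:", visible_gpu_ids())
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed, skipping the device query")
    else:
        # device_count() reports how many GPUs are visible. A pipeline moved
        # with .to("cuda") still lands on the current device only
        # (index 0 by default), so the other GPUs stay idle.
        print("torch.cuda.device_count():", torch.cuda.device_count())
```

If the count printed is 4 but only one GPU is busy, the devices are visible and the script simply uses one of them.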

Alexkerl avatar Feb 02 '24 02:02 Alexkerl

https://huggingface.co/docs/diffusers/training/distributed_inference
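The approach in that tutorial can be sketched roughly as follows, using `PartialState` from accelerate to give each process its own GPU and its own slice of the prompts. This is a sketch under assumptions, not what app.py does: the model id is a stand-in, the prompts and the `result_path` helper are made up, and the script name `run_distributed.py` is hypothetical.

```python
import os

# Stand-in model id (assumption); app.py builds its own PhotoMaker pipeline.
MODEL_ID = "stabilityai/stable-diffusion-xl-base-1.0"
PROMPTS = [
    "a photo of a dog",
    "a photo of a cat",
    "a photo of a horse",
    "a photo of a bird",
]


def result_path(rank, index):
    """File name for the index-th image produced by process `rank`."""
    return f"result_rank{rank}_{index}.png"


def main():
    # Heavy imports stay inside main() so the module imports without them.
    import torch
    from accelerate import PartialState
    from diffusers import DiffusionPipeline

    state = PartialState()  # one process per GPU under a multi-GPU launcher
    pipe = DiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.float16)
    pipe.to(state.device)  # every process holds a full copy of the model

    # Each process receives its own slice of the prompt list.
    with state.split_between_processes(PROMPTS) as my_prompts:
        if not my_prompts:
            return
        images = pipe(my_prompts).images
        for i, image in enumerate(images):
            image.save(result_path(state.process_index, i))


# LOCAL_RANK is set for each worker by multi-GPU launchers such as torchrun.
if __name__ == "__main__" and "LOCAL_RANK" in os.environ:
    main()
```

It would be started with something like `accelerate launch --num_processes=4 run_distributed.py`. Note that this is data parallelism: each GPU loads the whole model, so it increases throughput across prompts but does not lower the memory needed per GPU.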

Paper99 avatar Feb 02 '24 05:02 Paper99

https://huggingface.co/docs/diffusers/training/distributed_inference

The tutorial above may help with distributed inference, but if I run this program on 4 × 2080 Ti GPUs (12 GB each), I still run into an out-of-memory error.

Alexkerl avatar Feb 02 '24 06:02 Alexkerl

Try switching the dtype to torch.bfloat16. On a 2080 Ti it seems to run in CPU mode, which is slower.
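One possible way to choose the dtype per GPU is by compute capability. This is only a sketch, assuming that native bfloat16 support starts with Ampere (compute capability 8.x) and that falling back to `torch.float16` is acceptable on older cards such as the 2080 Ti (7.5). The helper `pick_dtype_name` is hypothetical.

```python
def pick_dtype_name(compute_capability_major):
    """Return "bfloat16" on Ampere or newer (major >= 8), else "float16"."""
    return "bfloat16" if compute_capability_major >= 8 else "float16"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        if torch.cuda.is_available():
            # get_device_capability returns a (major, minor) tuple.
            major, _minor = torch.cuda.get_device_capability(0)
            dtype = getattr(torch, pick_dtype_name(major))
            print("suggested torch_dtype:", dtype)
        else:
            print("no CUDA device visible")
```

The resulting dtype would then be passed as `torch_dtype` when loading the pipeline.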

You could also refer to the official diffusers guide on reducing memory usage: https://huggingface.co/docs/diffusers/main/en/optimization/memory
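A few of the options from that guide can be combined as in the sketch below. It assumes `pipe` is an already loaded diffusers pipeline with a `vae` attribute and that accelerate is installed. The wrapper `apply_memory_savers` is a made-up name, and this has not been checked against PhotoMaker's own pipeline class.

```python
def apply_memory_savers(pipe):
    """Turn on memory-saving features on an already loaded diffusers pipeline."""
    # Keep sub-models on the CPU and move each one to the GPU only while it
    # runs (needs accelerate). Do not also call pipe.to("cuda") afterwards.
    pipe.enable_model_cpu_offload()
    # Decode latents one image at a time and in tiles to cap VAE memory.
    pipe.vae.enable_slicing()
    pipe.vae.enable_tiling()
    return pipe
```

Model offloading trades some speed for lower peak GPU memory, so it is worth trying before the slower sequential offloading described in the same guide.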

Paper99 avatar Feb 02 '24 07:02 Paper99

Using this link as a solution.

Paper99 avatar Feb 02 '24 17:02 Paper99