
[Feature]: CPU+GPU using accelerate?

Open ArtaFakhari opened this issue 2 years ago • 8 comments

Hey Everyone!

Could accelerate make it possible to distribute the processes between CPU + GPU and RAM + VRAM?

As I'm not familiar with it and have only tested the `accelerate config` command, this might be a feature request.

ArtaFakhari avatar May 12 '23 08:05 ArtaFakhari

Do you mean training with some of the model on CPU and some of the model on GPU? Or can you describe the ideal workflow you're imagining in a bit more detail?

muellerzr avatar May 12 '23 08:05 muellerzr

Hi @muellerzr My rig has an RTX 2070 GPU with 8GB VRAM and an AMD Ryzen 3900X CPU with 64GB RAM, but only half of the system is actually being used when running Stable Diffusion. The idea I have in mind is using CPU + system RAM and GPU + VRAM simultaneously. I thought accelerate could distribute the work between these two processing units to train larger models (even with slower performance), or, for example, generate larger images in Stable Diffusion and get rid of CUDA out-of-memory errors.

I hope I'm not wrong!

ArtaFakhari avatar May 12 '23 10:05 ArtaFakhari

This is something we're mildly looking into (the ability to train using the same methodology as big model inference), if this is accurate to what you are thinking.
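For context, by "big model inference" I mean the device-map machinery that splits a model across GPU VRAM and CPU RAM at load time. A rough sketch of what that looks like for inference today (the checkpoint name and memory caps below are just placeholders for the example):

```python
# Sketch only: load a model split across GPU VRAM and CPU RAM using accelerate's
# big model inference support (requires `accelerate` to be installed).
# The checkpoint and the memory limits are placeholders, not recommendations.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "gpt2-large",                            # placeholder checkpoint
    device_map="auto",                       # let accelerate place each layer
    max_memory={0: "6GiB", "cpu": "48GiB"},  # cap VRAM use, spill the rest to RAM
)
tokenizer = AutoTokenizer.from_pretrained("gpt2-large")

# Inputs can stay on CPU: accelerate's dispatch hooks move them to whichever
# device each layer was placed on.
inputs = tokenizer("Hello from a CPU+GPU split model", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```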

muellerzr avatar May 16 '23 12:05 muellerzr

> This is something we're mildly looking into (the ability to train using the same methodology as big model inference), if this is accurate to what you are thinking.

Yes exactly! It would be great to harness the power of both the GPU and CPU simultaneously for parallel processing. This would not only accelerate our training process but also allow us to process data with greater efficiency using our trained models. It's certainly worth exploring for future improvements.

ArtaFakhari avatar May 20 '23 21:05 ArtaFakhari

Hey @muellerzr, if this has been approved and needs someone to work on it, I'm willing to look into it. I would need some advice/help, though.

rishabbala avatar Jul 11 '23 03:07 rishabbala

Hi @rishabbala, if you'd like to contribute or see how far you can get, that'd be great! I'll be looking into this soon-ish, but there are other pressing matters I need to handle first. Here's a Colab notebook with how far I got; basically, there's an issue with autograd we need to consider so that the gradients backprop properly and efficiently: https://colab.research.google.com/drive/1s6tq_zcaXBnP3Ldj42CJ0gg4VXTZDfJ7?usp=sharing
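To make the idea concrete: one half of this already exists in stock PyTorch. The intermediate activations saved for backward can live in system RAM via `torch.autograd.graph.save_on_cpu`; it's offloading the *weights* the same way where the autograd/gradient-device issue comes in. A rough sketch (not the notebook's code, and the layer sizes are arbitrary):

```python
# Sketch only: keep the activations saved for backward in (pinned) host memory,
# and copy them back to the GPU only when backward actually needs them.
# The weights themselves stay on the GPU here; moving them to CPU after forward
# is the part where gradients can end up on the wrong device.
import torch
import torch.nn as nn

device = torch.device("cuda")  # this sketch assumes a CUDA GPU is present

model = nn.Sequential(
    nn.Linear(4096, 4096),
    nn.GELU(),
    nn.Linear(4096, 10),
).to(device)

x = torch.randn(8, 4096, device=device)

# Everything autograd saves for backward inside this context is packed to CPU
# at save time and unpacked back to its original device during backward.
with torch.autograd.graph.save_on_cpu(pin_memory=True):
    loss = model(x).sum()

loss.backward()  # gradients accumulate on the GPU, where the parameters live
```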

muellerzr avatar Jul 11 '23 17:07 muellerzr

Hi @muellerzr, I went over your notebook and got the overall idea of what we are trying to do. If I understand correctly, we want to move the weights and intermediate tensors to CPU after their forward call, and move them back to GPU before we perform backward. Is this correct? Can you let me know what the current limitation or issue is with what you've implemented and how I can proceed? Also, I was wondering if using a flattened view of tensors when moving to CPU would be better, as then the number of reference calls to access the tensor would be lower.

rishabbala avatar Jul 14 '23 19:07 rishabbala

Any updates on this?

RandomGamingDev avatar Jul 23 '24 20:07 RandomGamingDev