Julia Turc

11 comments by Julia Turc

Thanks @BenjaminBossan for looking into this.

> Let's first focus on the GPU usage:

What looks strange to me is that in your first graph, it continually increases, but then...

Here's a minimal script that reproduces the issue. Note that I'm setting `num_inference_steps=1` for the sake of speed, so the absolute numbers will not be comparable to the ones above....
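The script itself is truncated in this excerpt. As an illustrative sketch only (the `fake_step` stub is a placeholder standing in for the real "load LoRA, run the pipeline with `num_inference_steps=1`, unload" call, which is not shown above), a per-iteration memory harness of this shape can produce the kind of numbers being compared:

```python
import resource


def measure_per_iteration(step_fn, iterations):
    """Run step_fn repeatedly, recording peak RSS after each call.

    ru_maxrss is the process's high-water mark (KiB on Linux), so a healthy
    run flattens out, while a leak keeps pushing the peak upward.
    """
    readings = []
    for i in range(iterations):
        step_fn(i)
        readings.append(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
    return readings


# Placeholder for the actual pipeline call in the repro script.
def fake_step(i):
    _ = [0] * 10_000  # transient allocation, freed on return


readings = measure_per_iteration(fake_step, 5)
print(readings)
```

Note that `resource` is Unix-only; on the actual GPU issue, `torch.cuda.memory_allocated()` would be the more relevant probe.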

Hi there, just wanted to check in and see if there are any new developments.

Sorry, I didn't understand your plot above. Is it suggesting that the memory consumption is actually constant around 700? And what is the unit of the Y axis / memory...

I managed to trace where the memory leak is coming from: it's the `pipe.load_lora_weights` method. As a reminder, the high-level algorithm here is:

```
for n in num_loras:
    load lora...
```

Here's a full repro in a Colab notebook (just added some print statements to the code snippet above): https://colab.research.google.com/drive/1u-DTQFZHGSiR-287CS3ELUYOC6jfhMqU?usp=sharing You can see that, on each iteration, *two* LoRAs are loaded...
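The mechanism behind "two LoRAs loaded per iteration" can be illustrated with a toy registry; this is a hypothetical sketch of the leak pattern, not the actual diffusers internals (`LeakyLoraRegistry` and its methods are invented names for illustration):

```python
class LeakyLoraRegistry:
    """Illustrative only: if each load registers weights under a fresh
    adapter name, old entries stay referenced and are never freed."""

    def __init__(self):
        self.adapters = {}
        self._counter = 0

    def load(self, weights):
        # A new key on every call means previous adapters accumulate,
        # so memory grows linearly with the number of loads.
        name = f"default_{self._counter}"
        self._counter += 1
        self.adapters[name] = weights

    def unload_all(self):
        self.adapters.clear()


reg = LeakyLoraRegistry()
for _ in range(3):
    reg.load([0.0] * 4)
print(len(reg.adapters))  # one retained adapter per load
```

In this toy model, the fix corresponds to reusing (or clearing) the adapter entry on each iteration instead of minting a new key.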

Thanks so much for the fix and sorry for the delayed response. I will try it out in the next day or two.

Thanks again @BenjaminBossan for the PR! I've rerun the notebook above with a higher number of MAX_LORAS. The good news is that, indeed, I'm not seeing 2 LoRAs being loaded...

With this fix, I'm seeing that memory is no longer going up linearly. There's a spike on each call, but overall memory consumption is uniform across calls. That's great! As mentioned...
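The "spike per call, flat across calls" shape can also be checked programmatically. A stdlib sketch using `tracemalloc`, with `transient_step` as an assumed stand-in for the pipeline call (not the actual repro code):

```python
import tracemalloc


def transient_step():
    # Allocates during the call (the spike), frees before returning.
    buf = [0.0] * 50_000
    return len(buf)


tracemalloc.start()
baselines = []
for _ in range(3):
    transient_step()
    current, _peak = tracemalloc.get_traced_memory()
    baselines.append(current)  # post-call memory, after the spike subsides
tracemalloc.stop()

# Healthy behavior: post-call readings stay roughly constant across
# iterations; a leak would make each baseline higher than the last.
print(baselines)
```

For the GPU-side equivalent, the same comparison would be done on `torch.cuda.memory_allocated()` readings taken between calls.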