Got OOM message with RTX 3060
I've been trying to run Stable Diffusion on the GPU, but it failed with the OOM message below.
Is this error due to insufficient GPU memory? Is it possible to make it work by adjusting some parameters? Stable Diffusion 1.4 runs on this GPU in a TensorFlow environment, so it would be nice if it worked with Bumblebee too.
It's working fine with :host. It's amazing how easy it is to use neural networks with Livebook!
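For reference, here is a minimal sketch of how the EXLA client can be selected (standard EXLA options, not code from this thread; :cuda matches the XLA_TARGET=cuda111 build listed below):

# Sketch only: :host runs on the CPU, :cuda targets the GPU.
Nx.global_default_backend({EXLA.Backend, client: :cuda})

# The client can also be chosen per computation via compiler options:
# defn_options: [compiler: EXLA, client: :cuda]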
- OS: Ubuntu 22.04 on WSL2
- GPU: RTX 3060 (12GB)
- Livebook v0.8.0
- Elixir v1.14.2
- XLA_TARGET=cuda111
- CUDA Version: 11.7
05:32:56.019 [info] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
05:32:56.023 [info] XLA service 0x7fb39437dac0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
05:32:56.023 [info] StreamExecutor device (0): NVIDIA GeForce RTX 3060, Compute Capability 8.6
05:32:56.023 [info] Using BFC allocator.
05:32:56.023 [info] XLA backend allocating 10641368678 bytes on device 0 for BFCAllocator.
05:32:58.662 [info] Start cannot spawn child process: No such file or directory
05:34:00.234 [info] total_region_allocated_bytes_: 10641368576 memory_limit_: 10641368678 available bytes: 102 curr_region_allocation_bytes_: 21282737664
05:34:00.234 [info] Stats:
Limit: 10641368678
InUse: 5530766592
MaxInUse: 7566778624
NumAllocs: 3199
MaxAllocSize: 399769600
Reserved: 0
PeakReserved: 0
LargestFreeBlock: 0
05:34:00.234 [warn] **********___***********************************************************____________________________
05:34:00.234 [error] Execution of replica 0 failed: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 3546709984 bytes.
BufferAssignment OOM Debugging.
BufferAssignment stats:
parameter allocation: 3.84GiB
constant allocation: 144B
maybe_live_out allocation: 768.0KiB
preallocated temp allocation: 3.30GiB
preallocated temp fragmentation: 304B (0.00%)
total allocation: 7.15GiB
total fragmentation: 821.0KiB (0.01%)
The whole log is in oommessage.log.
We are likely less efficient than TensorFlow somewhere. This might be related: https://github.com/elixir-nx/nx/issues/1003
One thing you can try is mixed precision in all of the models; computing in half precision roughly halves the memory needed for intermediate activations:
policy = Axon.MixedPrecision.create_policy(compute: :f16)

# do this for every model
{:ok, %{model: clip_model} = clip} = Bumblebee.load_model({:hf, repository_id, subdir: "text_encoder"})
clip = %{clip | model: Axon.MixedPrecision.apply_policy(clip_model, policy)}
Note that I haven't tested whether this affects the image outputs.
I tried code like this, but it didn't help; I got the same OOM message.
policy = Axon.MixedPrecision.create_policy(compute: :f16)

{:ok, clip} =
  Bumblebee.load_model({:hf, repository_id, subdir: "text_encoder"},
    log_params_diff: false
  )

clip = %{clip | model: Axon.MixedPrecision.apply_policy(clip.model, policy)}

{:ok, unet} =
  Bumblebee.load_model({:hf, repository_id, subdir: "unet"},
    params_filename: "diffusion_pytorch_model.bin",
    log_params_diff: false
  )

unet = %{unet | model: Axon.MixedPrecision.apply_policy(unet.model, policy)}

{:ok, vae} =
  Bumblebee.load_model({:hf, repository_id, subdir: "vae"},
    architecture: :decoder,
    params_filename: "diffusion_pytorch_model.bin",
    log_params_diff: false
  )

vae = %{vae | model: Axon.MixedPrecision.apply_policy(vae.model, policy)}

{:ok, safety_checker} =
  Bumblebee.load_model({:hf, repository_id, subdir: "safety_checker"},
    log_params_diff: false
  )

safety_checker = %{
  safety_checker
  | model: Axon.MixedPrecision.apply_policy(safety_checker.model, policy)
}
I see this as well, which is probably expected given that I have only 6 GB.
I will note that I can run things like InvokeAI and do text2img with only 6 GB (and I believe InvokeAI uses the same kind of lowered precision to achieve that).
My specs:
nvidia-smi
Sat Dec 10 00:00:46 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.56.06 Driver Version: 520.56.06 CUDA Version: 11.8 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 Off | N/A |
| 0% 38C P8 6W / 120W | 15MiB / 6144MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 14189 G ...xorg-server-1.20.14/bin/X 9MiB |
| 0 N/A N/A 14217 G ...hell-43.1/bin/gnome-shell 2MiB |
+-----------------------------------------------------------------------------+
I set the following policy and confirmed that images can be generated with the host client (not CUDA).
policy =
  Axon.MixedPrecision.create_policy(
    params: {:f, 16},
    compute: {:f, 32},
    output: {:f, 16}
  )

clip = %{clip | model: Axon.MixedPrecision.apply_policy(clip.model, policy)}
unet = %{unet | model: Axon.MixedPrecision.apply_policy(unet.model, policy)}
vae = %{vae | model: Axon.MixedPrecision.apply_policy(vae.model, policy)}

safety_checker = %{
  safety_checker
  | model: Axon.MixedPrecision.apply_policy(safety_checker.model, policy)
}

serving =
  Bumblebee.Diffusion.StableDiffusion.text_to_image(clip, unet, vae, tokenizer, scheduler,
    num_steps: 10,
    num_images_per_prompt: 1,
    safety_checker: safety_checker,
    safety_checker_featurizer: featurizer,
    compile: [batch_size: 1, sequence_length: 50],
    defn_options: [compiler: EXLA]
  )
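The serving is then run with the usual Nx.Serving API (a hedged sketch; the exact result structure may vary by Bumblebee version):

# Standard Nx.Serving usage; prompt text is just an example.
output = Nx.Serving.run(serving, "a photo of an astronaut riding a horse")
# output.results is a list of maps, each containing a generated :image tensor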
OOM still occurs when running on CUDA.
Looking at the peak buffers included in the OOM message, the shapes are f32. Is the policy having no effect, or is this a memory problem unrelated to the policy?
Peak buffers:
Buffer 1:
Size: 1.00GiB
XLA Label: custom-call
Shape: f32[2,8,4096,4096]
==========================
Buffer 2:
Size: 144.75MiB
Entry Parameter Subshape: f32[49408,768]
==========================
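For what it's worth, one possibility (an assumption on my part, not verified in this thread): apply_policy changes the model's layer policies, but parameters that were already loaded may stay f32, which would match the f32 entry parameter above. A minimal sketch of casting loaded params directly with Nx:

# Hypothetical workaround: cast already-loaded parameters to f16.
# Assumes a flat two-level params map (layer name => param name => tensor).
cast_params = fn params ->
  Map.new(params, fn {layer, layer_params} ->
    {layer, Map.new(layer_params, fn {name, tensor} -> {name, Nx.as_type(tensor, :f16)} end)}
  end)
end

clip = %{clip | params: cast_params.(clip.params)}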
Yes, it can also be that there are places where we could improve the model efficiency. There are some PRs in the diffusers repo and some Twitter threads about memory-efficient attention (see the sketch after this list):
- https://mobile.twitter.com/Nouamanetazi/status/1576959648912973826
- https://mobile.twitter.com/pcuenq/status/1590665645233881089
- https://mobile.twitter.com/realDanFu/status/1580641495991754752
- https://github.com/huggingface/diffusers/pull/366
- https://github.com/huggingface/diffusers/pull/532
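To illustrate the idea behind those PRs: attention slicing trades a little speed for peak memory by computing attention for a few heads at a time instead of all at once; the f32[2,8,4096,4096] buffer above is exactly such a score tensor. This is a minimal sketch in plain Nx under my own assumptions, not Bumblebee's actual implementation:

defmodule SlicedAttention do
  # q, k, v have shape {batch, heads, seq_len, head_dim}.
  # slice_size must divide heads; peak memory for the score tensor
  # drops roughly by a factor of heads / slice_size.
  def attend(q, k, v, slice_size) do
    heads = Nx.axis_size(q, 1)
    scale = :math.sqrt(Nx.axis_size(q, 3))

    0..(heads - 1)//slice_size
    |> Enum.map(fn start ->
      qs = Nx.slice_along_axis(q, start, slice_size, axis: 1)
      ks = Nx.slice_along_axis(k, start, slice_size, axis: 1)
      vs = Nx.slice_along_axis(v, start, slice_size, axis: 1)

      # scores: {batch, slice_size, seq_len, seq_len}
      scores = Nx.divide(Nx.dot(qs, [3], [0, 1], ks, [3], [0, 1]), scale)

      # numerically stable softmax over the last axis
      maxes = Nx.reduce_max(scores, axes: [-1], keep_axes: true)
      exps = Nx.exp(Nx.subtract(scores, maxes))
      probs = Nx.divide(exps, Nx.sum(exps, axes: [-1], keep_axes: true))

      Nx.dot(probs, [3], [0, 1], vs, [2], [0, 1])
    end)
    |> Nx.concatenate(axis: 1)
  end
end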
@seanmor5, do you know what we need to do to generate graphs such as this one? https://github.com/huggingface/diffusers/pull/371
Forwarded here from the above issue. Is there any way for me to give Bumblebee more of my memory? Do I need to simply increase the amount of memory I have?
You have 4GB, right? That's currently not enough for SD.
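For reference, EXLA exposes allocator knobs on its clients; a hedged sketch (the option names are my recollection of EXLA's client configuration, so check the docs for your version):

# Sketch only: tune how much GPU memory XLA claims upfront.
config :exla, :clients,
  cuda: [
    platform: :cuda,
    # fraction of GPU memory the BFC allocator may claim
    memory_fraction: 0.95,
    # set to false to allocate on demand instead of preallocating
    preallocate: true
  ]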
No, the VM I run this on has 8GB, and the GPU has 6GB.
@krainboltgreene we have some experiments that have brought it down to 5GB for a single image. We will be publishing them in the coming weeks.
That is incredible. I have been wanting to dive much deeper into how bumblebee/nx work because I would love to contribute even more to the various APIs. Excited to see the source and learn more.
Opened #147 with a more principled approach.