
CUDA out of memory

Open Burve opened this issue 2 years ago • 5 comments

After installation with some adventures (mentioned in other issues :) ) I got the Web UI to run, but not the actual generation. I am getting a CUDA out of memory error message, and so far googling told me about editing the code to send data in batches or changing environment variables. I tried setting PYTORCH_CUDA_ALLOC_CONF to max_split_size_mb:128 and max_split_size_mb:512 with no change.
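
For reference, this is roughly how I set the variable (just a sketch of what I tried; the values 128/512 are simply the ones I experimented with, and the variable has to be set before PyTorch initializes CUDA):

```python
# Sketch of what I tried: set the allocator config before PyTorch touches CUDA.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported only after the variable is set
```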

I am on Windows with a 2080 Ti.

Here is my error when I press the "Load Example" button (or try to run the equivalent direct Python command). The same happens with any other image when I load it in, add a text prompt, and press the "Generate" button.

Traceback (most recent call last):
  File "C:\Users\***\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\routes.py", line 337, in run_predict
    output = await app.get_blocks().process_api(
  File "C:\Users\***\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\blocks.py", line 1015, in process_api
    result = await self.call_function(
  File "C:\Users\***\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\blocks.py", line 833, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\Users\***\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\Users\***\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "C:\Users\***\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "E:\Instruct-pix2pix\instruct-pix2pix-main\edit_app.py", line 125, in load_example
    return [example_image, example_instruction] + generate(
  File "E:\Instruct-pix2pix\instruct-pix2pix-main\edit_app.py", line 160, in generate
    with torch.no_grad(), autocast("cuda"), model.ema_scope():
  File "C:\Users\***\AppData\Local\Programs\Python\Python310\lib\contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "E:\Instruct-pix2pix\instruct-pix2pix-main\./stable_diffusion\ldm\models\diffusion\ddpm_edit.py", line 185, in ema_scope
    self.model_ema.store(self.model.parameters())
  File "E:\Instruct-pix2pix\instruct-pix2pix-main\./stable_diffusion\ldm\modules\ema.py", line 62, in store
    self.collected_params = [param.clone() for param in parameters]
  File "E:\Instruct-pix2pix\instruct-pix2pix-main\./stable_diffusion\ldm\modules\ema.py", line 62, in <listcomp>
    self.collected_params = [param.clone() for param in parameters]
RuntimeError: CUDA out of memory. Tried to allocate 58.00 MiB (GPU 0; 11.00 GiB total capacity; 10.04 GiB already allocated; 0 bytes free; 10.21 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Any recommendations on how to get past this?

Thanks.

Burve avatar Jan 21 '23 15:01 Burve

Have you tried running this file from cmd? That may help the program get more memory allocated from your PC in order to run smoothly...

Kundanagrawalofficial avatar Jan 21 '23 15:01 Kundanagrawalofficial

I tried python edit_cli.py --input imgs/example.jpg --output imgs/output.jpg --edit "turn him into a cyborg" and python edit_cli.py --steps 100 --resolution 512 --seed 1371 --cfg-text 7.5 --cfg-image 1.2 --input imgs/example.jpg --output imgs/output.jpg --edit "turn him into a cyborg", with exactly the same results.

Here is the full log from the last command:

E:\Instruct-pix2pix\instruct-pix2pix-main>python edit_cli.py --steps 100 --resolution 512 --seed 1371 --cfg-text 7.5 --cfg-image 1.2 --input imgs/example.jpg --output imgs/output.jpg --edit "turn him into a cyborg"
Loading model from checkpoints/instruct-pix2pix-00-22000.ckpt
Global Step: 22000
C:\Users\***\AppData\Local\Programs\Python\Python310\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from `pytorch_lightning.utilities` instead.
  rank_zero_deprecation(
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.53 M params.
Keeping EMAs of 688.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Some weights of the model checkpoint at openai/clip-vit-large-patch14 were not used when initializing CLIPTextModel: ['vision_model.encoder.layers.21.mlp.fc2.weight', 'vision_model.encoder.layers.14.layer_norm1.bias', 'vision_model.encoder.layers.17.self_attn.v_proj.weight', 'vision_model.encoder.layers.14.self_attn.v_proj.bias', 'vision_model.encoder.layers.7.layer_norm1.bias', 'vision_model.encoder.layers.10.self_attn.out_proj.bias', 'vision_model.encoder.layers.11.self_attn.q_proj.weight', 'vision_model.encoder.layers.21.self_attn.q_proj.bias', 'vision_model.encoder.layers.9.self_attn.q_proj.weight', 'vision_model.encoder.layers.11.mlp.fc1.bias', 'vision_model.encoder.layers.12.self_attn.v_proj.weight', 'vision_model.encoder.layers.10.mlp.fc2.weight', 'logit_scale', 'vision_model.encoder.layers.6.self_attn.q_proj.weight', 'vision_model.encoder.layers.6.mlp.fc2.bias', 'vision_model.encoder.layers.1.self_attn.out_proj.bias', 'vision_model.encoder.layers.20.mlp.fc1.bias', 'vision_model.encoder.layers.8.mlp.fc1.bias', 'vision_model.encoder.layers.13.self_attn.q_proj.weight', 'vision_model.encoder.layers.15.layer_norm2.bias', 'vision_model.encoder.layers.19.self_attn.v_proj.bias', 'vision_model.encoder.layers.23.self_attn.out_proj.bias', 'vision_model.encoder.layers.11.layer_norm1.weight', 'vision_model.encoder.layers.15.mlp.fc2.bias', 'vision_model.encoder.layers.1.self_attn.q_proj.bias', 'vision_model.encoder.layers.19.layer_norm1.bias', 'vision_model.encoder.layers.17.self_attn.k_proj.bias', 'vision_model.encoder.layers.10.layer_norm1.bias', 'vision_model.encoder.layers.19.self_attn.k_proj.weight', 'vision_model.encoder.layers.16.self_attn.v_proj.weight', 'vision_model.encoder.layers.5.self_attn.k_proj.bias', 'vision_model.encoder.layers.22.self_attn.k_proj.weight', 'vision_model.encoder.layers.9.layer_norm2.bias', 'vision_model.encoder.layers.9.mlp.fc2.weight', 'vision_model.encoder.layers.14.mlp.fc1.weight', 'vision_model.encoder.layers.0.mlp.fc1.weight', 'vision_model.encoder.layers.8.self_attn.k_proj.bias', 'vision_model.encoder.layers.12.mlp.fc2.bias', 'vision_model.encoder.layers.20.self_attn.q_proj.weight', 'vision_model.encoder.layers.13.layer_norm1.bias', 'vision_model.encoder.layers.11.layer_norm2.bias', 'vision_model.encoder.layers.18.layer_norm2.bias', 'vision_model.encoder.layers.22.self_attn.q_proj.weight', 'vision_model.encoder.layers.11.self_attn.k_proj.weight', 'vision_model.encoder.layers.20.layer_norm2.weight', 'vision_model.encoder.layers.13.mlp.fc2.bias', 'vision_model.encoder.layers.10.self_attn.k_proj.weight', 'vision_model.encoder.layers.10.self_attn.k_proj.bias', 'vision_model.encoder.layers.0.self_attn.v_proj.weight', 'vision_model.encoder.layers.3.layer_norm1.bias', 'vision_model.encoder.layers.3.self_attn.k_proj.bias', 'vision_model.encoder.layers.8.layer_norm2.bias', 'vision_model.encoder.layers.3.layer_norm2.bias', 'vision_model.encoder.layers.6.self_attn.v_proj.bias', 'vision_model.encoder.layers.23.mlp.fc1.bias', 'vision_model.encoder.layers.18.self_attn.k_proj.weight', 'vision_model.encoder.layers.15.self_attn.q_proj.weight', 'vision_model.encoder.layers.23.self_attn.k_proj.weight', 'vision_model.encoder.layers.2.mlp.fc1.weight', 'vision_model.encoder.layers.5.mlp.fc2.weight', 'vision_model.encoder.layers.15.self_attn.q_proj.bias', 'vision_model.encoder.layers.22.mlp.fc2.bias', 'vision_model.encoder.layers.6.self_attn.k_proj.weight', 'vision_model.encoder.layers.7.mlp.fc1.weight', 'vision_model.encoder.layers.12.self_attn.k_proj.bias', 
'vision_model.encoder.layers.3.self_attn.out_proj.weight', 'vision_model.encoder.layers.22.layer_norm1.weight', 'vision_model.encoder.layers.21.self_attn.out_proj.bias', 'vision_model.encoder.layers.12.layer_norm2.weight', 'vision_model.encoder.layers.4.self_attn.k_proj.weight', 'vision_model.encoder.layers.2.self_attn.k_proj.weight', 'vision_model.encoder.layers.2.layer_norm2.bias', 'vision_model.encoder.layers.11.layer_norm1.bias', 'vision_model.encoder.layers.18.self_attn.k_proj.bias', 'vision_model.encoder.layers.8.mlp.fc2.bias', 'vision_model.encoder.layers.12.self_attn.k_proj.weight', 'vision_model.encoder.layers.11.self_attn.k_proj.bias', 'vision_model.encoder.layers.17.layer_norm1.weight', 'vision_model.encoder.layers.5.mlp.fc2.bias', 'vision_model.encoder.layers.12.layer_norm1.bias', 'vision_model.encoder.layers.23.self_attn.k_proj.bias', 'vision_model.encoder.layers.6.layer_norm1.weight', 'vision_model.encoder.layers.20.mlp.fc2.bias', 'vision_model.encoder.layers.18.layer_norm1.weight', 'vision_model.encoder.layers.15.layer_norm2.weight', 'vision_model.encoder.layers.16.mlp.fc2.bias', 'text_projection.weight', 'vision_model.encoder.layers.8.mlp.fc1.weight', 'vision_model.encoder.layers.0.self_attn.out_proj.weight', 'visual_projection.weight', 'vision_model.encoder.layers.1.self_attn.k_proj.bias', 'vision_model.encoder.layers.11.self_attn.out_proj.bias', 'vision_model.encoder.layers.18.mlp.fc2.bias', 'vision_model.encoder.layers.17.layer_norm2.bias', 'vision_model.encoder.layers.15.self_attn.v_proj.bias', 'vision_model.encoder.layers.7.mlp.fc2.bias', 'vision_model.encoder.layers.0.self_attn.k_proj.bias', 'vision_model.encoder.layers.17.layer_norm1.bias', 'vision_model.encoder.layers.18.self_attn.v_proj.weight', 'vision_model.encoder.layers.18.layer_norm1.bias', 'vision_model.encoder.layers.2.self_attn.k_proj.bias', 'vision_model.encoder.layers.19.self_attn.k_proj.bias', 'vision_model.encoder.layers.6.self_attn.out_proj.bias', 'vision_model.encoder.layers.12.self_attn.out_proj.weight', 'vision_model.encoder.layers.1.layer_norm1.weight', 'vision_model.encoder.layers.19.mlp.fc1.bias', 'vision_model.encoder.layers.11.mlp.fc1.weight', 'vision_model.encoder.layers.3.mlp.fc1.weight', 'vision_model.encoder.layers.22.layer_norm2.weight', 'vision_model.encoder.layers.20.mlp.fc2.weight', 'vision_model.encoder.layers.7.self_attn.v_proj.bias', 'vision_model.encoder.layers.14.self_attn.out_proj.bias', 'vision_model.encoder.layers.5.self_attn.q_proj.weight', 'vision_model.encoder.layers.14.layer_norm2.weight', 'vision_model.encoder.layers.4.layer_norm1.weight', 'vision_model.encoder.layers.1.layer_norm1.bias', 'vision_model.encoder.layers.8.self_attn.out_proj.weight', 'vision_model.encoder.layers.19.self_attn.q_proj.bias', 'vision_model.encoder.layers.19.layer_norm2.weight', 'vision_model.encoder.layers.8.self_attn.out_proj.bias', 'vision_model.encoder.layers.0.self_attn.q_proj.bias', 'vision_model.encoder.layers.17.self_attn.out_proj.weight', 'vision_model.encoder.layers.8.layer_norm1.weight', 'vision_model.encoder.layers.1.mlp.fc1.bias', 'vision_model.encoder.layers.16.mlp.fc1.bias', 'vision_model.encoder.layers.7.self_attn.k_proj.weight', 'vision_model.encoder.layers.14.mlp.fc2.bias', 'vision_model.encoder.layers.14.mlp.fc2.weight', 'vision_model.encoder.layers.23.self_attn.q_proj.bias', 'vision_model.encoder.layers.21.self_attn.q_proj.weight', 'vision_model.encoder.layers.13.mlp.fc1.weight', 'vision_model.encoder.layers.22.layer_norm2.bias', 
'vision_model.encoder.layers.10.self_attn.q_proj.weight', 'vision_model.encoder.layers.14.self_attn.k_proj.bias', 'vision_model.encoder.layers.19.self_attn.q_proj.weight', 'vision_model.encoder.layers.5.mlp.fc1.bias', 'vision_model.encoder.layers.17.self_attn.q_proj.weight', 'vision_model.encoder.layers.9.self_attn.out_proj.bias', 'vision_model.encoder.layers.5.layer_norm2.bias', 'vision_model.encoder.layers.3.self_attn.v_proj.bias', 'vision_model.encoder.layers.5.layer_norm1.bias', 'vision_model.encoder.layers.7.layer_norm2.bias', 'vision_model.encoder.layers.0.self_attn.k_proj.weight', 'vision_model.encoder.layers.0.self_attn.v_proj.bias', 'vision_model.encoder.layers.21.mlp.fc2.bias', 'vision_model.encoder.layers.15.mlp.fc1.weight', 'vision_model.encoder.layers.10.self_attn.v_proj.weight', 'vision_model.encoder.layers.18.self_attn.v_proj.bias', 'vision_model.encoder.layers.1.mlp.fc2.weight', 'vision_model.encoder.layers.3.self_attn.q_proj.weight', 'vision_model.encoder.layers.8.self_attn.v_proj.weight', 'vision_model.encoder.layers.7.self_attn.q_proj.bias', 'vision_model.encoder.layers.23.self_attn.out_proj.weight', 'vision_model.encoder.layers.10.layer_norm2.weight', 'vision_model.encoder.layers.2.mlp.fc2.bias', 'vision_model.encoder.layers.7.layer_norm1.weight', 'vision_model.encoder.layers.20.self_attn.k_proj.weight', 'vision_model.encoder.layers.5.self_attn.q_proj.bias', 'vision_model.encoder.layers.23.layer_norm1.weight', 'vision_model.encoder.layers.2.mlp.fc1.bias', 'vision_model.encoder.layers.5.self_attn.out_proj.weight', 'vision_model.encoder.layers.5.mlp.fc1.weight', 'vision_model.encoder.layers.16.layer_norm1.bias', 'vision_model.encoder.layers.13.self_attn.q_proj.bias', 'vision_model.encoder.layers.2.self_attn.q_proj.bias', 'vision_model.encoder.layers.8.self_attn.q_proj.weight', 'vision_model.encoder.layers.23.self_attn.v_proj.weight', 'vision_model.encoder.layers.15.mlp.fc2.weight', 'vision_model.encoder.layers.7.self_attn.out_proj.weight', 'vision_model.encoder.layers.22.self_attn.v_proj.weight', 'vision_model.encoder.layers.3.layer_norm1.weight', 'vision_model.encoder.layers.7.self_attn.k_proj.bias', 'vision_model.encoder.layers.15.layer_norm1.bias', 'vision_model.encoder.layers.13.mlp.fc1.bias', 'vision_model.encoder.layers.22.mlp.fc1.bias', 'vision_model.encoder.layers.21.layer_norm2.bias', 'vision_model.encoder.layers.13.self_attn.v_proj.weight', 'vision_model.encoder.layers.10.self_attn.v_proj.bias', 'vision_model.encoder.layers.10.self_attn.q_proj.bias', 'vision_model.encoder.layers.6.self_attn.v_proj.weight', 'vision_model.encoder.layers.18.layer_norm2.weight', 'vision_model.encoder.layers.20.self_attn.q_proj.bias', 'vision_model.encoder.layers.5.self_attn.k_proj.weight', 'vision_model.encoder.layers.20.self_attn.v_proj.bias', 'vision_model.encoder.layers.16.self_attn.out_proj.bias', 'vision_model.encoder.layers.4.layer_norm1.bias', 'vision_model.encoder.layers.18.self_attn.out_proj.weight', 'vision_model.encoder.layers.15.self_attn.k_proj.bias', 'vision_model.encoder.layers.9.layer_norm1.weight', 'vision_model.encoder.layers.10.layer_norm2.bias', 'vision_model.encoder.layers.0.self_attn.out_proj.bias', 'vision_model.encoder.layers.4.mlp.fc1.weight', 'vision_model.embeddings.class_embedding', 'vision_model.encoder.layers.6.mlp.fc1.weight', 'vision_model.encoder.layers.9.self_attn.q_proj.bias', 'vision_model.encoder.layers.10.mlp.fc1.bias', 'vision_model.encoder.layers.8.mlp.fc2.weight', 'vision_model.encoder.layers.17.self_attn.k_proj.weight', 
'vision_model.encoder.layers.16.self_attn.q_proj.weight', 'vision_model.encoder.layers.18.mlp.fc1.bias', 'vision_model.encoder.layers.16.mlp.fc1.weight', 'vision_model.encoder.layers.18.mlp.fc1.weight', 'vision_model.encoder.layers.16.self_attn.k_proj.weight', 'vision_model.embeddings.position_embedding.weight', 'vision_model.encoder.layers.16.self_attn.v_proj.bias', 'vision_model.encoder.layers.3.mlp.fc1.bias', 'vision_model.encoder.layers.17.mlp.fc1.bias', 'vision_model.encoder.layers.3.mlp.fc2.bias', 'vision_model.encoder.layers.20.self_attn.out_proj.bias', 'vision_model.encoder.layers.17.layer_norm2.weight', 'vision_model.encoder.layers.13.mlp.fc2.weight', 'vision_model.encoder.layers.13.self_attn.out_proj.bias', 'vision_model.encoder.layers.12.layer_norm1.weight', 'vision_model.encoder.layers.4.self_attn.q_proj.bias', 'vision_model.encoder.layers.23.self_attn.v_proj.bias', 'vision_model.encoder.layers.18.self_attn.q_proj.weight', 'vision_model.encoder.layers.4.self_attn.out_proj.bias', 'vision_model.encoder.layers.12.self_attn.out_proj.bias', 'vision_model.encoder.layers.9.self_attn.v_proj.bias', 'vision_model.encoder.layers.9.mlp.fc1.bias', 'vision_model.encoder.layers.10.mlp.fc2.bias', 'vision_model.encoder.layers.5.layer_norm1.weight', 'vision_model.encoder.layers.4.self_attn.out_proj.weight', 'vision_model.encoder.layers.0.layer_norm1.weight', 'vision_model.encoder.layers.20.self_attn.out_proj.weight', 'vision_model.encoder.layers.22.self_attn.out_proj.weight', 'vision_model.encoder.layers.4.self_attn.k_proj.bias', 'vision_model.encoder.layers.16.layer_norm2.weight', 'vision_model.encoder.layers.9.layer_norm1.bias', 'vision_model.encoder.layers.8.self_attn.k_proj.weight', 'vision_model.encoder.layers.16.self_attn.k_proj.bias', 'vision_model.encoder.layers.5.self_attn.v_proj.weight', 'vision_model.encoder.layers.0.layer_norm2.bias', 'vision_model.encoder.layers.2.layer_norm1.weight', 'vision_model.encoder.layers.14.self_attn.v_proj.weight', 'vision_model.encoder.layers.11.self_attn.v_proj.bias', 'vision_model.encoder.layers.4.mlp.fc2.weight', 'vision_model.encoder.layers.23.layer_norm2.bias', 'vision_model.encoder.layers.13.layer_norm1.weight', 'vision_model.encoder.layers.21.layer_norm1.weight', 'vision_model.encoder.layers.6.layer_norm1.bias', 'vision_model.encoder.layers.5.self_attn.v_proj.bias', 'vision_model.encoder.layers.17.mlp.fc2.bias', 'vision_model.encoder.layers.11.self_attn.v_proj.weight', 'vision_model.encoder.layers.19.self_attn.v_proj.weight', 'vision_model.encoder.layers.5.self_attn.out_proj.bias', 'vision_model.encoder.layers.4.layer_norm2.bias', 'vision_model.encoder.layers.9.self_attn.k_proj.bias', 'vision_model.encoder.layers.14.self_attn.q_proj.bias', 'vision_model.encoder.layers.16.self_attn.out_proj.weight', 'vision_model.encoder.layers.4.mlp.fc2.bias', 'vision_model.encoder.layers.23.mlp.fc2.bias', 'vision_model.encoder.layers.23.layer_norm1.bias', 'vision_model.encoder.layers.15.self_attn.v_proj.weight', 'vision_model.embeddings.position_ids', 'vision_model.encoder.layers.19.layer_norm1.weight', 'vision_model.encoder.layers.6.self_attn.out_proj.weight', 'vision_model.encoder.layers.15.layer_norm1.weight', 'vision_model.encoder.layers.2.self_attn.v_proj.bias', 'vision_model.encoder.layers.14.mlp.fc1.bias', 'vision_model.encoder.layers.7.self_attn.v_proj.weight', 'vision_model.encoder.layers.17.self_attn.out_proj.bias', 'vision_model.encoder.layers.2.self_attn.v_proj.weight', 'vision_model.encoder.layers.0.layer_norm2.weight', 
'vision_model.encoder.layers.5.layer_norm2.weight', 'vision_model.encoder.layers.3.mlp.fc2.weight', 'vision_model.encoder.layers.14.layer_norm2.bias', 'vision_model.encoder.layers.14.self_attn.k_proj.weight', 'vision_model.encoder.layers.4.self_attn.q_proj.weight', 'vision_model.encoder.layers.6.layer_norm2.bias', 'vision_model.encoder.layers.18.self_attn.q_proj.bias', 'vision_model.encoder.layers.2.layer_norm1.bias', 'vision_model.encoder.layers.7.mlp.fc2.weight', 'vision_model.encoder.layers.22.self_attn.v_proj.bias', 'vision_model.encoder.layers.1.layer_norm2.weight', 'vision_model.encoder.layers.3.self_attn.v_proj.weight', 'vision_model.encoder.layers.6.layer_norm2.weight', 'vision_model.encoder.layers.22.self_attn.k_proj.bias', 'vision_model.encoder.layers.1.self_attn.v_proj.weight', 'vision_model.encoder.layers.16.mlp.fc2.weight', 'vision_model.encoder.layers.9.mlp.fc1.weight', 'vision_model.encoder.layers.1.self_attn.q_proj.weight', 'vision_model.encoder.layers.13.self_attn.out_proj.weight', 'vision_model.encoder.layers.13.self_attn.k_proj.bias', 'vision_model.encoder.layers.18.self_attn.out_proj.bias', 'vision_model.encoder.layers.19.mlp.fc2.bias', 'vision_model.encoder.layers.16.layer_norm1.weight', 'vision_model.encoder.layers.4.mlp.fc1.bias', 'vision_model.encoder.layers.20.self_attn.k_proj.bias', 'vision_model.encoder.layers.9.self_attn.v_proj.weight', 'vision_model.encoder.layers.15.self_attn.k_proj.weight', 'vision_model.encoder.layers.7.layer_norm2.weight', 'vision_model.encoder.layers.0.mlp.fc2.bias', 'vision_model.encoder.layers.3.layer_norm2.weight', 'vision_model.encoder.layers.11.self_attn.q_proj.bias', 'vision_model.encoder.layers.21.self_attn.v_proj.bias', 'vision_model.encoder.layers.21.self_attn.k_proj.bias', 'vision_model.encoder.layers.22.layer_norm1.bias', 'vision_model.encoder.layers.21.self_attn.out_proj.weight', 'vision_model.encoder.layers.8.self_attn.v_proj.bias', 'vision_model.encoder.layers.23.self_attn.q_proj.weight', 'vision_model.encoder.layers.11.mlp.fc2.weight', 'vision_model.encoder.layers.1.layer_norm2.bias', 'vision_model.encoder.layers.9.mlp.fc2.bias', 'vision_model.encoder.layers.17.self_attn.q_proj.bias', 'vision_model.encoder.layers.20.layer_norm1.bias', 'vision_model.encoder.layers.1.mlp.fc1.weight', 'vision_model.encoder.layers.12.mlp.fc1.weight', 'vision_model.encoder.layers.11.self_attn.out_proj.weight', 'vision_model.encoder.layers.1.mlp.fc2.bias', 'vision_model.encoder.layers.11.layer_norm2.weight', 'vision_model.encoder.layers.6.self_attn.k_proj.bias', 'vision_model.encoder.layers.7.self_attn.out_proj.bias', 'vision_model.encoder.layers.4.self_attn.v_proj.weight', 'vision_model.encoder.layers.6.self_attn.q_proj.bias', 'vision_model.encoder.layers.12.self_attn.q_proj.weight', 'vision_model.encoder.layers.22.self_attn.q_proj.bias', 'vision_model.encoder.layers.8.layer_norm2.weight', 'vision_model.encoder.layers.17.mlp.fc1.weight', 'vision_model.encoder.layers.2.layer_norm2.weight', 'vision_model.post_layernorm.weight', 'vision_model.encoder.layers.21.mlp.fc1.weight', 'vision_model.encoder.layers.17.mlp.fc2.weight', 'vision_model.encoder.layers.22.self_attn.out_proj.bias', 'vision_model.encoder.layers.1.self_attn.v_proj.bias', 'vision_model.encoder.layers.13.self_attn.v_proj.bias', 'vision_model.encoder.layers.13.layer_norm2.bias', 'vision_model.encoder.layers.0.mlp.fc2.weight', 'vision_model.encoder.layers.6.mlp.fc2.weight', 'vision_model.pre_layrnorm.bias', 'vision_model.encoder.layers.9.self_attn.out_proj.weight', 
'vision_model.encoder.layers.21.self_attn.k_proj.weight', 'vision_model.encoder.layers.0.self_attn.q_proj.weight', 'vision_model.encoder.layers.4.layer_norm2.weight', 'vision_model.encoder.layers.17.self_attn.v_proj.bias', 'vision_model.encoder.layers.21.layer_norm2.weight', 'vision_model.encoder.layers.2.self_attn.out_proj.bias', 'vision_model.encoder.layers.15.self_attn.out_proj.bias', 'vision_model.encoder.layers.13.self_attn.k_proj.weight', 'vision_model.encoder.layers.18.mlp.fc2.weight', 'vision_model.encoder.layers.19.mlp.fc1.weight', 'vision_model.encoder.layers.2.self_attn.q_proj.weight', 'vision_model.encoder.layers.23.mlp.fc2.weight', 'vision_model.encoder.layers.8.self_attn.q_proj.bias', 'vision_model.encoder.layers.23.mlp.fc1.weight', 'vision_model.encoder.layers.15.self_attn.out_proj.weight', 'vision_model.encoder.layers.12.mlp.fc1.bias', 'vision_model.post_layernorm.bias', 'vision_model.encoder.layers.4.self_attn.v_proj.bias', 'vision_model.encoder.layers.1.self_attn.k_proj.weight', 'vision_model.encoder.layers.12.layer_norm2.bias', 'vision_model.pre_layrnorm.weight', 'vision_model.encoder.layers.9.layer_norm2.weight', 'vision_model.encoder.layers.12.self_attn.q_proj.bias', 'vision_model.encoder.layers.11.mlp.fc2.bias', 'vision_model.encoder.layers.15.mlp.fc1.bias', 'vision_model.encoder.layers.2.self_attn.out_proj.weight', 'vision_model.encoder.layers.7.mlp.fc1.bias', 'vision_model.encoder.layers.23.layer_norm2.weight', 'vision_model.encoder.layers.19.self_attn.out_proj.bias', 'vision_model.encoder.layers.12.mlp.fc2.weight', 'vision_model.encoder.layers.19.mlp.fc2.weight', 'vision_model.encoder.layers.22.mlp.fc2.weight', 'vision_model.encoder.layers.7.self_attn.q_proj.weight', 'vision_model.encoder.layers.20.mlp.fc1.weight', 'vision_model.encoder.layers.0.mlp.fc1.bias', 'vision_model.encoder.layers.10.mlp.fc1.weight', 'vision_model.encoder.layers.14.self_attn.out_proj.weight', 'vision_model.encoder.layers.22.mlp.fc1.weight', 'vision_model.encoder.layers.16.layer_norm2.bias', 'vision_model.encoder.layers.3.self_attn.q_proj.bias', 'vision_model.encoder.layers.16.self_attn.q_proj.bias', 'vision_model.encoder.layers.20.layer_norm2.bias', 'vision_model.encoder.layers.20.layer_norm1.weight', 'vision_model.embeddings.patch_embedding.weight', 'vision_model.encoder.layers.6.mlp.fc1.bias', 'vision_model.encoder.layers.14.self_attn.q_proj.weight', 'vision_model.encoder.layers.12.self_attn.v_proj.bias', 'vision_model.encoder.layers.9.self_attn.k_proj.weight', 'vision_model.encoder.layers.3.self_attn.out_proj.bias', 'vision_model.encoder.layers.3.self_attn.k_proj.weight', 'vision_model.encoder.layers.14.layer_norm1.weight', 'vision_model.encoder.layers.20.self_attn.v_proj.weight', 'vision_model.encoder.layers.21.mlp.fc1.bias', 'vision_model.encoder.layers.21.layer_norm1.bias', 'vision_model.encoder.layers.19.layer_norm2.bias', 'vision_model.encoder.layers.10.layer_norm1.weight', 'vision_model.encoder.layers.21.self_attn.v_proj.weight', 'vision_model.encoder.layers.8.layer_norm1.bias', 'vision_model.encoder.layers.10.self_attn.out_proj.weight', 'vision_model.encoder.layers.19.self_attn.out_proj.weight', 'vision_model.encoder.layers.2.mlp.fc2.weight', 'vision_model.encoder.layers.0.layer_norm1.bias', 'vision_model.encoder.layers.13.layer_norm2.weight', 'vision_model.encoder.layers.1.self_attn.out_proj.weight']
- This IS expected if you are initializing CLIPTextModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing CLIPTextModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│                                                                                                  │
│ E:\Instruct-pix2pix\instruct-pix2pix-main\edit_cli.py:128 in <module>                            │
│                                                                                                  │
│   125                                                                                            │
│   126                                                                                            │
│   127 if __name__ == "__main__":                                                                 │
│ ❱ 128 │   main()                                                                                 │
│   129                                                                                            │
│ E:\Instruct-pix2pix\instruct-pix2pix-main\edit_cli.py:98 in main                                 │
│                                                                                                  │
│    95 │   │   input_image.save(args.output)                                                      │
│    96 │   │   return                                                                             │
│    97 │                                                                                          │
│ ❱  98 │   with torch.no_grad(), autocast("cuda"), model.ema_scope():                             │
│    99 │   │   cond = {}                                                                          │
│   100 │   │   cond["c_crossattn"] = [model.get_learned_conditioning([args.edit])]                │
│   101 │   │   input_image = 2 * torch.tensor(np.array(input_image)).float() / 255 - 1            │
│                                                                                                  │
│ C:\Users\AlexI\AppData\Local\Programs\Python\Python310\lib\contextlib.py:135 in __enter__        │
│                                                                                                  │
│   132 │   │   # they are only needed for recreation, which is not possible anymore               │
│   133 │   │   del self.args, self.kwds, self.func                                                │
│   134 │   │   try:                                                                               │
│ ❱ 135 │   │   │   return next(self.gen)                                                          │
│   136 │   │   except StopIteration:                                                              │
│   137 │   │   │   raise RuntimeError("generator didn't yield") from None                         │
│   138                                                                                            │
│                                                                                                  │
│ E:\Instruct-pix2pix\instruct-pix2pix-main\./stable_diffusion\ldm\models\diffusion\ddpm_edit.py:1 │
│ 85 in ema_scope                                                                                  │
│                                                                                                  │
│    182 │   @contextmanager                                                                       │
│    183 │   def ema_scope(self, context=None):                                                    │
│    184 │   │   if self.use_ema:                                                                  │
│ ❱  185 │   │   │   self.model_ema.store(self.model.parameters())                                 │
│    186 │   │   │   self.model_ema.copy_to(self.model)                                            │
│    187 │   │   │   if context is not None:                                                       │
│    188 │   │   │   │   print(f"{context}: Switched to EMA weights")                              │
│                                                                                                  │
│ E:\Instruct-pix2pix\instruct-pix2pix-main\./stable_diffusion\ldm\modules\ema.py:62 in store      │
│                                                                                                  │
│   59 │   │     parameters: Iterable of `torch.nn.Parameter`; the parameters to be                │
│   60 │   │   │   temporarily stored.                                                             │
│   61 │   │   """                                                                                 │
│ ❱ 62 │   │   self.collected_params = [param.clone() for param in parameters]                     │
│   63 │                                                                                           │
│   64 │   def restore(self, parameters):                                                          │
│   65 │   │   """                                                                                 │
│                                                                                                  │
│ E:\Instruct-pix2pix\instruct-pix2pix-main\./stable_diffusion\ldm\modules\ema.py:62 in <listcomp> │
│                                                                                                  │
│   59 │   │     parameters: Iterable of `torch.nn.Parameter`; the parameters to be                │
│   60 │   │   │   temporarily stored.                                                             │
│   61 │   │   """                                                                                 │
│ ❱ 62 │   │   self.collected_params = [param.clone() for param in parameters]                     │
│   63 │                                                                                           │
│   64 │   def restore(self, parameters):                                                          │
│   65 │   │   """                                                                                 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: CUDA out of memory. Tried to allocate 58.00 MiB (GPU 0; 11.00 GiB total capacity; 10.04 GiB already
allocated; 0 bytes free; 10.21 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try
setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and
PYTORCH_CUDA_ALLOC_CONF

Burve avatar Jan 21 '23 16:01 Burve

Had the same issue, changing things to fp16 appeared to reduce VRAM usage enough: https://github.com/SirBenet/instruct-pix2pix
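
In case it helps anyone else, the gist of the change is casting the loaded model (and its inputs) to half precision, roughly like the sketch below. This is not the exact code from that fork; it assumes the load_model_from_config helper and variable names from edit_cli.py, and any inputs left in fp32 will trigger a dtype-mismatch error ("expected scalar type Half but found Float"):

```python
# Sketch of the fp16 change inside edit_cli.py's main() -- not the exact diff
# from the linked fork. config, args, np, and torch are already in scope there.
model = load_model_from_config(config, args.ckpt, args.vae_ckpt)
model.half().eval().cuda()  # fp16 weights roughly halve the model's VRAM footprint

# The inputs then have to match the fp16 weights, e.g. the image tensor:
input_image = 2 * torch.tensor(np.array(input_image)).half() / 255 - 1
```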

SirBenet avatar Jan 21 '23 17:01 SirBenet

Same here, using the Google Colab version:

OutOfMemoryError: CUDA out of memory. Tried to allocate 7.82 GiB (GPU 0; 39.59 GiB total capacity; 31.21 GiB already allocated; 3.17 GiB free; 34.46 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Any suggestions?

venturaEffect avatar Jan 21 '23 17:01 venturaEffect

Had the same issue, changing things to fp16 appeared to reduce VRAM usage enough: https://github.com/SirBenet/instruct-pix2pix

This version worked fine for me.

I wonder where the output file is saved after running the WebUI, but that is outside the scope of this issue :)

Burve avatar Jan 21 '23 17:01 Burve

Good, glad you got it working.

For some more context: for the edit_cli and edit_app applications, you should expect peak VRAM usage to be around 18.5 GB in the default configuration. So if your GPU has less memory, it probably won't work without changing some settings.
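
If you're not sure how much memory your GPU actually has, a quick check from PyTorch (assuming CUDA is visible) is:

```python
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GiB total VRAM")
else:
    print("No CUDA device visible to PyTorch")
```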

holynski avatar Jan 21 '23 19:01 holynski

Why was this closed? I've been trying the entire day to make it work, but with no success, and I ran into the exact same error:

Traceback (most recent call last):
  File "/home/estathop/Desktop/instruct-pix2pix-main/edit_cli.py", line 128, in <module>
    main()
  File "/home/estathop/Desktop/instruct-pix2pix-main/edit_cli.py", line 98, in main
    with torch.no_grad(), autocast("cuda"), model.ema_scope():
  File "/home/estathop/anaconda3/envs/ip2p/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/home/estathop/Desktop/instruct-pix2pix-main/./stable_diffusion/ldm/models/diffusion/ddpm_edit.py", line 185, in ema_scope
    self.model_ema.store(self.model.parameters())
  File "/home/estathop/Desktop/instruct-pix2pix-main/./stable_diffusion/ldm/modules/ema.py", line 62, in store
    self.collected_params = [param.clone() for param in parameters]
  File "/home/estathop/Desktop/instruct-pix2pix-main/./stable_diffusion/ldm/modules/ema.py", line 62, in <listcomp>
    self.collected_params = [param.clone() for param in parameters]
RuntimeError: CUDA out of memory. Tried to allocate 114.00 MiB (GPU 0; 10.76 GiB total capacity; 9.46 GiB already allocated; 20.00 MiB free; 9.61 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

estathop avatar Jan 23 '23 01:01 estathop

Traceback (most recent call last):
  File "/home/estathop/Desktop/instruct-pix2pix-main/edit_cli.py", line 130, in <module>
    main()
  File "/home/estathop/Desktop/instruct-pix2pix-main/edit_cli.py", line 85, in main
    null_token = model.get_learned_conditioning([""])
  File "/home/estathop/Desktop/instruct-pix2pix-main/./stable_diffusion/ldm/models/diffusion/ddpm_edit.py", line 588, in get_learned_conditioning
    c = self.cond_stage_model.encode(c)
  File "/home/estathop/Desktop/instruct-pix2pix-main/./stable_diffusion/ldm/modules/encoders/modules.py", line 162, in encode
    return self(text)
  File "/home/estathop/anaconda3/envs/ip2p/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/estathop/Desktop/instruct-pix2pix-main/./stable_diffusion/ldm/modules/encoders/modules.py", line 156, in forward
    outputs = self.transformer(input_ids=tokens)
  File "/home/estathop/anaconda3/envs/ip2p/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/estathop/anaconda3/envs/ip2p/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 722, in forward
    return self.text_model(
  File "/home/estathop/anaconda3/envs/ip2p/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/estathop/anaconda3/envs/ip2p/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 643, in forward
    encoder_outputs = self.encoder(
  File "/home/estathop/anaconda3/envs/ip2p/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/estathop/anaconda3/envs/ip2p/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 574, in forward
    layer_outputs = encoder_layer(
  File "/home/estathop/anaconda3/envs/ip2p/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/estathop/anaconda3/envs/ip2p/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 317, in forward
    hidden_states, attn_weights = self.self_attn(
  File "/home/estathop/anaconda3/envs/ip2p/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/estathop/anaconda3/envs/ip2p/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 257, in forward
    attn_output = torch.bmm(attn_probs, value_states)
RuntimeError: expected scalar type Half but found Float

estathop avatar Jan 23 '23 01:01 estathop

The instructions provided in the README assume you have an NVIDIA GPU with >18GB VRAM. It looks like you're running on a device with less memory.

There are a few ways to reduce the memory consumption:

  1. As recommended by @Burve and @SirBenet, you can reduce precision to fp16, though output quality may suffer.
  2. You can disable the use of EMA weights (use_ema and load_ema in configs/generate.yaml); see the config-override sketch after this list. This may also affect the quality of your outputs.
  3. You can alternatively try running the code through other pipelines that don't require as much GPU memory. Some examples are:

  a. The HuggingFace Space, which doesn't even need a local GPU, since it runs on online cloud resources.
  b. ImaginAIry
  c. Diffusers
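
For option 2, you can either edit configs/generate.yaml directly or override the flags in Python when the config is loaded. A rough sketch only; the key path under which use_ema/load_ema sit in the YAML is an assumption here, so adjust it to your config:

```python
from omegaconf import OmegaConf

config = OmegaConf.load("configs/generate.yaml")
# Hypothetical key path -- point these at wherever use_ema/load_ema live in your YAML.
config.model.params.use_ema = False
config.model.params.load_ema = False
```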

I'm working on an update to the README that outlines all these options.

holynski avatar Jan 23 '23 01:01 holynski

So it only works on a 3090 Ti, 4090 Ti, or workstation video cards worth $5k+ (or in the cloud)? That should be written in big red print everywhere, including at the top of the README, before the images, because it means only 0.1% of users can actually run it in their local environment. I just wasted an hour installing, downloading its model, and troubleshooting for A1111.

VictorZakharov avatar Jan 29 '23 19:01 VictorZakharov