
ask for suggestions on "RuntimeError: CUDA out of memory. Tried to allocate..." error

Open undeathable opened this issue 2 years ago • 10 comments

Hi! I'm new to PyTorch and would like to try this fantastic text-to-image project. I cloned the repo and ran text2image.sh for prediction only (not training) on different GPUs, 1 x V100-32G and 2 x 3090 Ti-24G, but both runs fail with this "CUDA out of memory" error. I also tried reducing both batch-size and max-inference-batch-size down to 1, but it still fails.

So, any suggestions for this issue other than moving to higher-performance GPUs like an A100 or RTX A6000? For example: is it possible to change some configs so the machine predicts on CPU only? Or to provide a smaller model .pt file? Or to modify some part of the code/config to fully use 2 x 24G GPUs (currently only one is used during prediction)?

Thanks!

undeathable avatar Jul 15 '22 02:07 undeathable
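On the CPU-only question, here is a rough, hypothetical sketch (not the project's actual loading code, and "toy.pt" is a placeholder name, not the real CogView2 checkpoint): PyTorch checkpoints can generally be remapped onto the CPU at load time, so loading itself never touches GPU memory. Whether the full CogView2 pipeline then runs acceptably fast on CPU is a separate question.

```python
import torch

# Save a toy checkpoint ("toy.pt" is a placeholder, not the real weights file).
torch.save({"w": torch.randn(4, 4)}, "toy.pt")

# map_location="cpu" remaps any CUDA tensors stored in the checkpoint onto
# the CPU, so loading allocates no GPU memory at all.
state = torch.load("toy.pt", map_location="cpu")
print(state["w"].device)  # cpu
```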

@undeathable I made a dirty hack for GPUs with 12/24G vRAM: https://github.com/lkwq007/CogView2-low-vram

lkwq007 avatar Jul 15 '22 11:07 lkwq007

Try running with the --only-first-stage flag

Limbicnation avatar Jul 16 '22 09:07 Limbicnation

@undeathable I made a dirty hack for GPUs with 12/24G vRAM: https://github.com/lkwq007/CogView2-low-vram

Thanks! https://github.com/lkwq007/CogView2-low-vram with --single-gpu perfectly solved my original issue! The only weird thing left is that when I run the shell script without the --single-gpu flag, i.e. in multi-GPU mode (I have 2 x 24G RTX A5000), it fails with another error: "RuntimeError: Expected all tensors to be on the same device, but found at least two devices".

Any thoughts on that, e.g. other configs or further fixes?

Thanks again!

undeathable avatar Jul 16 '22 15:07 undeathable

Try running with the --only-first-stage flag

You're right, with the --only-first-stage flag it runs successfully, but the output image doesn't look as good as the output from running the full pipeline.

undeathable avatar Jul 16 '22 15:07 undeathable

Hi @undeathable, thank you for sharing! I still get the error below when running with --single-gpu on an RTX 3090 Ti. Any idea what I can do? Which Python and PyTorch versions are you running in your virtual environment?

    Save to:  samples_sat_v0.2/0 Cat drinking coffe-07-19-02-36-24.jpeg
    Traceback (most recent call last):
      File "cogview2_text2image.py", line 249, in <module>
        main(args)
      File "cogview2_text2image.py", line 172, in main
        generate_continually(process, args.input_source)
      File "/home/*****/anaconda3/envs/cog-view2-low-vram/lib/python3.8/site-packages/SwissArmyTransformer/generation/utils.py", line 74, in generate_continually
        func(raw_text)
      File "cogview2_text2image.py", line 165, in process
        grid = make_grid(imgs, nrow=3, padding=0)
      File "/home/*****/anaconda3/envs/cog-view2-low-vram/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
        return func(*args, **kwargs)
      File "/home/*****//anaconda3/envs/cog-view2-low-vram/lib/python3.8/site-packages/torchvision/utils.py", line 76, in make_grid
        tensor = torch.stack(tensor, dim=0)
    RuntimeError: stack expects a non-empty TensorList

Limbicnation avatar Jul 19 '22 00:07 Limbicnation

@undeathable I made a dirty hack for GPUs with 12/24G vRAM: https://github.com/lkwq007/CogView2-low-vram

Thanks! https://github.com/lkwq007/CogView2-low-vram with --single-gpu perfectly solved my original issue! The only weird thing left is that when I run the shell script without the --single-gpu flag, i.e. in multi-GPU mode (I have 2 x 24G RTX A5000), it fails with another error: "RuntimeError: Expected all tensors to be on the same device, but found at least two devices".

Any thoughts on that, e.g. other configs or further fixes?

Thanks again!

Works fine with PyTorch 1.11 (py3.7_cuda11.3_cudnn8.2.0_0) + 2 x RTX 3090. Did you install apex? The hack was tested without fused_layer_norm from apex.

lkwq007 avatar Jul 19 '22 05:07 lkwq007
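For reference, a minimal sketch (plain PyTorch, not CogView2 code) of how the multi-GPU "Expected all tensors to be on the same device" error arises and the usual fix: one operand lives on the CPU (or on a different GPU) than the other, and an explicit .to() call before the op resolves it.

```python
import torch

# Pick a second device if one exists; on a CPU-only machine everything
# stays on the CPU and the mismatch cannot actually be triggered.
dev = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.randn(2, 2)          # stays on CPU
b = torch.randn(2, 2).to(dev)  # may live on a GPU

if a.device != b.device:
    try:
        a + b  # raises "Expected all tensors to be on the same device ..."
    except RuntimeError as e:
        print(e)

# The usual fix: explicitly move operands onto one device before the op.
c = a.to(b.device) + b
print(c.shape)  # torch.Size([2, 2])
```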

@lkwq007 Could you please briefly explain how to run it without fused_layer_norm from apex? I have already installed apex. Thank you!

Limbicnation avatar Jul 21 '22 17:07 Limbicnation

@lkwq007 Could you please briefly explain how to run it without fused_layer_norm from apex? I have already installed apex. Thank you!

@Limbicnation I suspect that the "RuntimeError: Expected all tensors to be on the same device, but found at least two devices" error on multiple GPUs might result from apex. However, apex does not seem to be related to your issue.

For your issues, you might need to check the value of iter_tokens and imgs.

lkwq007 avatar Jul 22 '22 17:07 lkwq007
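The suggestion to check imgs can be illustrated with a small sketch (plain PyTorch, not the actual CogView2 code): torch.stack is exactly the call that torchvision's make_grid makes internally, and an empty imgs list reproduces the "non-empty TensorList" error from the traceback above.

```python
import torch

imgs = []  # e.g. every candidate image was filtered out upstream

# This is the call that fails inside torchvision's make_grid:
try:
    torch.stack(imgs, dim=0)
except RuntimeError as e:
    print(e)  # prints the "non-empty TensorList" error

# A defensive guard before building the grid:
if imgs:
    grid = torch.stack(imgs, dim=0)
else:
    print("imgs is empty; nothing to save")
```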

@lkwq007 Thank you for your message!

I commented out this code from lines 153-169:

    # save
    if args.with_id:
        full_path = os.path.join(args.output_path, query_id)
        os.makedirs(full_path, exist_ok=True)
        save_multiple_images(imgs, full_path, False)
    else:
        prefix = raw_text.replace('/', '')[:20]
        full_path = timed_name(prefix, '.jpeg', args.output_path)
        # imgs = torch.cat(imgs, dim=0)
        print("\nSave to: ", full_path, flush=True)
        from PIL import Image
        from torchvision.utils import make_grid
        grid = make_grid(imgs, nrow=3, padding=0)
        # Add 0.5 after unnormalizing to [0, 255] to round to nearest integer
        ndarr = grid.mul(255).add_(0.5).clamp_(0, 255).permute(1, 2, 0).to('cpu', torch.uint8).numpy()
        im = Image.fromarray(ndarr)
        im.save(full_path, quality=100, subsampling=0)

However, while it runs now, I get an error right after the direct super-resolution step:


      File "/home/******/anaconda3/envs/cog-view2-low-vram/lib/python3.8/site-packages/torchvision/utils.py", line 128, in make_grid
        grid.narrow(1, y * height + padding, height - padding).narrow(  # type: ignore[attr-defined]
    RuntimeError: The size of tensor a (3) must match the size of tensor b (256) at non-singleton dimension 2

Thanks!

Limbicnation avatar Jul 23 '22 04:07 Limbicnation
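For what it's worth, a small shape-check sketch (plain PyTorch, with hypothetical tensors): torchvision's make_grid expects a 4-D (B, C, H, W) batch or a list of 3-D (C, H, W) images, so a channels-last (H, W, C) tensor trips size-mismatch errors like the one above, and permuting back to channels-first avoids it.

```python
import torch

img_chw = torch.rand(3, 256, 256)    # (C, H, W): the layout make_grid expects
img_hwc = img_chw.permute(1, 2, 0)   # (H, W, C): wrong layout for make_grid

print(img_chw.shape)  # torch.Size([3, 256, 256])
print(img_hwc.shape)  # torch.Size([256, 256, 3])

# Converting back to channels-first before calling make_grid avoids the mismatch:
fixed = img_hwc.permute(2, 0, 1)
assert fixed.shape == (3, 256, 256)
```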

Hi, I installed the CUDA build with

    pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113

on Python 3.7.

I still get the following error:

      File "/home/*****/anaconda3/envs/cog-view2-low-vram2/lib/python3.7/site-packages/SwissArmyTransformer/ops/local_attention_function.py", line 5, in <module>
        from localAttention import (similar_forward,

Limbicnation avatar Jul 23 '22 05:07 Limbicnation