
Inference in Colab fails

Open PrithivirajDamodaran opened this issue 3 years ago • 8 comments

Follow the message (added some print statements to debug and removed clear_output) - Please advise

chosen_model: https://www.dropbox.com/s/8mmgnromwoilpfm/16L_64HD_8H_512I_128T_cc12m_cc3m_3E.pt?dl=1
folder_ /content/outputs/Cucumber_on_a_brown_wooden_chair/
Traceback (most recent call last):
  File "/content/dalle-pytorch-pretrained/DALLE-pytorch/generate.py", line 18, in <module>
    from dalle_pytorch import DiscreteVAE, OpenAIDiscreteVAE, VQGanVAE, DALLE
  File "/content/dalle-pytorch-pretrained/DALLE-pytorch/dalle_pytorch/__init__.py", line 1, in <module>
    from dalle_pytorch.dalle_pytorch import DALLE, CLIP, DiscreteVAE
  File "/content/dalle-pytorch-pretrained/DALLE-pytorch/dalle_pytorch/dalle_pytorch.py", line 11, in <module>
    from dalle_pytorch.vae import OpenAIDiscreteVAE, VQGanVAE
  File "/content/dalle-pytorch-pretrained/DALLE-pytorch/dalle_pytorch/vae.py", line 14, in <module>
    from taming.models.vqgan import VQModel, GumbelVQ
ImportError: cannot import name 'GumbelVQ' from 'taming.models.vqgan' (/usr/local/lib/python3.7/dist-packages/taming/models/vqgan.py)
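For anyone hitting the same ImportError, a quick way to confirm that the installed taming package really lacks GumbelVQ is to probe the module before running generate.py. This is only a diagnostic sketch: the helper name module_exports is made up for illustration, and it does not fix anything by itself.

```python
import importlib


def module_exports(module_name, attr):
    """Return True if module_name can be imported and defines attr."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package (or one of its own dependencies) is not importable.
        return False
    return hasattr(module, attr)


# The name that dalle_pytorch/vae.py tries to import in the traceback above.
print(module_exports("taming.models.vqgan", "GumbelVQ"))
```

If this prints False while VQModel is reported as present, the installed taming version is most likely older than the one the DALLE-pytorch code expects.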

PrithivirajDamodaran avatar Aug 27 '21 15:08 PrithivirajDamodaran

Fixed-- retry.

johnpaulbin avatar Sep 12 '21 04:09 johnpaulbin

Screenshot 2021-09-14 at 10 01 28 AM

Now I'm getting a different issue: file/image not found. Under '/content/dalle-pytorch-pretrained/' there is no folder named DALLE-pytorch.

PrithivirajDamodaran avatar Sep 14 '21 04:09 PrithivirajDamodaran

The missing DALLE-pytorch folder is caused by the !wget command in the 2nd code block (2 Install required dependencies.), line 36: !wget "https://github.com/lucidrains/DALLE-pytorch/archive/refs/tags/0.14.3.zip" -O /content/. The /content/ at the end is a directory, but -O expects a file path. Changing it to /content/0.14.3.zip solves the above issue.
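To illustrate the point about -O needing a file rather than a directory, here is a small sketch that derives the full output path from the URL. The helper name output_file_for is hypothetical and not part of the notebook; treating a trailing slash as "this is a directory" is an assumption made for the example.

```python
import posixpath
from urllib.parse import urlparse


def output_file_for(url, dest):
    """Return a full file path suitable for `wget -O`.

    If dest ends with '/', it is treated as a directory and the file
    name from the URL is appended; otherwise dest is returned unchanged.
    """
    if dest.endswith("/"):
        name = posixpath.basename(urlparse(url).path)
        return posixpath.join(dest, name)
    return dest


url = "https://github.com/lucidrains/DALLE-pytorch/archive/refs/tags/0.14.3.zip"
print(output_file_for(url, "/content/"))
```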

After that there are new issues:

  File "/content/dalle-pytorch-pretrained/DALLE-pytorch/dalle_pytorch/attention.py", line 362, in forward
    out = self.attn_fn(q, k, v, attn_mask = attn_mask, key_padding_mask = key_pad_mask)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/deepspeed/ops/sparse_attention/sparse_self_attention.py", line 126, in forward
    assert query.dtype == torch.half, "sparse attention only supports training in fp16 currently, please file a github issue if you need fp32 support"
AssertionError: sparse attention only supports training in fp16 currently, please file a github issue if you need fp32 support
Finished generating images, attempting to display results...

Using deepspeed instead (replacing python with deepspeed and adding the --deepspeed and --fp16 args), by changing lines 25 to 28 of code block 4 (4 Try out the model.) to:

25|if chosen_model not in allow:
26|  !deepspeed /content/dalle-pytorch-pretrained/DALLE-pytorch/generate.py --dalle_path=$checkpoint_path --taming --text="$text" --num_images=$num_images --batch_size=$batch_size --outputs_dir="$_folder" --deepspeed --fp16; wait;
27|else:
28|  !deepspeed /content/dalle-pytorch-pretrained/DALLE-pytorch/generate.py --dalle_path=$checkpoint_path --taming --text="$text" --num_images=$num_images --batch_size=$batch_size --outputs_dir="$_folder" --bpe_path variety.bpe --deepspeed --fp16; wait;

won't help:

generate.py: error: unrecognized arguments: --local_rank=0 --deepspeed --fp16

The problem is that a --local_rank=0 arg is being passed from somewhere.
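As far as I can tell, the deepspeed launcher appends --local_rank to the script's arguments, so the script's parser has to accept it. A minimal sketch of a dummy argument is below. The parser here is a stand-in with a single --text option, not the real generate.py parser, and parse_known_args is used only to show which flags would still be unrecognized.

```python
import argparse


def build_parser():
    # Stand-in for the script's parser; only --text is modelled here.
    parser = argparse.ArgumentParser()
    parser.add_argument("--text", type=str, required=True)
    # Dummy argument: accepted so the launcher-injected flag does not
    # abort parsing. The value is not used for anything in this sketch.
    parser.add_argument("--local_rank", type=int, default=-1)
    return parser


argv = ["--text", "Cucumber on a brown wooden chair",
        "--local_rank=0", "--deepspeed", "--fp16"]
# parse_known_args returns the leftovers instead of exiting with an error.
args, unknown = build_parser().parse_known_args(argv)
print(args.local_rank, unknown)
```

In this sketch --deepspeed and --fp16 still end up in the unknown list, which matches the error message above: those flags also need to be defined by the script before they can be passed.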

Update: after working around the local rank issue and the fp16 attention issue (a rough fix: I added a dummy local_rank argument to the parser in generate.py, and in attention.py I manually convert to fp16 and back to the original x.dtype), a new issue arises: ImportError: cannot import name 'MatMul' from 'deepspeed.ops.sparse_attention' (/usr/local/lib/python3.7/dist-packages/deepspeed/ops/sparse_attention/__init__.py)
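The fp16 round trip mentioned above could look roughly like the sketch below. The wrapper name call_in_fp16 is hypothetical, and this is not the actual patch to attention.py. It assumes q, k and v are torch tensors, so that .half() and .to(dtype) are available, and that attn_fn takes the same arguments as in the traceback.

```python
def call_in_fp16(attn_fn, q, k, v, **kwargs):
    """Call attn_fn on half-precision copies of q, k, v, then cast the
    result back to the dtype the caller originally used."""
    orig_dtype = q.dtype
    # Sparse attention asserts query.dtype == torch.half, so cast inputs.
    out = attn_fn(q.half(), k.half(), v.half(), **kwargs)
    # Restore the original dtype so the rest of the model is unaffected.
    return out.to(orig_dtype)


# Inside the forward pass this would replace the direct call, e.g.:
# out = call_in_fp16(self.attn_fn, q, k, v,
#                    attn_mask=attn_mask, key_padding_mask=key_pad_mask)
```

Casting on every call costs extra memory and time, so this is a workaround for inference experiments rather than a proper fp32 path.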

Vbansal21 avatar Oct 04 '21 05:10 Vbansal21

@johnpaulbin Sorry, not trying to be a pest, please advise on this issue.

PrithivirajDamodaran avatar Oct 10 '21 07:10 PrithivirajDamodaran

Hi, I think this issue is due to this line in the second code cell of the Colab notebook, which fails (silently, because for some reason the output is cleared later in the cell!):

!wget "https://github.com/lucidrains/DALLE-pytorch/archive/refs/tags/0.14.3.zip" -O /content/

It seems you need to specify the full path to the output file for wget, like this:

!wget "https://github.com/lucidrains/DALLE-pytorch/archive/refs/tags/0.14.3.zip" -O /content/0.14.3.zip

I don't know who has access to that notebook, but if it could be updated that would be great.
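Since the failure is silent, one option is to run the download through a helper that raises when the command exits with an error, so clearing the cell output cannot hide it. This is a rough sketch and run_checked is a made-up name; the wget call is left as a comment because it needs network access and the /content directory.

```python
import subprocess


def run_checked(cmd):
    """Run cmd (a list of strings) and return its stdout.

    Raises RuntimeError, including the command's stderr, if the exit
    status is non-zero.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{cmd[0]} exited with status {result.returncode}:\n{result.stderr}"
        )
    return result.stdout


# In the notebook this could replace the !wget line, for example:
# run_checked(["wget",
#              "https://github.com/lucidrains/DALLE-pytorch/archive/refs/tags/0.14.3.zip",
#              "-O", "/content/0.14.3.zip"])
```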

jonathanfrawley avatar Oct 25 '21 07:10 jonathanfrawley

Sure, that will solve this issue, but there are new issues after that. I've mentioned those above. The latest one is a MatMul import error from deepspeed.

Vbansal21 avatar Oct 25 '21 08:10 Vbansal21

@johnpaulbin @PrithivirajDamodaran can you please share how the GumbelVQ import error from the original post was resolved? I'm facing it on my GPU.

hamdjalil avatar Dec 12 '21 06:12 hamdjalil

On the DeepSpeed Sparse Attention doc page, there's this:

Note: Currently, DeepSpeed Sparse Attention can be used only on NVIDIA V100 or A100 GPUs using Torch >= 1.6 and CUDA 10.1, 10.2, 11.0, or 11.1.

I have access to a V100 and ran the notebook on it (after adding the fixes), but I encountered the same issue. I ran it on CUDA 11.4 rather than 11.1, so that may be the cause. Colab is on 11.1.
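The constraints in that note can be written down as a small check. This is a sketch only: the function names are made up, the supported values are copied from the note quoted above, and matching the GPU by substring is an assumption.

```python
# Values taken from the DeepSpeed Sparse Attention note quoted above.
SUPPORTED_CUDA = {"10.1", "10.2", "11.0", "11.1"}
SUPPORTED_GPUS = ("V100", "A100")


def major_minor(version):
    """Parse '1.9.0+cu111' or '11.1' into a (major, minor) tuple of ints."""
    core = version.split("+")[0]
    major, minor = core.split(".")[:2]
    return int(major), int(minor)


def sparse_attention_supported(torch_version, cuda_version, gpu_name):
    """Return True if the combination satisfies the quoted note."""
    torch_ok = major_minor(torch_version) >= (1, 6)
    cuda_ok = "%d.%d" % major_minor(cuda_version) in SUPPORTED_CUDA
    gpu_ok = any(tag in gpu_name for tag in SUPPORTED_GPUS)
    return torch_ok and cuda_ok and gpu_ok


# In a live session the inputs could come from torch.__version__,
# torch.version.cuda and torch.cuda.get_device_name(0).
print(sparse_attention_supported("1.9.0", "11.4", "Tesla V100-SXM2-16GB"))
```

By this check, a V100 on CUDA 11.4 falls outside the documented combinations, which is consistent with CUDA being the suspect here.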

Is it possible to run the notebook on an older version of DeepSpeed?

Cyberes avatar Dec 25 '21 03:12 Cyberes