diffusers

Still get fp32 wrapper error

Open G-force78 opened this issue 1 year ago • 4 comments

Describe the bug

Steps: 33% 500/1500 [09:06<17:56, 1.08s/it, loss=0.286, lr=1e-6]
Traceback (most recent call last):
  File "train_dreambooth.py", line 822, in <module>
    main(args)
  File "train_dreambooth.py", line 805, in main
    save_weights(global_step)
  File "train_dreambooth.py", line 682, in save_weights
    text_enc_model = accelerator.unwrap_model(text_encoder, keep_fp32_wrapper=True)

Reproduction

Train and save part way.

Logs

No response

System Info

  • diffusers version: 0.9.0
  • Platform: Linux-5.10.147+-x86_64-with-glibc2.27
  • Python version: 3.8.16
  • PyTorch version (GPU?): 1.13.0+cu116 (True)
  • Huggingface_hub version: 0.11.1
  • Transformers version: 4.25.1
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

G-force78, Jan 07 '23 10:01

Can confirm that, as per @ShivamShrirao's response in #180, you just need to update accelerate. Change

%pip install -q accelerate==0.12.0 transformers ftfy bitsandbytes gradio natsort

to

%pip install -q accelerate transformers ftfy bitsandbytes gradio natsort
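The traceback in the report is cut off before the actual error message, so the following is only a sketch under an assumption: that the installed accelerate is too old for `unwrap_model` to accept the `keep_fp32_wrapper` keyword. One way to check this before a long training run is to inspect the callable's signature. The helper name `accepts_keyword` and the two stand-in functions are hypothetical and are not part of accelerate.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params:
        return params[name].kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        )
    # A **kwargs catch-all would also accept the keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins for an older and a newer `unwrap_model`.
# These are NOT the real accelerate implementations.
def unwrap_model_old(model):
    return model


def unwrap_model_new(model, keep_fp32_wrapper=False):
    return model


print(accepts_keyword(unwrap_model_old, "keep_fp32_wrapper"))  # False
print(accepts_keyword(unwrap_model_new, "keep_fp32_wrapper"))  # True
```

In the training script, the same check could presumably be run on `accelerator.unwrap_model` right after the `Accelerator` is created, so that an outdated install fails at startup rather than at the first checkpoint save.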

OMGhozlan, Jan 07 '23 15:01

So what should it be: bitsandbytes or bitsandbytes 35? I'm still getting bad results, as of half an hour ago.

2blackbar, Jan 07 '23 21:01

Not sure if this is related, but I now get this error:

RuntimeError: Detected that PyTorch and torchvision were compiled with different CUDA versions. PyTorch has CUDA Version=11.7 and torchvision has CUDA Version=11.6. Please reinstall the torchvision that matches your PyTorch install.

Preceded by this:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchvision 0.14.1+cu116 requires torch==1.13.1, but you have torch 1.13.0 which is incompatible.
torchtext 0.14.1 requires torch==1.13.1, but you have torch 1.13.0 which is incompatible.
torchaudio 0.13.1+cu116 requires torch==1.13.1, but you have torch 1.13.0 which is incompatible.

After updating torchvision I now get this:

tcmalloc: large alloc 1109270528 bytes == 0x372e2000 @ 0x7ffb30823615 0x5d6f4c 0x51edd1 0x51ef5b 0x4f750a 0x4997a2 0x55cd91 0x5d8941 0x4997a2 0x55cd91 0x5d8941 0x4997a2 0x55cd91 0x5d8941 0x4997a2 0x55cd91 0x5d8941 0x4997a2 0x55cd91 0x5d8941 0x4997a2 0x5d8868 0x4997a2 0x55cd91 0x5d8941 0x49abe4 0x55cd91 0x5d8941 0x4997a2 0x55cd91 0x5d8941
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
xformers 0.0.15.dev0+4c06c79.d20221205 requires torch==1.13, but you have torch 1.13.1 which is incompatible.
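The pip conflict lines quoted in this comment all follow the same pattern, so the pins they ask for can be collected in one place before deciding which torch version to install. The following is a rough sketch that parses only that exact message wording; the regular expression and the `parse_conflicts` helper are illustrative assumptions, not a pip API.

```python
import re

# Matches lines such as:
#   "torchvision 0.14.1+cu116 requires torch==1.13.1, but you have torch 1.13.0 which is incompatible."
CONFLICT = re.compile(
    r"(?P<pkg>\S+) (?P<pkg_version>\S+) requires "
    r"(?P<dep>[A-Za-z0-9_.\-]+)==(?P<wanted>[^,\s]+), "
    r"but you have (?P=dep) (?P<found>\S+) which is incompatible\."
)


def parse_conflicts(text):
    """Return one dict per '==' pin conflict found in pip's output."""
    return [m.groupdict() for m in CONFLICT.finditer(text)]


log = (
    "torchvision 0.14.1+cu116 requires torch==1.13.1, but you have torch 1.13.0 which is incompatible. "
    "torchtext 0.14.1 requires torch==1.13.1, but you have torch 1.13.0 which is incompatible. "
    "xformers 0.0.15.dev0+4c06c79.d20221205 requires torch==1.13, but you have torch 1.13.1 which is incompatible."
)

conflicts = parse_conflicts(log)
for c in conflicts:
    print(f"{c['pkg']} wants {c['dep']}=={c['wanted']} (installed: {c['found']})")
```

Run on the two logs above, this makes the underlying problem visible: torchvision, torchtext and torchaudio pin torch 1.13.1, while the xformers build pins torch 1.13, so no single torch version satisfies every package as installed.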

G-force78, Jan 12 '23 11:01

@G-force78 fixed in 8b1472ffd0c8e0144f9db797e545eb908a1831b9

ShivamShrirao, Jan 12 '23 11:01