
[Feature Request]: Memory switcher CUDA / CPU / MPS

Open doricem opened this issue 1 year ago • 34 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues and checked the recent builds/commits

What would your feature do ?

Add the ability to change the map location for storage (CUDA / CPU / MPS), to fix loading issues on Mac / M1 machines.

Proposed workflow

(Something)

Additional information

See discussion thread for more info

doricem avatar Mar 20 '23 20:03 doricem

https://github.com/deforum-art/sd-webui-modelscope-text2video/commit/ab1c4e744bab82986055cd5402b396b0b7bd6336 should support it now. Try it out (by updating to the latest version)

kabachuha avatar Mar 20 '23 22:03 kabachuha

ab1c4e7 should support it now. Try it out (by updating to the latest version)

Thank you. However, every option (GPU 1/2, GPU, and CPU) returns this: Exception occured 'NoneType' object has no attribute 'cond_stage_model'

doricem avatar Mar 20 '23 22:03 doricem

I just restarted the webui, and now this is what I'm getting in the terminal:

Exception occured Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

doricem avatar Mar 20 '23 22:03 doricem

Could you provide the full commandline log since the start, please?

kabachuha avatar Mar 20 '23 23:03 kabachuha

Launching launch.py...
################################################################
Python 3.10.9 (main, Dec 15 2022, 17:11:09) [Clang 14.0.0 (clang-1400.0.29.202)]
Commit hash: a9fed7c364061ae6efb37f797b6b522cb3cf7aa2
Installing requirements for Web UI


Launching Web UI with arguments: --upcast-sampling --no-half-vae --use-cpu interrogate
Warning: caught exception 'Torch not compiled with CUDA enabled', memory monitor disabled
No module 'xformers'. Proceeding without it.
==============================================================================
You are running torch 1.12.1.
The program is tested to work with torch 1.13.1.
To reinstall the desired version, run with commandline flag --reinstall-torch.
Beware that this will cause a lot of large files to be downloaded, as well as
there are reports of issues with training tab on the latest version.

Use --skip-version-check commandline argument to disable this check.
==============================================================================
Loading weights [cc6cb27103] from /Users/bsath/stable-diffusion-webui/models/Stable-diffusion/v1-5-pruned-emaonly.ckpt
Creating model from config: /Users/bsath/stable-diffusion-webui/configs/v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Applying cross attention optimization (InvokeAI).
Textual inversion embeddings loaded(0): 
Model loaded in 2.6s (load weights from disk: 0.6s, create model: 0.5s, apply weights to model: 0.5s, apply half(): 0.4s, move model to device: 0.5s).
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 9.5s (import gradio: 1.1s, import ldm: 0.4s, other imports: 0.7s, load scripts: 0.5s, load SD checkpoint: 2.7s, create ui: 4.0s).
ModelScope text2video extension for auto1111 webui
Git commit: ab1c4e74 (Mon Mar 20 22:22:46 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
Exception occured
Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

doricem avatar Mar 20 '23 23:03 doricem

Yes, same issue:

Exception occurred: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

StableInquest avatar Mar 25 '23 03:03 StableInquest

@StableInquest do you have the longer stacktrace?

kabachuha avatar Mar 25 '23 16:03 kabachuha

Sure, no problem:

ModelScope text2video extension for auto1111 webui
Git commit: 9f9bd657 (Fri Mar 24 22:49:32 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
Traceback (most recent call last):
  File "/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/modelscope-text2vid.py", line 74, in process
    pipe = setup_pipeline()
  File "/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/modelscope-text2vid.py", line 30, in setup_pipeline
    return TextToVideoSynthesis(ph.models_path+'/ModelScope/t2v')
  File "/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_pipeline.py", line 105, in __init__
    torch.load(
  File "/stable-diffusion-webui/modules/safe.py", line 106, in load
    return load_with_extra(filename, extra_handler=global_extra_handler, *args, **kwargs)
  File "/stable-diffusion-webui/modules/safe.py", line 151, in load_with_extra
    return unsafe_torch_load(filename, *args, **kwargs)
  File "/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/serialization.py", line 712, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/serialization.py", line 1049, in _load
    result = unpickler.load()
  File "/opt/homebrew/Cellar/[email protected]/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/pickle.py", line 1213, in load
    dispatch[key[0]](self)
  File "/opt/homebrew/Cellar/[email protected]/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/pickle.py", line 1254, in load_binpersid
    self.append(self.persistent_load(pid))
  File "/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/serialization.py", line 1019, in persistent_load
    load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
  File "/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/serialization.py", line 1001, in load_tensor
    wrap_storage=restore_location(storage, location),
  File "/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/serialization.py", line 175, in default_restore_location
    result = fn(storage, location)
  File "/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/serialization.py", line 152, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/serialization.py", line 136, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
Exception occurred: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

StableInquest avatar Mar 25 '23 17:03 StableInquest

Thanks, that's better! I'll try playing around with the map location

kabachuha avatar Mar 25 '23 17:03 kabachuha
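
(For readers hitting the same error: the fix the message points at is passing a map_location when loading the checkpoints. A minimal sketch, assuming an illustrative checkpoint path rather than the extension's actual code:)

```python
import torch

# The checkpoint was saved from a CUDA device, so on a machine without CUDA
# torch.load() must be told where to put the tensors instead:
ckpt_path = "models/ModelScope/t2v/text2video_pytorch_model.pth"  # illustrative path
state_dict = torch.load(ckpt_path, map_location=torch.device("cpu"))

# The weights can then be moved to whatever device will actually run inference,
# e.g. Apple's Metal backend on an M1 machine:
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")
```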

Thank you😃

StableInquest avatar Mar 25 '23 17:03 StableInquest

In case this is any help, my traceback. Same basic gist, but a little different in the line references:

ModelScope text2video extension for auto1111 webui
Git commit: 67f75ac6 (Sat Mar 25 17:20:22 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
Traceback (most recent call last):
  File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/modelscope-text2vid.py", line 74, in process
    pipe = setup_pipeline()
  File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/modelscope-text2vid.py", line 30, in setup_pipeline
    return TextToVideoSynthesis(ph.models_path+'/ModelScope/t2v')
  File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_pipeline.py", line 105, in __init__
    torch.load(
  File "/Users/jrittvo/git/stable-diffusion-webui/modules/safe.py", line 106, in load
    return load_with_extra(filename, extra_handler=global_extra_handler, *args, **kwargs)
  File "/Users/jrittvo/git/stable-diffusion-webui/modules/safe.py", line 151, in load_with_extra
    return unsafe_torch_load(filename, *args, **kwargs)
  File "/opt/homebrew/lib/python3.10/site-packages/torch/serialization.py", line 810, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/opt/homebrew/lib/python3.10/site-packages/torch/serialization.py", line 1173, in _load
    result = unpickler.load()
  File "/opt/homebrew/Cellar/python@3.10/3.10.10_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/pickle.py", line 1213, in load
    dispatch[key[0]](self)
  File "/opt/homebrew/Cellar/python@3.10/3.10.10_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/pickle.py", line 1254, in load_binpersid
    self.append(self.persistent_load(pid))
  File "/opt/homebrew/lib/python3.10/site-packages/torch/serialization.py", line 1143, in persistent_load
    typed_storage = load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
  File "/opt/homebrew/lib/python3.10/site-packages/torch/serialization.py", line 1117, in load_tensor
    wrap_storage=restore_location(storage, location),
  File "/opt/homebrew/lib/python3.10/site-packages/torch/serialization.py", line 218, in default_restore_location
    result = fn(storage, location)
  File "/opt/homebrew/lib/python3.10/site-packages/torch/serialization.py", line 183, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "/opt/homebrew/lib/python3.10/site-packages/torch/serialization.py", line 167, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
Exception occurred: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

jrittvo avatar Mar 25 '23 19:03 jrittvo

I pushed a fix attempt. Please try it now.

kabachuha avatar Mar 25 '23 20:03 kabachuha

Not yet, but the traceback is different now. The pipeline is still trying to use cuda. These are my startup arguments, in case they are relevant: export COMMANDLINE_ARGS="--skip-torch-cuda-test --upcast-sampling --opt-sub-quad-attention --use-cpu interrogate". Without the --skip-torch-cuda-test argument, the basic interface won't start at all on a Mac M1. Maybe that argument stops the setting of some sort of flag that the text2video pipeline is depending on to know what it should use?


ModelScope text2video extension for auto1111 webui
Git commit: 2d523d38 (Sat Mar 25 20:23:55 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
Traceback (most recent call last):
  File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/modelscope-text2vid.py", line 74, in process
    pipe = setup_pipeline()
  File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/modelscope-text2vid.py", line 30, in setup_pipeline
    return TextToVideoSynthesis(ph.models_path+'/ModelScope/t2v')
  File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_pipeline.py", line 140, in __init__
    self.autoencoder = AutoencoderKL(
  File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_model.py", line 1557, in __init__
    self.init_from_ckpt(ckpt_path)
  File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_model.py", line 1574, in init_from_ckpt
    torch_gc()
  File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_model.py", line 34, in torch_gc
    torch.cuda.ipc_collect() # Clear PyTorch CUDA IPC resources
  File "/opt/homebrew/lib/python3.10/site-packages/torch/cuda/__init__.py", line 602, in ipc_collect
    _lazy_init()
  File "/opt/homebrew/lib/python3.10/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
Exception occurred: Torch not compiled with CUDA enabled

This is with CPU (Low VRAM) selected, but the traceback is the same regardless of what I select there, and with or without keep pipe in memory selected.

jrittvo avatar Mar 25 '23 20:03 jrittvo

At least now it occurs for the VAE and not for the diffusion model.

The problem is the same; I will push a second fix soon.

kabachuha avatar Mar 25 '23 20:03 kabachuha

Pushed, check it now

kabachuha avatar Mar 25 '23 21:03 kabachuha

I was just wondering: are the model and VAE settings at the top of the t2v tab correctly ignored? Since the t2v-specific model files live in their own special location, they don't appear as choices on that tab.

jrittvo avatar Mar 25 '23 21:03 jrittvo

No, they are not ignored. It sends them to the main device, and it's visible in the CLI that the VAE is being halved (but some of the options, like the initial map location, default to cpu on macOS).

kabachuha avatar Mar 25 '23 21:03 kabachuha
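
(Roughly the behaviour being described: pick the main compute device from the UI choice, but keep the initial torch.load map location on CPU when running on macOS. A sketch with made-up names, not the extension's real code:)

```python
import sys
import torch

def resolve_devices(ui_choice: str):
    """Illustrative: translate the UI memory option into (compute device, map_location)."""
    if ui_choice.startswith("GPU") and torch.cuda.is_available():
        return torch.device("cuda"), "cuda"
    if sys.platform == "darwin" and torch.backends.mps.is_available():
        # Load onto CPU first (the safe default on macOS), then move modules to MPS.
        return torch.device("mps"), "cpu"
    return torch.device("cpu"), "cpu"

device, map_location = resolve_devices("CPU (Low VRAM)")
# checkpoints are then loaded with torch.load(..., map_location=map_location)
# and the resulting modules moved with .to(device)
```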

Different traceback after your last push. Seems to be getting further along though before it errors out . . .

ModelScope text2video extension for auto1111 webui
Git commit: 9383bb12 (Sat Mar 25 21:00:08 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
device mps
Working in txt2vid mode
  0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/modelscope-text2vid.py", line 159, in process
    samples, _ = pipe.infer(prompt, n_prompt, steps, frames, seed + batch if seed != -1 else -1, cfg_scale,
  File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_pipeline.py", line 231, in infer
    y, zero_y = self.preprocess(prompt, n_prompt)
  File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_pipeline.py", line 370, in preprocess
    text_emb = self.clip_encoder(prompt)
  File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_model.py", line 1150, in forward
    z = self.encode_with_transformer(tokens.to(self.device))
  File "/opt/homebrew/lib/python3.10/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
Exception occurred: Torch not compiled with CUDA enabled

jrittvo avatar Mar 25 '23 21:03 jrittvo

So t2v is basically an alternate pipeline that uses the same models and VAEs as regular text2img or img2img? If so, that makes the requests for adding LoRA, etc. kind of reasonable.

jrittvo avatar Mar 25 '23 21:03 jrittvo

They are similar, but not quite the same. For example, Stable Diffusion uses a Unet2D with spatial attention, while ModelScope uses a Unet3D with spatial and temporal attention. So making LoRAs is definitely possible, but they will require modification.

kabachuha avatar Mar 25 '23 21:03 kabachuha

Another push

kabachuha avatar Mar 25 '23 21:03 kabachuha

Just grabbed it, restarted Auto1111 and the traceback seems unchanged.

Here is the full log:

Launching Web UI with arguments: --skip-torch-cuda-test --upcast-sampling --opt-sub-quad-attention --use-cpu interrogate
Warning: caught exception 'Torch not compiled with CUDA enabled', memory monitor disabled
No module 'xformers'. Proceeding without it.
Civitai Helper: Get Custom Model Folder
Civitai Helper: Load setting from: /Users/jrittvo/git/stable-diffusion-webui/extensions/Stable-Diffusion-Webui-Civitai-Helper/setting.json
Additional Network extension not installed, Only hijack built-in lora
LoCon Extension hijack built-in lora successfully
[AddNet] Updating model hashes...
100%|████████████████████████████████████████████| 6/6 [00:00<00:00, 107.71it/s]
[AddNet] Updating model hashes...
100%|████████████████████████████████████████████| 6/6 [00:00<00:00, 108.17it/s]
Hypernetwork-MonkeyPatch-Extension found!
Loading weights [00a704a840] from /Users/jrittvo/git/stable-diffusion-webui/models/Stable-diffusion/MyMerge.safetensors
Creating model from config: /Users/jrittvo/git/stable-diffusion-webui/configs/v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Loading VAE weights specified in settings: /Users/jrittvo/git/stable-diffusion-webui/models/VAE/vae-ft-mse-840000-ema-pruned.pt
Applying sub-quadratic cross attention optimization.
Textual inversion embeddings loaded(4): DeepNegative-1.75t, MicroMini, povBJ-3.0, redBJ
Model loaded in 2.5s (load weights from disk: 0.2s, create model: 0.5s, apply weights to model: 0.9s, apply half(): 0.3s, move model to device: 0.5s).
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
Startup time: 5.8s (import torch: 0.7s, import gradio: 0.6s, import ldm: 0.1s, other imports: 0.7s, list extensions: 0.2s, load scripts: 0.7s, load SD checkpoint: 2.5s, create ui: 0.3s).
ModelScope text2video extension for auto1111 webui
Git commit: 9383bb12 (Sat Mar 25 21:00:08 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
device mps
Working in txt2vid mode
  0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/modelscope-text2vid.py", line 159, in process
    samples, _ = pipe.infer(prompt, n_prompt, steps, frames, seed + batch if seed != -1 else -1, cfg_scale,
  File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_pipeline.py", line 231, in infer
    y, zero_y = self.preprocess(prompt, n_prompt)
  File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_pipeline.py", line 370, in preprocess
    text_emb = self.clip_encoder(prompt)
  File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_model.py", line 1150, in forward
    z = self.encode_with_transformer(tokens.to(self.device))
  File "/opt/homebrew/lib/python3.10/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
Exception occurred: Torch not compiled with CUDA enabled

jrittvo avatar Mar 25 '23 21:03 jrittvo

Looks like I didn't have the latest commit. Sorry. Here is the trace back with the latest:

ModelScope text2video extension for auto1111 webui
Git commit: f532d344 (Sat Mar 25 21:25:53 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
device mps
Working in txt2vid mode
  0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/modelscope-text2vid.py", line 159, in process
    samples, _ = pipe.infer(prompt, n_prompt, steps, frames, seed + batch if seed != -1 else -1, cfg_scale,
  File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_pipeline.py", line 235, in infer
    torch_gc()
  File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_pipeline.py", line 33, in torch_gc
    torch.cuda.ipc_collect() # Clear PyTorch CUDA IPC resources
  File "/opt/homebrew/lib/python3.10/site-packages/torch/cuda/__init__.py", line 602, in ipc_collect
    _lazy_init()
  File "/opt/homebrew/lib/python3.10/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
Exception occurred: Torch not compiled with CUDA enabled

jrittvo avatar Mar 25 '23 21:03 jrittvo

My browser seems to be picking up cached pages here. Need to clear it and restart it.

jrittvo avatar Mar 25 '23 21:03 jrittvo

torch_gc was duplicated; now both copies use the same implementation with a CUDA check.

Pushed, try launching it now

kabachuha avatar Mar 25 '23 21:03 kabachuha
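
(In case it helps anyone reading along, a device-aware torch_gc of the kind described above might look roughly like this; a sketch only, not necessarily the committed code:)

```python
import gc
import torch

def torch_gc():
    """Collect Python garbage, and only touch the CUDA allocator when CUDA is usable."""
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        torch.cuda.ipc_collect()  # safe here: guarded by the availability check
    elif torch.backends.mps.is_available() and hasattr(torch, "mps"):
        # Newer torch builds expose an MPS cache; older ones simply skip this step.
        if hasattr(torch.mps, "empty_cache"):
            torch.mps.empty_cache()
```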

Tried the newest version and it crashed:

ModelScope text2video extension for auto1111 webui
Git commit: fdf507c7 (Sat Mar 25 21:39:34 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
device mps
Working in txt2vid mode
  0%| | 0/1 [00:00<?, ?it/s]
latents torch.Size([1, 4, 24, 32, 32]) tensor(0.0007, device='mps:0') tensor(1.0015, device='mps:0')
loc("mps_add"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/0aa643d0-625a-11ed-b319-a23c4f261b56/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":228:0)): error: input types 'tensor<1x1280xf32>' and 'tensor<1280xf16>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
zsh: abort      ./webui.sh
(base) philbuck@PhilsMacStudio sd % /opt/homebrew/Cellar/python@3.10/3.10.10/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

Philbuck84 avatar Mar 25 '23 21:03 Philbuck84

Now I'm getting the traceback I usually see when the ANE goes wild and runs out of memory. Just checked that it wasn't zombied . . .

ModelScope text2video extension for auto1111 webui
Git commit: fdf507c7 (Sat Mar 25 21:39:34 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
device mps
Working in txt2vid mode
  0%| | 0/1 [00:00<?, ?it/s]
latents torch.Size([1, 4, 24, 32, 32]) tensor(0.0005, device='mps:0') tensor(0.9997, device='mps:0')
loc("mps_add"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/9e200cfa-7d96-11ed-886f-a23c4f261b56/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":228:0)): error: input types 'tensor<1x1280xf32>' and 'tensor<1280xf16>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
zsh: abort      ./webui.sh
jrittvo@M1PRO SD WebUI % /opt/homebrew/Cellar/python@3.10/3.10.10_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

jrittvo avatar Mar 25 '23 21:03 jrittvo

Disabled halving on MPS, try it now.

kabachuha avatar Mar 25 '23 21:03 kabachuha
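
(The mps_add error above comes from mixing fp16 weights with fp32 activations on MPS, so the guard presumably looks something like the sketch below; the helper name is made up and this is not the extension's actual code:)

```python
import torch

def maybe_half(module: torch.nn.Module, device: torch.device) -> torch.nn.Module:
    # Halving is a memory win on CUDA, but on MPS the mixed fp16/fp32 graph
    # produces the "not broadcast compatible" MPSGraph error, so stay in fp32 there.
    return module.half() if device.type == "cuda" else module.float()
```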

ModelScope text2video extension for auto1111 webui
Git commit: d2c5e33c (Sat Mar 25 21:52:20 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
device mps
Working in txt2vid mode
  0%| | 0/1 [00:00<?, ?it/s]
latents torch.Size([1, 4, 24, 32, 32]) tensor(0.0016, device='mps:0') tensor(1.0020, device='mps:0')
DDIM sampling:   0%| | 0/31 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/modelscope-text2vid.py", line 159, in process
    samples, _ = pipe.infer(prompt, n_prompt, steps, frames, seed + batch if seed != -1 else -1, cfg_scale,
  File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_pipeline.py", line 236, in infer
    x0 = self.diffusion.ddim_sample_loop(
  File "/opt/homebrew/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_model.py", line 1513, in ddim_sample_loop
    xt = self.ddim_sample(xt, t, model, model_kwargs, clamp,
  File "/opt/homebrew/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_model.py", line 1418, in ddim_sample
    _, _, _, x0 = self.p_mean_variance(xt, t, model, model_kwargs, clamp,
  File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_model.py", line 1359, in p_mean_variance
    y_out = model(xt, self._scale_timesteps(t), **model_kwargs[0])
  File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_model.py", line 348, in forward
    x = self._forward_single(block, x, e, context, time_rel_pos_bias,
  File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_model.py", line 411, in _forward_single
    x = self._forward_single(block, x, e, context,
  File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_model.py", line 390, in _forward_single
    x = module(x, e, self.batch)
  File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_model.py", line 876, in forward
    return self._forward(x, emb, batch_size)
  File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_model.py", line 902, in _forward
    h = self.temopral_conv(h)
  File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_model.py", line 1099, in forward
    x = self.conv1(x)
  File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 613, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 608, in _conv_forward
    return F.conv3d(
RuntimeError: Conv3D is not supported on MPS
Exception occurred: Conv3D is not supported on MPS

This sounds ugly . . . Are you up against something that is not built into the OS yet?

jrittvo avatar Mar 25 '23 21:03 jrittvo

Well, sounds like this then 🤷‍♀️

Will have to wait until the WebUI switches to PyTorch 2 (and that's only if PyTorch 2 has the conv3d feature on MPS at all).

kabachuha avatar Mar 25 '23 22:03 kabachuha
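
(Two possible workarounds for the Conv3d gap, both untested against this extension: PyTorch's global CPU fallback for operators MPS doesn't implement, or routing only the 3D convolutions through the CPU, e.g.:)

```python
# Option 1: let PyTorch fall back to CPU for ops that MPS lacks
# (set before starting the webui; slower, but avoids the hard error):
#   export PYTORCH_ENABLE_MPS_FALLBACK=1

# Option 2: a per-layer fallback, illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Conv3dWithCPUFallback(nn.Conv3d):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if x.device.type != "mps":
            return super().forward(x)
        # Compute the unsupported conv3d on CPU copies, then move the result back.
        weight = self.weight.float().cpu()
        bias = self.bias.float().cpu() if self.bias is not None else None
        out = F.conv3d(x.float().cpu(), weight, bias,
                       self.stride, self.padding, self.dilation, self.groups)
        return out.to(device=x.device, dtype=x.dtype)
```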