sd-webui-text2video
[Feature Request]: Memory switcher CUDA / CPU / MPS
Is there an existing issue for this?
- [X] I have searched the existing issues and checked the recent builds/commits
What would your feature do?
Enable the ability to change the map location used for model storage, to fix the issue on Mac / M1 machines.
Proposed workflow
(Something)
Additional information
See discussion thread for more info
https://github.com/deforum-art/sd-webui-modelscope-text2video/commit/ab1c4e744bab82986055cd5402b396b0b7bd6336 should support it now. Try it out (by updating to the latest version)
Thank you.
However, every option (GPU 1/2, GPU, and CPU) returns this:
Exception occured 'NoneType' object has no attribute 'cond_stage_model'
I just restarted the webui again, and now this is what I'm getting in the terminal:
Exception occured Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
Could you provide the full commandline log since the start, please?
Launching launch.py...
################################################################
Python 3.10.9 (main, Dec 15 2022, 17:11:09) [Clang 14.0.0 (clang-1400.0.29.202)]
Commit hash: a9fed7c364061ae6efb37f797b6b522cb3cf7aa2
Installing requirements for Web UI
Launching Web UI with arguments: --upcast-sampling --no-half-vae --use-cpu interrogate
Warning: caught exception 'Torch not compiled with CUDA enabled', memory monitor disabled
No module 'xformers'. Proceeding without it.
==============================================================================
You are running torch 1.12.1.
The program is tested to work with torch 1.13.1.
To reinstall the desired version, run with commandline flag --reinstall-torch.
Beware that this will cause a lot of large files to be downloaded, as well as
there are reports of issues with training tab on the latest version.
Use --skip-version-check commandline argument to disable this check.
==============================================================================
Loading weights [cc6cb27103] from /Users/bsath/stable-diffusion-webui/models/Stable-diffusion/v1-5-pruned-emaonly.ckpt
Creating model from config: /Users/bsath/stable-diffusion-webui/configs/v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Applying cross attention optimization (InvokeAI).
Textual inversion embeddings loaded(0):
Model loaded in 2.6s (load weights from disk: 0.6s, create model: 0.5s, apply weights to model: 0.5s, apply half(): 0.4s, move model to device: 0.5s).
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
Startup time: 9.5s (import gradio: 1.1s, import ldm: 0.4s, other imports: 0.7s, load scripts: 0.5s, load SD checkpoint: 2.7s, create ui: 4.0s).
ModelScope text2video extension for auto1111 webui
Git commit: ab1c4e74 (Mon Mar 20 22:22:46 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
Exception occured
Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
Yes, same issue:
Exception occurred: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
@StableInquest do you have the longer stacktrace?
Sure, no problem:
ModelScope text2video extension for auto1111 webui
Git commit: 9f9bd657 (Fri Mar 24 22:49:32 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
Traceback (most recent call last):
File "/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/modelscope-text2vid.py", line 74, in process
pipe = setup_pipeline()
File "/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/modelscope-text2vid.py", line 30, in setup_pipeline
return TextToVideoSynthesis(ph.models_path+'/ModelScope/t2v')
File "/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_pipeline.py", line 105, in __init__
torch.load(
File "/stable-diffusion-webui/modules/safe.py", line 106, in load
return load_with_extra(filename, extra_handler=global_extra_handler, *args, **kwargs)
File "/stable-diffusion-webui/modules/safe.py", line 151, in load_with_extra
return unsafe_torch_load(filename, *args, **kwargs)
File "/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/serialization.py", line 712, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
File "/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/serialization.py", line 1049, in _load
result = unpickler.load()
File "/opt/homebrew/Cellar/[email protected]/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/pickle.py", line 1213, in load
dispatch[key[0]](self)
File "/opt/homebrew/Cellar/[email protected]/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/pickle.py", line 1254, in load_binpersid
self.append(self.persistent_load(pid))
File "/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/serialization.py", line 1019, in persistent_load
load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
File "/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/serialization.py", line 1001, in load_tensor
wrap_storage=restore_location(storage, location),
File "/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/serialization.py", line 175, in default_restore_location
result = fn(storage, location)
File "/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/serialization.py", line 152, in _cuda_deserialize
device = validate_cuda_device(location)
File "/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/serialization.py", line 136, in validate_cuda_device
raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
Exception occurred: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
Thanks, that's better! I'll try playing around with the map location
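Roughly, the idea would be something like this (just a sketch; `get_t2v_device` is an illustrative helper, not the extension's actual code, and the checkpoint name is the UNet file from the config above):

```python
import torch

def get_t2v_device():
    # Illustrative device picker: prefer CUDA, then Apple's MPS, then plain CPU.
    if torch.cuda.is_available():
        return torch.device("cuda")
    if hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = get_t2v_device()
# Deserialize the checkpoint onto whatever device is actually present, instead of
# letting torch restore the tensors to the CUDA locations baked into the file.
state_dict = torch.load("text2video_pytorch_model.pth", map_location=device)
```

With `map_location` set this way the load itself should no longer trip over the missing CUDA device; moving the modules onto the device afterwards is a separate step.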
Thank you😃
In case it's any help, here's my traceback. Same basic gist, but the line references are a little different:
ModelScope text2video extension for auto1111 webui
Git commit: 67f75ac6 (Sat Mar 25 17:20:22 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
Traceback (most recent call last):
File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/modelscope-text2vid.py", line 74, in process
pipe = setup_pipeline()
File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/modelscope-text2vid.py", line 30, in setup_pipeline
return TextToVideoSynthesis(ph.models_path+'/ModelScope/t2v')
File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_pipeline.py", line 105, in __init__
torch.load(
File "/Users/jrittvo/git/stable-diffusion-webui/modules/safe.py", line 106, in load
return load_with_extra(filename, extra_handler=global_extra_handler, *args, **kwargs)
File "/Users/jrittvo/git/stable-diffusion-webui/modules/safe.py", line 151, in load_with_extra
return unsafe_torch_load(filename, *args, **kwargs)
File "/opt/homebrew/lib/python3.10/site-packages/torch/serialization.py", line 810, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
File "/opt/homebrew/lib/python3.10/site-packages/torch/serialization.py", line 1173, in _load
result = unpickler.load()
File "/opt/homebrew/Cellar/[email protected]/3.10.10_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/pickle.py", line 1213, in load
dispatch[key[0]](self)
File "/opt/homebrew/Cellar/[email protected]/3.10.10_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/pickle.py", line 1254, in load_binpersid
self.append(self.persistent_load(pid))
File "/opt/homebrew/lib/python3.10/site-packages/torch/serialization.py", line 1143, in persistent_load
typed_storage = load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
File "/opt/homebrew/lib/python3.10/site-packages/torch/serialization.py", line 1117, in load_tensor
wrap_storage=restore_location(storage, location),
File "/opt/homebrew/lib/python3.10/site-packages/torch/serialization.py", line 218, in default_restore_location
result = fn(storage, location)
File "/opt/homebrew/lib/python3.10/site-packages/torch/serialization.py", line 183, in _cuda_deserialize
device = validate_cuda_device(location)
File "/opt/homebrew/lib/python3.10/site-packages/torch/serialization.py", line 167, in validate_cuda_device
raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
Exception occurred: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
I pushed a fix attempt. Please try it now.
Not yet, but the traceback is different now. The pipeline is still trying to use CUDA. These are my startup arguments, in case they are relevant: `export COMMANDLINE_ARGS="--skip-torch-cuda-test --upcast-sampling --opt-sub-quad-attention --use-cpu interrogate"`. Without the `--skip-torch-cuda-test` argument, the basic interface won't start at all on a Mac M1. Maybe that argument stops the setting of some sort of flag that the text2video pipeline depends on to know which device it should use?
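For reference, a quick way to check what torch itself reports on this machine (just a diagnostic sketch, independent of the webui flags):

```python
import torch

# On an Apple Silicon build of torch, CUDA is never available, so any code path
# that hard-codes CUDA devices or calls torch.cuda.* fails like the traceback below.
print("torch version:", torch.__version__)
print("cuda available:", torch.cuda.is_available())   # False on M1/M2
print("mps built:", torch.backends.mps.is_built())
print("mps available:", torch.backends.mps.is_available())
```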
ModelScope text2video extension for auto1111 webui
Git commit: 2d523d38 (Sat Mar 25 20:23:55 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
Traceback (most recent call last):
File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/modelscope-text2vid.py", line 74, in process
pipe = setup_pipeline()
File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/modelscope-text2vid.py", line 30, in setup_pipeline
return TextToVideoSynthesis(ph.models_path+'/ModelScope/t2v')
File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_pipeline.py", line 140, in __init__
self.autoencoder = AutoencoderKL(
File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_model.py", line 1557, in __init__
self.init_from_ckpt(ckpt_path)
File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_model.py", line 1574, in init_from_ckpt
torch_gc()
File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_model.py", line 34, in torch_gc
torch.cuda.ipc_collect() # Clear PyTorch CUDA IPC resources
File "/opt/homebrew/lib/python3.10/site-packages/torch/cuda/__init__.py", line 602, in ipc_collect
_lazy_init()
File "/opt/homebrew/lib/python3.10/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
Exception occurred: Torch not compiled with CUDA enabled
This is with `CPU (Low VRAM)` selected, but the traceback is the same regardless of what I select there, and with or without `keep pipe in memory` selected.
At least now it occurs for the VAE and not for the diffusion model
The problem is the same, will push a second fix soon
Pushed, check it now
I was just wondering whether the settings at the top of the t2v tab for model and VAE are correctly ignored. Since the t2v-specific model files live in their own special location, they don't appear in the choices for that tab.
No, they are not ignored. It sends them to the main device, and it's visible in the CLI that the VAE is being halved (though some of the options, like the initial map location, default to cpu on macOS).
Different traceback after your last push. Seems to be getting further along before it errors out, though...
ModelScope text2video extension for auto1111 webui
Git commit: 9383bb12 (Sat Mar 25 21:00:08 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
device mps
Working in txt2vid mode
0%| | 0/1 [00:00<?, ?it/s]Traceback (most recent call last):
File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/modelscope-text2vid.py", line 159, in process
samples, _ = pipe.infer(prompt, n_prompt, steps, frames, seed + batch if seed != -1 else -1, cfg_scale,
File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_pipeline.py", line 231, in infer
y, zero_y = self.preprocess(prompt, n_prompt)
File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_pipeline.py", line 370, in preprocess
text_emb = self.clip_encoder(prompt)
File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_model.py", line 1150, in forward
z = self.encode_with_transformer(tokens.to(self.device))
File "/opt/homebrew/lib/python3.10/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
Exception occurred: Torch not compiled with CUDA enabled
So t2v is basically an alternate pipeline that uses the same models and VAEs as regular text2img or img2img? If so, that makes the requests for adding LoRA, etc. kind of reasonable.
They are similar, but not quite the same. Stable Diffusion uses a Unet2D with spatial attention, while ModelScope uses a Unet3D with spatial and temporal attention. So making LoRAs is definitely possible, but they will require modification.
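A rough way to picture the difference (the module names below are mine for illustration, not the actual ModelScope classes):

```python
import torch

# Stable Diffusion's Unet2D sees image latents shaped (batch, channels, height, width).
image_latents = torch.randn(1, 4, 64, 64)

# ModelScope's Unet3D sees video latents with an extra frames axis,
# (batch, channels, frames, height, width) -- max_frames is 16 in the config above.
video_latents = torch.randn(1, 4, 16, 32, 32)

# Temporal attention mixes information across that frames axis, e.g. by folding the
# spatial positions into the batch dimension and attending over the frame tokens.
b, c, f, h, w = video_latents.shape
tokens = video_latents.permute(0, 3, 4, 2, 1).reshape(b * h * w, f, c)  # (B*H*W, F, C)
attn = torch.nn.MultiheadAttention(embed_dim=c, num_heads=1, batch_first=True)
out, _ = attn(tokens, tokens, tokens)  # self-attention over the 16 frames
```

The extra frames axis is also why the model needs temporal (3D) convolutions in addition to the 2D ones.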
Another push
Just grabbed it, restarted Auto1111 and the traceback seems unchanged.
Here is the full log:
Launching Web UI with arguments: --skip-torch-cuda-test --upcast-sampling --opt-sub-quad-attention --use-cpu interrogate
Warning: caught exception 'Torch not compiled with CUDA enabled', memory monitor disabled
No module 'xformers'. Proceeding without it.
Civitai Helper: Get Custom Model Folder
Civitai Helper: Load setting from: /Users/jrittvo/git/stable-diffusion-webui/extensions/Stable-Diffusion-Webui-Civitai-Helper/setting.json
Additional Network extension not installed, Only hijack built-in lora
LoCon Extension hijack built-in lora successfully
[AddNet] Updating model hashes...
100%|████████████████████████████████████████████| 6/6 [00:00<00:00, 107.71it/s]
[AddNet] Updating model hashes...
100%|████████████████████████████████████████████| 6/6 [00:00<00:00, 108.17it/s]
Hypernetwork-MonkeyPatch-Extension found!
Loading weights [00a704a840] from /Users/jrittvo/git/stable-diffusion-webui/models/Stable-diffusion/MyMerge.safetensors
Creating model from config: /Users/jrittvo/git/stable-diffusion-webui/configs/v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Loading VAE weights specified in settings: /Users/jrittvo/git/stable-diffusion-webui/models/VAE/vae-ft-mse-840000-ema-pruned.pt
Applying sub-quadratic cross attention optimization.
Textual inversion embeddings loaded(4): DeepNegative-1.75t, MicroMini, povBJ-3.0, redBJ
Model loaded in 2.5s (load weights from disk: 0.2s, create model: 0.5s, apply weights to model: 0.9s, apply half(): 0.3s, move model to device: 0.5s).
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
Startup time: 5.8s (import torch: 0.7s, import gradio: 0.6s, import ldm: 0.1s, other imports: 0.7s, list extensions: 0.2s, load scripts: 0.7s, load SD checkpoint: 2.5s, create ui: 0.3s).
ModelScope text2video extension for auto1111 webui
Git commit: 9383bb12 (Sat Mar 25 21:00:08 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
device mps
Working in txt2vid mode
0%| | 0/1 [00:00<?, ?it/s]Traceback (most recent call last):
File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/modelscope-text2vid.py", line 159, in process
samples, _ = pipe.infer(prompt, n_prompt, steps, frames, seed + batch if seed != -1 else -1, cfg_scale,
File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_pipeline.py", line 231, in infer
y, zero_y = self.preprocess(prompt, n_prompt)
File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_pipeline.py", line 370, in preprocess
text_emb = self.clip_encoder(prompt)
File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_model.py", line 1150, in forward
z = self.encode_with_transformer(tokens.to(self.device))
File "/opt/homebrew/lib/python3.10/site-packages/torch/cuda/init.py", line 239, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
Exception occurred: Torch not compiled with CUDA enabled
Looks like I didn't have the latest commit. Sorry. Here is the traceback with the latest:
ModelScope text2video extension for auto1111 webui
Git commit: f532d344 (Sat Mar 25 21:25:53 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
device mps
Working in txt2vid mode
0%| | 0/1 [00:00<?, ?it/s]Traceback (most recent call last):
File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/modelscope-text2vid.py", line 159, in process
samples, _ = pipe.infer(prompt, n_prompt, steps, frames, seed + batch if seed != -1 else -1, cfg_scale,
File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_pipeline.py", line 235, in infer
torch_gc()
File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_pipeline.py", line 33, in torch_gc
torch.cuda.ipc_collect() # Clear PyTorch CUDA IPC resources
File "/opt/homebrew/lib/python3.10/site-packages/torch/cuda/__init__.py", line 602, in ipc_collect
_lazy_init()
File "/opt/homebrew/lib/python3.10/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
Exception occurred: Torch not compiled with CUDA enabled
My browser seems to be picking up cached pages here. Need to clear it and restart it.
torch_gc was duplicated; now both copies use the same implementation with a CUDA check.
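Presumably something along these lines (a sketch of a CUDA-guarded `torch_gc`, not necessarily the exact code in the commit):

```python
import gc
import torch

def torch_gc():
    # Drop unreachable Python objects first.
    gc.collect()
    # Only touch the CUDA allocator when CUDA actually exists; calling torch.cuda.*
    # on a CPU/MPS-only build is what raised "Torch not compiled with CUDA enabled".
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        torch.cuda.ipc_collect()  # clear PyTorch CUDA IPC resources
```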
Pushed, try launching it now
Tried the newest version and crashed it:
ModelScope text2video extension for auto1111 webui
Git commit: fdf507c7 (Sat Mar 25 21:39:34 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
device mps
Working in txt2vid mode
0%| | 0/1 [00:00<?, ?it/s]latents torch.Size([1, 4, 24, 32, 32]) tensor(0.0007, device='mps:0') tensor(1.0015, device='mps:0')
loc("mps_add"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/0aa643d0-625a-11ed-b319-a23c4f261b56/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":228:0)): error: input types 'tensor<1x1280xf32>' and 'tensor<1280xf16>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
zsh: abort ./webui.sh
(base) philbuck@PhilsMacStudio sd % /opt/homebrew/Cellar/[email protected]/3.10.10/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
Now I'm getting the traceback I usually see when the ANE goes wild and runs out of memory. Just checked that it wasn't zombied...
ModelScope text2video extension for auto1111 webui
Git commit: fdf507c7 (Sat Mar 25 21:39:34 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
device mps
Working in txt2vid mode
0%| | 0/1 [00:00<?, ?it/s]latents torch.Size([1, 4, 24, 32, 32]) tensor(0.0005, device='mps:0') tensor(0.9997, device='mps:0')
loc("mps_add"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/9e200cfa-7d96-11ed-886f-a23c4f261b56/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":228:0)): error: input types 'tensor<1x1280xf32>' and 'tensor<1280xf16>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
zsh: abort ./webui.sh
jrittvo@M1PRO SD WebUI % /opt/homebrew/Cellar/[email protected]/3.10.10_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
Disabled halving on mps, try now
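Presumably the fix amounts to choosing the weight dtype per device, something like this (illustrative sketch with a stand-in `model`, not the extension's actual code):

```python
import torch

model = torch.nn.Linear(1280, 1280)  # stand-in for the real VAE / UNet module
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# The "mps_add ... 'tensor<1x1280xf32>' and 'tensor<1280xf16>' are not broadcast
# compatible" crash above came from mixing fp16 and fp32 tensors in one op on MPS,
# so keep everything in float32 there; halving is still worthwhile on CUDA to save VRAM.
if device.type == "cuda":
    model = model.half().to(device)
else:
    model = model.float().to(device)
```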
ModelScope text2video extension for auto1111 webui
Git commit: d2c5e33c (Sat Mar 25 21:52:20 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
device mps
Working in txt2vid mode
0%| | 0/1 [00:00<?, ?it/s]latents torch.Size([1, 4, 24, 32, 32]) tensor(0.0016, device='mps:0') tensor(1.0020, device='mps:0')
DDIM sampling: 0%| | 0/31 [00:02<?, ?it/s]
Traceback (most recent call last): | 0/31 [00:00<?, ?it/s]
File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/modelscope-text2vid.py", line 159, in process
samples, _ = pipe.infer(prompt, n_prompt, steps, frames, seed + batch if seed != -1 else -1, cfg_scale,
File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_pipeline.py", line 236, in infer
x0 = self.diffusion.ddim_sample_loop(
File "/opt/homebrew/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_model.py", line 1513, in ddim_sample_loop
xt = self.ddim_sample(xt, t, model, model_kwargs, clamp,
File "/opt/homebrew/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_model.py", line 1418, in ddim_sample
_, _, _, x0 = self.p_mean_variance(xt, t, model, model_kwargs, clamp,
File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_model.py", line 1359, in p_mean_variance
y_out = model(xt, self._scale_timesteps(t), **model_kwargs[0])
File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_model.py", line 348, in forward
x = self._forward_single(block, x, e, context, time_rel_pos_bias,
File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_model.py", line 411, in _forward_single
x = self._forward_single(block, x, e, context,
File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_model.py", line 390, in _forward_single
x = module(x, e, self.batch)
File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_model.py", line 876, in forward
return self._forward(x, emb, batch_size)
File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_model.py", line 902, in _forward
h = self.temopral_conv(h)
File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/jrittvo/git/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_model.py", line 1099, in forward
x = self.conv1(x)
File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
input = module(input)
File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 613, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 608, in _conv_forward
return F.conv3d(
RuntimeError: Conv3D is not supported on MPS
Exception occurred: Conv3D is not supported on MPS
This sounds ugly... Are you up against something that is not built into the OS yet?
Well, sounds like this then 🤷♀️
Will have to wait until the WebUI switches to PyTorch 2 (and that's if PyTorch 2 has the conv3d feature on MPS at all).
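For anyone following along, a quick probe to check whether a given torch build can run Conv3D on MPS (diagnostic sketch only, not part of the extension):

```python
import torch
import torch.nn.functional as F

def mps_supports_conv3d() -> bool:
    # Run a tiny 3D convolution on the MPS device and report whether it succeeds.
    if not (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()):
        return False
    try:
        x = torch.randn(1, 1, 2, 4, 4, device="mps")   # (N, C, D, H, W)
        w = torch.randn(1, 1, 1, 3, 3, device="mps")   # (C_out, C_in, kD, kH, kW)
        F.conv3d(x, w)
        return True
    except (RuntimeError, NotImplementedError):
        return False

print("Conv3D on MPS:", mps_supports_conv3d())
```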