voltaML-fast-stable-diffusion
NameError: name 'loaded_model' is not defined, and FileNotFoundError: [Errno 2] No such file or directory: 'onnx/clip.onnx'
Hi,
Trying to run accelerated SD 1.5 models, I am getting the errors below. Running on Windows 11 under WSL, with an RTX 3070 8GB.
CMD:
docker run --gpus=all -v C:\voltaml\engine/engine:/workspace/voltaML-fast-stable-diffusion/engine -v C:\voltaml\output/engine:/workspace/voltaML-fast-stable-diffusion/static/output -p 5003:5003 -it voltaml/volta_diffusion_webui:v0.2
172.17.0.1 - - [18/Dec/2022 13:15:21] "POST /voltaml/job HTTP/1.1" 500 -
Traceback (most recent call last):
File "/workspace/voltaML-fast-stable-diffusion/volta_accelerate.py", line 661, in infer_trt
if loaded_model!=args.model_path:
NameError: name 'loaded_model' is not defined
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2548, in __call__
return self.wsgi_app(environ, start_response)
File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2528, in wsgi_app
response = self.handle_exception(e)
File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2525, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1822, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1820, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1796, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "/workspace/voltaML-fast-stable-diffusion/app.py", line 88, in upload_file
pipeline_time = infer_trt(saving_path=saving_path,
File "/workspace/voltaML-fast-stable-diffusion/volta_accelerate.py", line 664, in infer_trt
load_trt(saving_path, model, prompt, img_height, img_width, num_inference_steps)
File "/workspace/voltaML-fast-stable-diffusion/volta_accelerate.py", line 599, in load_trt
trt_model.loadEngines(engine_dir, onnx_dir, args.onnx_opset,
File "/workspace/voltaML-fast-stable-diffusion/volta_accelerate.py", line 279, in loadEngines
torch.onnx.export(model,
File "/usr/local/lib/python3.8/dist-packages/torch/onnx/__init__.py", line 350, in export
return utils.export(
File "/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py", line 163, in export
_export(
File "/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py", line 1148, in _export
with torch.serialization._open_file_like(f, "wb") as opened_file:
File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 230, in _open_file_like
return _open_file(name_or_buffer, mode)
File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 211, in __init__
super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'onnx/clip.onnx'
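(For context: both failures look like missing setup rather than the inference itself. `loaded_model` in volta_accelerate.py is compared before it has ever been assigned, and `torch.onnx.export` is then handed a path inside an `onnx/` directory that does not exist yet. A minimal sketch of the kind of guards that would avoid both; the names below are illustrative, not the repo's actual code:)

```python
import os

# Hypothetical module-level state; the real variable lives in volta_accelerate.py.
loaded_model = None  # initialize up front so the first comparison cannot raise NameError

def ensure_onnx_dir(onnx_dir: str = "onnx") -> str:
    """Create the ONNX output directory before torch.onnx.export writes into it."""
    os.makedirs(onnx_dir, exist_ok=True)
    return onnx_dir

def needs_reload(model_path: str) -> bool:
    """Safe stand-in for the failing `if loaded_model != args.model_path:` check."""
    return loaded_model is None or loaded_model != model_path
```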
You need to create a folder called "onnx"
where?
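For what it's worth, the path in the traceback is relative ('onnx/clip.onnx'), so it presumably resolves against the working directory inside the container, /workspace/voltaML-fast-stable-diffusion. A hedged workaround until the fix lands, assuming that is indeed the cwd:

```python
import os

# Create the folder the ONNX export expects; the path assumes the container's
# working directory is the repo root.
os.makedirs("/workspace/voltaML-fast-stable-diffusion/onnx", exist_ok=True)
```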
Please pull the new docker and try it again. The issue should be fixed.
I'm on v0.3, what is new?
Fixes: CFG now works for TensorRT, -1 seed support, and inference requests are now handled smoothly.
Sorry for my English. I am on Docker v0.3 and I get the same error. What path needs to be created to fix this issue?
I get the same error trying voltaML for the first time. Is this a dead repo?
No, quite the opposite is true: take a look at the draft in the Pull Requests tab. I will be merging it into the experimental branch today. TRT is still in the works for the new WebUI, but you can expect it soon.
@Stax124 I've been doing perf analysis of SD within the context of A1111 and found a 3x perf boost for 4090s doing basic 1-image batches at 512x512, euler_a, SD v2.1. I had 13.5 it/s before and now get 39.5 it/s. This is on a fast i9-13900 with a 4090 on Ubuntu. The gain came from upgrading cuDNN from v8.5 to v8.7. I brought this up with the PyTorch team on GitHub; they created a PR and will be fixing it soon, if it hasn't already been merged into PyTorch 2.0.0.

The best I got when I tried VoltaML was only 18 it/s, and that was even after I upgraded cuDNN to v8.7 and PyTorch to the nightly build.

I found the cause of the error above: the undefined variable occurs only if you try a TRT generation before clicking accelerate and letting it compile (or whatever it does). But even then it still doesn't work. I have not bothered trying to debug it myself, but given the poor SD perf I'm not sure TRT is really going to be significantly better than what I can already do.
FYI, people on Windows report something closer to a 2x improvement with my changes and can't seem to reproduce my easy-to-repro 39.5 it/s number. I do have a dual-boot setup but haven't bothered to find out what's wrong with Windows; I'm happy with my Ubuntu perf.

FYI2, I've also discovered, surprisingly, that single-thread CPU perf makes a huge difference for what I expected was mostly GPU processing. A 128-thread Threadripper can't do a serial stream of one image at a time anywhere near as fast as a 5.8 GHz Raptor Lake. I've posted the details of this elsewhere. I'd get a Threadripper IF AND ONLY IF I was doing SD on the CPU. But I'm not.
If VoltaML does have some magic that provides a good perf speedup, I'd be happy to help test it. Currently I can build the nightly Torch 2.0 and can use CUDA 12.0 if needed. You might consider adding Euler_a support, as I find it faster than most (all?) of the others, although I haven't tested all of them.
@aifartist We already found a patch for the 4090 and we are aware of the problems with the Ada Lovelace architecture. Documentation for this bug will be available in our own docs; for now, we will help people who ask (or I might display a clear warning). With it, we got about 49 it/s (with xFormers).
Also, this bug is already patched on the experimental branch; it just needed a folder created before the export runs. I forgot to close the issue (I have a lot of work on my plate), so thanks for bringing my attention to it.
On the last note: yes, I believe in TRT if we can make it work with lower VRAM, because it really brings a big performance uplift. If you want to test the new stuff we are adding, please check the experimental branch. I would like more people to help me, or at least give me their honest opinions.
Last: the PyTorch backend supports most of the K-Diffusion samplers (Euler Ancestral included) with Karras sigmas, while TRT supports only the diffusers schedulers without Karras sigmas (we still have an Euler A there, just a worse one). A rough sketch of what the sampler swap looks like is below.
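On the PyTorch side, swapping in Euler Ancestral is just a scheduler change in diffusers. A minimal generic diffusers sketch, not voltaML's actual wiring (the model id and prompt are placeholders, and Karras sigma support varies by scheduler and diffusers version):

```python
from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler

# Generic diffusers usage, not voltaML's internal code.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to("cuda")

# Swap the default scheduler for Euler Ancestral, keeping the existing config.
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

image = pipe("an astronaut riding a horse", num_inference_steps=25).images[0]
image.save("out.png")
```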
Thanks for your interest in this project and have a nice rest of your day.
(Also if you want to chat with me or other devs, come to our discord: https://discord.com/invite/pY5SVyHmWm, I will happily chat about this topic with you)