voltaML-fast-stable-diffusion
TRT Inference Not Working [volta_trt_flash]
[E] 3: [executionContext.cpp::validateInputBindings::1831] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::validateInputBindings::1831, condition: profileMaxDims.d[i] >= dimensions.d[i]. Supplied binding dimension [2,4,64,96] for bindings[0] exceed min ~ max range at index 3, maximum dimension in profile is 64, minimum dimension in profile is 64, but supplied dimension is 96.
Exception in thread Thread-87:
Traceback (most recent call last):
File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/usr/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/workspace/voltaML-fast-stable-diffusion/volta_accelerate.py", line 544, in infer_trt
images = demo.infer(prompt, negative_prompt, args.height, args.width, verbose=args.verbose, seed=args.seed)
File "/workspace/voltaML-fast-stable-diffusion/volta_accelerate.py", line 404, in infer
noise_pred = self.runEngine(self.unet_model_key, {"sample": sample_inp, "timestep": timestep_inp, "encoder_hidden_states": embeddings_inp})['latent']
File "/workspace/voltaML-fast-stable-diffusion/volta_accelerate.py", line 271, in runEngine
return engine.infer(feed_dict, self.stream)
File "/workspace/voltaML-fast-stable-diffusion/utilities.py", line 108, in infer
raise ValueError(f"ERROR: inference failed.")
ValueError: ERROR: inference failed.
RTX 4090; used the original Dockerfile from the volta_trt_flash branch.
Please wait for some time. We are updating the branch and pushing a new Docker image.
Please try out our new Docker image.
It initially worked fine at 512x512, but when I tried to generate an image at 512x768 it fell apart with the same error.
[E] 3: [executionContext.cpp::validateInputBindings::1831] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::validateInputBindings::1831, condition: profileMaxDims.d[i] >= dimensions.d[i]. Supplied binding dimension [1,4,96,64] for bindings[0] exceed min ~ max range at index 2, maximum dimension in profile is 64, minimum dimension in profile is 64, but supplied dimension is 96.
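For reference, the 96 in that message is just the requested resolution divided by 8: Stable Diffusion's VAE downsamples by a factor of 8, so the UNet "sample" binding has spatial dims height // 8 by width // 8, and the static engine profile only covers 64x64 latents, i.e. 512x512 pixels. A quick check in plain Python:

# The SD VAE downsamples by 8, so the UNet latent ("sample") input is (batch, 4, H // 8, W // 8).
for height, width in [(512, 512), (512, 768), (768, 768)]:
    print(f"{height}x{width} px -> latent {height // 8}x{width // 8}")
# 512x512 -> 64x64 fits the profile; 512x768 -> 64x96 and 768x768 -> 96x96 exceed the max of 64.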
After this error, I went back to 512x512 and got the following error:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2548, in __call__
return self.wsgi_app(environ, start_response)
File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2528, in wsgi_app
response = self.handle_exception(e)
File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2525, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1822, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1820, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1796, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "/workspace/voltaML-fast-stable-diffusion/app.py", line 88, in upload_file
pipeline_time = infer_trt(saving_path=saving_path,
File "/workspace/voltaML-fast-stable-diffusion/volta_accelerate.py", line 541, in infer_trt
pipeline_time = demo.infer(prompt, negative_prompt, args.height, args.width, verbose=args.verbose, seed=args.seed)
File "/workspace/voltaML-fast-stable-diffusion/volta_accelerate.py", line 401, in infer
noise_pred = self.runEngine(self.unet_model_key, {"sample": sample_inp, "timestep": timestep_inp, "encoder_hidden_states": embeddings_inp})['latent']
File "/workspace/voltaML-fast-stable-diffusion/volta_accelerate.py", line 269, in runEngine
return engine.infer(feed_dict, self.stream)
File "/workspace/voltaML-fast-stable-diffusion/utilities.py", line 108, in infer
raise ValueError(f"ERROR: inference failed.")
ValueError: ERROR: inference failed.
It is not working anymore, even at 512x512.
The engine files have not been compiled with dynamic shapes, which is why you might be getting this error. Are you accelerating it through the UI or through the CLI?
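For context, "compiled with dynamic shapes" means the engine was built with a TensorRT optimization profile whose min/max ranges cover more than one latent size; a static build pins the "sample" input to exactly 64x64. A dynamic-shape build registers a profile roughly like the sketch below (a minimal illustration with the TensorRT Python API; the input names match the feed_dict in the traceback above, but the exact ranges and the ONNX parsing step are assumptions, not voltaML's actual build code):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()
# `network` would normally come from parsing the exported UNet ONNX graph (omitted here);
# the relevant part is the dynamic-shape profile attached to the builder config.
profile = builder.create_optimization_profile()
# Latent ranges covering e.g. 256x256 up to 1024x1024 pixels (32..128 latents), batch 2 for CFG.
profile.set_shape("sample", min=(2, 4, 32, 32), opt=(2, 4, 64, 64), max=(2, 4, 128, 128))
# CLIP text embeddings for SD 1.5: 77 tokens, hidden size 768 (fixed).
profile.set_shape("encoder_hidden_states", min=(2, 77, 768), opt=(2, 77, 768), max=(2, 77, 768))
config.add_optimization_profile(profile)
# engine_bytes = builder.build_serialized_network(network, config)

With only a static profile (min == max == 64), any request whose latent dims differ from 64 fails the validateInputBindings check shown above.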
Built an image using the Dockerfile and accelerated through the web UI.
We have enabled dynamic shapes. Please try now.
Same error. I built an engine locally using the following command:
python volta_accelerate.py \
  --prompt 'a highly detailed matte painting of a man on a hill watching a rocket launch in the distance by studio ghibli, makoto shinkai, by artgerm, by wlop, by greg rutkowski, volumetric lighting, octane render, 4 k resolution, trending on artstation, masterpiece' \
  --height 512 --width 512 \
  --model-path 'runwayml/stable-diffusion-v1-5' \
  --hf-token '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' \
  --backend 'TRT' \
  --output-dir 'static/output' \
  -v --build-dynamic-shape
I got the following errors when I tried to generate images at different sizes, for example 768x768 or 512x768:
[E] 3: [executionContext.cpp::validateInputBindings::1831] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::validateInputBindings::1831, condition: profileMaxDims.d[i] >= dimensions.d[i]. Supplied binding dimension [2,4,96,96] for bindings[0] exceed min ~ max range at index 2, maximum dimension in profile is 64, minimum dimension in profile is 64, but supplied dimension is 96. )
Traceback (most recent call last):
File "/media/vyro/vyro/MachineLearning/Volta-ML/voltaML-fast-stable-diffusion/volta_accelerate.py", line 688, in infer_trt
pipeline_time = trt_model.infer(prompt, negative_prompt, args.height, args.width, guidance_scale=args.guidance_scale, verbose=args.verbose, seed=args.seed, output_dir=args.output_dir)
File "/media/vyro/vyro/MachineLearning/Volta-ML/voltaML-fast-stable-diffusion/volta_accelerate.py", line 460, in infer
noise_pred = self.runEngine(self.unet_model_key, {"sample": sample_inp, "timestep": timestep_inp, "encoder_hidden_states": embeddings_inp})['latent']
File "/media/vyro/vyro/MachineLearning/Volta-ML/voltaML-fast-stable-diffusion/volta_accelerate.py", line 324, in runEngine
return engine.infer(feed_dict, self.stream)
File "/media/vyro/vyro/MachineLearning/Volta-ML/voltaML-fast-stable-diffusion/utilities.py", line 108, in infer
raise ValueError(f"ERROR: inference failed.")
ValueError: ERROR: inference failed.
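Since the profile max is still reported as 64 even after rebuilding with --build-dynamic-shape, one way to confirm whether dynamic ranges were actually baked into the engine is to deserialize it and print the optimization-profile ranges for each input; if min and max for the "sample" spatial dims are both 64, the flag did not take effect for that engine file. A rough sketch against the TRT 8.x Python API (the engine path is a placeholder, not the repo's actual output location):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("engine/unet.plan", "rb") as f:  # placeholder path to the built UNet engine
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())

for i in range(engine.num_bindings):
    if engine.binding_is_input(i):
        name = engine.get_binding_name(i)
        # (min, opt, max) shapes of optimization profile 0 for this input
        print(name, engine.get_profile_shape(0, name))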