VAEDecode error using --fp16-vae
Using --fp16-vae, there is no longer a VRAM spike at the decode stage, and decoding works for resolutions up to 640x640. Above that, however, I get the error below, even though VRAM usage stayed well under the limit the whole time.
Error occurred when executing VAEDecode:
CUDA error: the launch timed out and was terminated
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
File "G:\Stable-Diffusion-Program\ComfyUI_windows_portable\ComfyUI\execution.py", line 145, in recursive_execute output_data, output_ui = get_output_data(obj, input_data_all) File "G:\Stable-Diffusion-Program\ComfyUI_windows_portable\ComfyUI\execution.py", line 75, in get_output_data return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True) File "G:\Stable-Diffusion-Program\ComfyUI_windows_portable\ComfyUI\execution.py", line 68, in map_node_over_list results.append(getattr(obj, func)(**slice_dict(input_data_all, i))) File "G:\Stable-Diffusion-Program\ComfyUI_windows_portable\ComfyUI\nodes.py", line 216, in decode return (vae.decode(samples["samples"]), ) File "G:\Stable-Diffusion-Program\ComfyUI_windows_portable\ComfyUI\comfy\sd.py", line 576, in decode pixel_samples[x:x+batch_number] = torch.clamp((self.first_stage_model.decode(samples) + 1.0) / 2.0, min=0.0, max=1.0).cpu().float()