
KSampler fails

Open kctdfh opened this issue 2 years ago • 4 comments

Thanks for all the work on this!

Wanted to report that after today's update, all KSampler nodes fail.

System info:

Windows 10.0.19045 Build 19045
NVIDIA GeForce RTX 3090 Ti (driver version 535.98)
Cuda compilation tools, release 11.8, V11.8.89

Let me know if you need any other details!

This is the error I get within the UI with the default workflow:

Error occurred when executing KSampler:

CUDA error: an illegal instruction was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


File "E:\AI Experiments\ComfyUI\ComfyUI\execution.py", line 141, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
File "E:\AI Experiments\ComfyUI\ComfyUI\execution.py", line 75, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
File "E:\AI Experiments\ComfyUI\ComfyUI\execution.py", line 68, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
File "E:\AI Experiments\ComfyUI\ComfyUI\nodes.py", line 980, in sample
return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
File "E:\AI Experiments\ComfyUI\ComfyUI\nodes.py", line 950, in common_ksampler
samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
File "E:\AI Experiments\ComfyUI\ComfyUI\comfy\sample.py", line 88, in sample
samples = sampler.sample(noise, positive_copy, negative_copy, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar)
File "E:\AI Experiments\ComfyUI\ComfyUI\comfy\samplers.py", line 676, in sample
samples = getattr(k_diffusion_sampling, "sample_{}".format(self.sampler))(self.model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar)
File "E:\AI Experiments\ComfyUI\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "E:\AI Experiments\ComfyUI\ComfyUI\comfy\k_diffusion\sampling.py", line 123, in sample_euler
gamma = min(s_churn / (len(sigmas) - 1), 2 ** 0.5 - 1) if s_tmin <= sigmas[i] <= s_tmax else 0.

And this is the error in the terminal:

!!! Exception during processing !!!
Traceback (most recent call last):
  File "E:\AI Experiments\ComfyUI\ComfyUI\execution.py", line 141, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "E:\AI Experiments\ComfyUI\ComfyUI\execution.py", line 75, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "E:\AI Experiments\ComfyUI\ComfyUI\execution.py", line 68, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "E:\AI Experiments\ComfyUI\ComfyUI\nodes.py", line 980, in sample
    return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
  File "E:\AI Experiments\ComfyUI\ComfyUI\nodes.py", line 950, in common_ksampler
    samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
  File "E:\AI Experiments\ComfyUI\ComfyUI\comfy\sample.py", line 88, in sample
    samples = sampler.sample(noise, positive_copy, negative_copy, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar)
  File "E:\AI Experiments\ComfyUI\ComfyUI\comfy\samplers.py", line 676, in sample
    samples = getattr(k_diffusion_sampling, "sample_{}".format(self.sampler))(self.model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar)
  File "E:\AI Experiments\ComfyUI\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "E:\AI Experiments\ComfyUI\ComfyUI\comfy\k_diffusion\sampling.py", line 123, in sample_euler
    gamma = min(s_churn / (len(sigmas) - 1), 2 ** 0.5 - 1) if s_tmin <= sigmas[i] <= s_tmax else 0.
RuntimeError: CUDA error: an illegal instruction was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


Prompt executed in 9.03 seconds
Exception in thread Thread-1 (prompt_worker):
Traceback (most recent call last):
  File "threading.py", line 1016, in _bootstrap_inner
  File "threading.py", line 953, in run
  File "E:\AI Experiments\ComfyUI\ComfyUI\main.py", line 48, in prompt_worker
    comfy.model_management.soft_empty_cache()
  File "E:\AI Experiments\ComfyUI\ComfyUI\comfy\model_management.py", line 442, in soft_empty_cache
    torch.cuda.empty_cache()
  File "E:\AI Experiments\ComfyUI\python_embeded\lib\site-packages\torch\cuda\memory.py", line 133, in empty_cache
    torch._C._cuda_emptyCache()
RuntimeError: CUDA error: an illegal instruction was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
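
As the error text itself suggests, `CUDA_LAUNCH_BLOCKING=1` makes every kernel launch synchronous, so the Python stack trace points at the call that actually faulted instead of a later one like `empty_cache()`. A minimal sketch of one way to set it — it must happen before PyTorch initializes CUDA, e.g. at the very top of `main.py` or in the launcher `.bat` of the standalone build:

```python
import os

# Must be set before the first CUDA call, i.e. before torch initializes CUDA.
# With this set, kernel launches are synchronous, so the reported stack trace
# points at the launch that actually hit the illegal instruction.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```

This slows sampling down noticeably, so it's only worth enabling while debugging.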

kctdfh avatar Jun 24 '23 18:06 kctdfh

Does it happen again if you reboot?

comfyanonymous avatar Jun 24 '23 18:06 comfyanonymous

Yeah, I tried that, but it continues after a reboot as well!

kctdfh avatar Jun 24 '23 20:06 kctdfh

What if you update or reinstall your driver, or download a new standalone build? That sounds like something wrong with the driver or with PyTorch itself.

comfyanonymous avatar Jun 24 '23 21:06 comfyanonymous

Hey, I wanted to make sure I had enough info on this before getting back to you.

I downloaded a new standalone build, but it gave me the same error. Now, if I update the driver, it works... for a bit, roughly until I restart the computer (it never fails mid-session, is what I mean). After that it won't work again until a new driver update comes out; I assume the driver's install process does some purging that clears the issue.

I thought it might be some other process using CUDA that doesn't play well with PyTorch. I ran gpustat and uninstalled, or disabled background permissions for, almost all of the active processes. After restarting, I still get the error.

Are you sure it has nothing to do with torch.cuda.empty_cache()? Is there anything you'd suggest I try?
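
To help rule ComfyUI out, here's a minimal stress test I can run against the same torch install (a sketch — the matrix size and iteration count are arbitrary, and it falls back to CPU when CUDA is unavailable). On a flaky driver or unstable overclock, a sustained matmul loop like this often reproduces the same illegal-instruction error outside ComfyUI:

```python
import torch  # assumes the torch bundled with python_embeded

# Pick CUDA if present; the point is to stress the GPU outside ComfyUI.
device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.randn(512, 512, device=device)
for _ in range(20):
    x = x @ x
    x = x / x.norm()          # keep values finite across iterations
if device == "cuda":
    torch.cuda.synchronize()  # forces any pending async kernel error to surface here
print(device, "OK")
```

If this loop dies with the same `CUDA error: an illegal instruction was encountered`, the problem is below ComfyUI (driver, torch, or hardware) rather than in any node.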

kctdfh avatar Jul 07 '23 00:07 kctdfh

Wanted to come back and report that this was due to the factory overclocking of my GPU. Downclocking the GPU with MSI Afterburner solves the issue! The same problem appears in a bunch of other TensorFlow-based apps/frameworks as well.
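
For anyone who'd rather not install MSI Afterburner (an alternative not mentioned above, so treat it as an assumption): recent NVIDIA drivers can pin core clocks with nvidia-smi's locked-clocks feature from an elevated prompt. A sketch — the 210–1695 MHz range is just an example, and `--reset-gpu-clocks` undoes it:

```python
import shutil
import subprocess

# Hypothetical downclock via nvidia-smi locked clocks (requires a recent
# driver and an elevated/admin shell). The clock range is an example only.
cmd = ["nvidia-smi", "--lock-gpu-clocks=210,1695"]

if shutil.which("nvidia-smi"):
    # check=False: just report the driver's own error message if it refuses
    subprocess.run(cmd, check=False)
else:
    print("nvidia-smi not found; run this on the affected machine")
```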

kctdfh avatar Oct 07 '23 15:10 kctdfh