
Default arg "cuda-malloc" causes CUDA error: operation not supported on GTX 960M GPU

Open wklchris opened this issue 1 year ago • 42 comments

I have confirmed that the default arg --cuda-malloc causes an error on my computer. I must disable it by adding --disable-cuda-malloc for ComfyUI to work properly.

If I don't disable it, the following CUDA error occurs when I try to generate an image:

(venv) PS C:\Users\wklchris> python "${comfyuiDir}/main.py" --lowvram
Total VRAM 2048 MB, total RAM 8076 MB
Trying to enable lowvram mode because your GPU seems to have 4GB or less. If you don't want this use: --normalvram
Set vram state to: LOW_VRAM
Device: cuda:0 NVIDIA GeForce GTX 960M : cudaMallocAsync
Using pytorch cross attention
Adding extra search path checkpoints D:/Git-repos/stable-diffusion-webui\models/Stable-diffusion
Adding extra search path configs D:/Git-repos/stable-diffusion-webui\models/Stable-diffusion
Adding extra search path vae D:/Git-repos/stable-diffusion-webui\models/VAE
Adding extra search path loras D:/Git-repos/stable-diffusion-webui\models/Lora
Adding extra search path loras D:/Git-repos/stable-diffusion-webui\models/LyCORIS
Adding extra search path upscale_models D:/Git-repos/stable-diffusion-webui\models/ESRGAN
Adding extra search path upscale_models D:/Git-repos/stable-diffusion-webui\models/RealESRGAN
Adding extra search path upscale_models D:/Git-repos/stable-diffusion-webui\models/SwinIR
Adding extra search path embeddings D:/Git-repos/stable-diffusion-webui\embeddings
Adding extra search path hypernetworks D:/Git-repos/stable-diffusion-webui\models/hypernetworks
Adding extra search path controlnet D:/Git-repos/stable-diffusion-webui\models/ControlNet
Starting server

To see the GUI go to: http://127.0.0.1:8188
got prompt
model_type EPS
adm 0
making attention of type 'vanilla-pytorch' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla-pytorch' with 512 in_channels
left over keys: dict_keys(['model_ema.decay', 'model_ema.num_updates'])
!!! Exception during processing !!!
Traceback (most recent call last):
  File "D:\Git-repos\ComfyUI\execution.py", line 145, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "D:\Git-repos\ComfyUI\execution.py", line 75, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "D:\Git-repos\ComfyUI\execution.py", line 68, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "D:\Git-repos\ComfyUI\nodes.py", line 1082, in sample
    return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
  File "D:\Git-repos\ComfyUI\nodes.py", line 1052, in common_ksampler
    samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
  File "D:\Git-repos\ComfyUI\comfy\sample.py", line 75, in sample
    comfy.model_management.load_model_gpu(model)
  File "D:\Git-repos\ComfyUI\comfy\model_management.py", line 298, in load_model_gpu
    accelerate.dispatch_model(real_model, device_map=device_map, main_device=torch_dev)
  File "D:\Git-repos\stable-diffusion-webui\venv\lib\site-packages\accelerate\big_modeling.py", line 370, in dispatch_model
    attach_align_device_hook_on_blocks(
  File "D:\Git-repos\stable-diffusion-webui\venv\lib\site-packages\accelerate\hooks.py", line 498, in attach_align_device_hook_on_blocks
    add_hook_to_module(module, hook)
  File "D:\Git-repos\stable-diffusion-webui\venv\lib\site-packages\accelerate\hooks.py", line 155, in add_hook_to_module
    module = hook.init_hook(module)
  File "D:\Git-repos\stable-diffusion-webui\venv\lib\site-packages\accelerate\hooks.py", line 251, in init_hook
    set_module_tensor_to_device(module, name, self.execution_device)
  File "D:\Git-repos\stable-diffusion-webui\venv\lib\site-packages\accelerate\utils\modeling.py", line 147, in set_module_tensor_to_device
    new_value = old_value.to(device)
RuntimeError: CUDA error: operation not supported
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

I run ComfyUI on Windows with torch 2.0 and I use a GTX 960M card.


Questions & suggestions:

  • I have no knowledge of CUDA malloc; I just guess the above error is a hardware compatibility problem (correct me if I am wrong) because I am using an old graphics card, the 960M. Is there any way ComfyUI can detect such incompatibility during launch and automatically disable this argument, just like it enables lowvram when it knows the GPU has small VRAM?

  • If there is little we can do during launch, I suggest that ComfyUI tell users to manually disable it in the terminal when it encounters this error at runtime (if possible; a rough sketch of this idea follows this list), or at least warn users in the argument description of --help. I am requesting this because when I checked the argument list with the --help option, I saw:

    --cuda-malloc         Enable cudaMallocAsync (enabled by default for torch 2.0 and up).
    

    The above description makes it look like users should enable cuda malloc whenever torch 2.0 is installed; however, some users like me have to disable it even though they have torch 2.0.
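A rough sketch of the runtime-hint idea mentioned above (hypothetical code, not ComfyUI's implementation; the exception matching and the message wording are my assumptions):

    # Hypothetical sketch: catch the failure and point the user at the workaround.
    # Not ComfyUI's actual code; the substring check and message are assumptions.
    def run_with_hint(fn, *args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except RuntimeError as e:
            if "operation not supported" in str(e):
                print("CUDA error: operation not supported. If you are on an "
                      "older GPU, try restarting with --disable-cuda-malloc.")
            raise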

wklchris avatar Jul 19 '23 16:07 wklchris

Just to make sure, are you on the latest nvidia drivers?

comfyanonymous avatar Jul 19 '23 16:07 comfyanonymous

Just to make sure, are you on the latest nvidia drivers?

No, the latest is v536 and I am still on v531. I didn't update because I saw many complaints that Nvidia drivers after 531 may have a slowdown issue (see vladmandic/automatic #1285), and it looks like newer drivers still haven't solved it.

Is cuda-malloc something only supported by drivers after v531?

wklchris avatar Jul 19 '23 17:07 wklchris

531 should support it. If anyone else running a GTX 9xx or older Nvidia GPU has the same issue, let me know so I know which GPUs are affected.

comfyanonymous avatar Jul 19 '23 17:07 comfyanonymous

https://github.com/comfyanonymous/ComfyUI/commit/799c08a4ce01feb9e5b4aae8fec4347f2259f9c4#diff-1eb25131bac2fdf60f5ac5d483edd7f75f6654d6eb927ebb2b2c68aa71ebc351R40

I added a list of GPUs not to enable cuda malloc on so if someone else has a similar issue with one I didn't put on the list let me know.
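For reference, that mechanism amounts to a name-based blocklist checked before enabling cudaMallocAsync. A minimal sketch of the idea (illustrative only; the real list and detection live in cuda_malloc.py, and the entries below are examples):

    # Minimal sketch of a name-based blocklist check (illustrative only;
    # ComfyUI's actual list and detection live in cuda_malloc.py).
    blacklist = {"GeForce GTX 960M", "GeForce GTX 750"}  # example entries

    def cuda_malloc_supported(gpu_names):
        """Return False if any detected GPU name matches a blocklisted one."""
        for name in gpu_names:
            for bad in blacklist:
                if bad in name:  # substring match, e.g. "NVIDIA GeForce GTX 960M"
                    return False
        return True

    print(cuda_malloc_supported(["NVIDIA GeForce GTX 960M"]))  # False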

comfyanonymous avatar Jul 19 '23 18:07 comfyanonymous

I get the same problem, GTX 750 Ti

MaddyAurora avatar Jul 19 '23 19:07 MaddyAurora

It should be auto disabled on the 750 Ti now: https://github.com/comfyanonymous/ComfyUI/commit/39c58b227fa265f65c96ef133c580e790e64d8e7

comfyanonymous avatar Jul 19 '23 19:07 comfyanonymous

Same issue with the current code on a GeForce GTX 960, disabling the setting fixed it.

Namnodorel avatar Jul 22 '23 12:07 Namnodorel

Should be disabled on the regular GTX 960 now: https://github.com/comfyanonymous/ComfyUI/commit/85a8900a148c881914ed16900108f08fd26981c1

comfyanonymous avatar Jul 22 '23 15:07 comfyanonymous

I have the same problem on a GTX 970

andres885 avatar Aug 03 '23 23:08 andres885

had to "--disable-cuda-malloc" as well, running on "NVIDIA GeForce GT 840M 2 GB"

lilshippo avatar Aug 06 '23 01:08 lilshippo

Should be fixed: https://github.com/comfyanonymous/ComfyUI/commit/fc71cf656e1f26e6577c0a211b7460fc078b0c39

comfyanonymous avatar Aug 06 '23 01:08 comfyanonymous

Older Nvidia driver versions cannot use cudaMallocAsync, so the driver version also needs to be checked before enabling it.
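A hedged sketch of such a driver check (assumes the pynvml package is installed; the 531 threshold is taken from this thread, not from official documentation):

    # Sketch: refuse cudaMallocAsync on old drivers (assumes pynvml is installed).
    # The 531 threshold comes from this thread, not an official requirement.
    import pynvml

    MIN_DRIVER_MAJOR = 531

    def driver_supports_cuda_malloc():
        pynvml.nvmlInit()
        try:
            version = pynvml.nvmlSystemGetDriverVersion()
            if isinstance(version, bytes):  # older pynvml returns bytes
                version = version.decode()
            return int(version.split(".")[0]) >= MIN_DRIVER_MAJOR
        finally:
            pynvml.nvmlShutdown()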

haoqiangyu avatar Aug 08 '23 07:08 haoqiangyu

I have confirmed that the default arg --cuda-malloc causes an error on my computer. I must disable it by adding --disable-cuda-malloc for ComfyUI to work properly.

What is the program where I can type this to disable cuda-malloc? Thank you.

Chillnear avatar Aug 08 '23 15:08 Chillnear

I also have this issue with my NVIDIA GeForce GTX 950. Adding --disable-cuda-malloc worked for me.

Veranith avatar Aug 11 '23 14:08 Veranith

I have an Nvidia GeForce GTX 960M, but it is not working for me either.

Where do I have to add or run this command? I tried running --disable-cuda-malloc by itself in PowerShell but got this error:

At line:1 char:3
+ --disable-cuda-malloc
Missing expression after unary operator '--'.
At line:1 char:3
+ --disable-cuda-malloc
Unexpected token 'disable-cuda-malloc' in expression or statement.
    + CategoryInfo          : ParserError: (:) [], ParentContainsErrorRecordException
    + FullyQualifiedErrorId : MissingExpressionAfterOperator

lew1s avatar Aug 12 '23 12:08 lew1s

I have confirmed that the default arg --cuda-malloc causes an error on my computer. I must disable it by adding --disable-cuda-malloc for ComfyUI to work properly.

How did you disable cuda malloc, and where?

lew1s avatar Aug 12 '23 12:08 lew1s

I have confirmed that the default arg --cuda-malloc causes an error on my computer. I must disable it by adding --disable-cuda-malloc for ComfyUI to work properly.

How did you disable cuda malloc, and where?

I had exactly the same and just added the nvidia card to the blacklist in cuda_malloc.py and it works.
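For anyone else trying this workaround: the edit is adding the GPU's name to the list in cuda_malloc.py, along these lines (the entries shown are illustrative; match the exact name your driver reports):

    # In ComfyUI's cuda_malloc.py: append your GPU's name to the existing list.
    # The entries shown here are illustrative, not the file's actual contents.
    blacklist = {
        "GeForce GTX 960M",   # example existing entry
        "GeForce GTX 1650",   # added for this card
    }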

Topzie avatar Aug 13 '23 19:08 Topzie

Can you tell me which string you added to the blacklist so I can add it?

comfyanonymous avatar Aug 13 '23 19:08 comfyanonymous

Can you tell me which string you added to the blacklist so I can add it?

Oops, I misunderstood you. I thought you were the OP, so I deleted my post 😅 It's a GeForce GTX 1650.

Topzie avatar Aug 13 '23 20:08 Topzie

I have a GeForce 960M and added it to the blocklist in the cuda_malloc.py file, as you can see here.

https://www.screencast.com/t/PqOwuMCM

But I am still getting the same error:

Error occurred when executing KSampler:

CUDA error: operation not supported
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

  File "C:\Apps\ComfyUI_windows_portable\ComfyUI\execution.py", line 151, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "C:\Apps\ComfyUI_windows_portable\ComfyUI\execution.py", line 81, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "C:\Apps\ComfyUI_windows_portable\ComfyUI\execution.py", line 74, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "C:\Apps\ComfyUI_windows_portable\ComfyUI\nodes.py", line 1206, in sample
    return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
  File "C:\Apps\ComfyUI_windows_portable\ComfyUI\nodes.py", line 1176, in common_ksampler
    samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
  File "C:\Apps\ComfyUI_windows_portable\ComfyUI\comfy\sample.py", line 75, in sample
    comfy.model_management.load_model_gpu(model)
  File "C:\Apps\ComfyUI_windows_portable\ComfyUI\comfy\model_management.py", line 307, in load_model_gpu
    accelerate.dispatch_model(real_model, device_map=device_map, main_device=torch_dev)
  File "C:\Apps\ComfyUI_windows_portable\python_embeded\lib\site-packages\accelerate\big_modeling.py", line 373, in dispatch_model
    attach_align_device_hook_on_blocks(
  File "C:\Apps\ComfyUI_windows_portable\python_embeded\lib\site-packages\accelerate\hooks.py", line 523, in attach_align_device_hook_on_blocks
    add_hook_to_module(module, hook)
  File "C:\Apps\ComfyUI_windows_portable\python_embeded\lib\site-packages\accelerate\hooks.py", line 155, in add_hook_to_module
    module = hook.init_hook(module)
  File "C:\Apps\ComfyUI_windows_portable\python_embeded\lib\site-packages\accelerate\hooks.py", line 253, in init_hook
    set_module_tensor_to_device(module, name, self.execution_device)
  File "C:\Apps\ComfyUI_windows_portable\python_embeded\lib\site-packages\accelerate\utils\modeling.py", line 165, in set_module_tensor_to_device
    new_value = old_value.to(device)

lew1s avatar Aug 14 '23 07:08 lew1s

In run_nvidia_gpu.bat, at the end of the line containing "-s ComfyUI\main.py --windows-standalone-build", add --disable-cuda-malloc.
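For example, the edited line might look like this (illustrative; the exact path depends on your portable install):

    .\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --disable-cuda-malloc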

CamelliasW avatar Aug 24 '23 09:08 CamelliasW

@comfyanonymous My GPU runs under VMware and has the same issue. Is it on the compatible list? Should I add it to the blacklist?

Please refer to the details here: https://github.com/lllyasviel/Fooocus/issues/188

hanetyb avatar Aug 27 '23 00:08 hanetyb

I made the change, but the file is auto-updated once I run Fooocus, and the disable parameter is automatically removed.

Could you please advise how I should disable the sync/update function? Setting the file to read-only does not work.

In run_nvidia_gpu.bat, at the end of the line containing "-s ComfyUI\main.py --windows-standalone-build", add --disable-cuda-malloc.

hanetyb avatar Aug 29 '23 07:08 hanetyb

I believe I'm having the same issue on my 1060 Max-Q. Tried reinstalling torch, no change. I can only generate with --disable-cuda-malloc.

MGerckens avatar Sep 25 '23 00:09 MGerckens

Same issue on the 1660ti (Notebook Ed.)

Edit: I had recently re-installed my OS and my drivers were unknowingly outdated; updating them seems to have fixed this for me, my bad.

Terraphice avatar Oct 01 '23 02:10 Terraphice

I am also getting the same error and fixing it with the same solution on my Tesla P40. I didn't have any issues until I noticed the problem on 12/29/23. My old driver version was 528.89. I updated to the latest version, 537.70, but the problem was not fixed by the driver update. The system still reports CUDA version 12.0 instead of 12.2 as listed on the driver download page. The system is running in a VM, if that matters. I also tried adding the card to the blacklist in cuda_malloc.py, using both "Tesla P40" and "NVIDIA Tesla P40", but was still met with the same error.

offroadguy56 avatar Dec 30 '23 16:12 offroadguy56

I also had to add "--disable-cuda-malloc" for my old 4 GB "NVIDIA GeForce GTX 850M" graphics card (Driver Version: 546.33, CUDA Version: 12.3).

hkdemiralp avatar Jan 14 '24 18:01 hkdemiralp

In "NVIDIA Tesla M40 24G", "--disable-cuda-malloc" is also required ( Driver Version: 535.161.08)

BigYuanHead avatar Mar 24 '24 08:03 BigYuanHead

In "NVIDIA Tesla M40 24G", "--disable-cuda-malloc" is also required ( Driver Version: 550.67)

codejach avatar Mar 26 '24 23:03 codejach

In "NVIDIA L4 24G", "--disable-cuda-malloc" is also required ( Driver Version: 535.171.04)

efwfe avatar May 27 '24 05:05 efwfe