candle
Quantized Flux not working
Hi, I'm getting an error on my RTX 4000 Ada machine, which supports BF16 and runs the Stable Diffusion example just fine, but the quantized FLUX update fails. It happens with no model specified, and likewise with dev or schnell:
cargo run --features cuda,cudnn --example flux -r -- --height 1024 --width 1024 --prompt "a rusty robot walking on a beach holding a small torch, the robot has the word \"rust\" written on it, high quality, 4k" --model dev
error
Tensor[[1, 256], u32, cuda:0]
Error: DriverError(CUDA_ERROR_NOT_FOUND, "named symbol not found") when loading is_u32_bf16
any ideas?
This error is most likely not due to the model itself but rather to the CUDA setup. The bf16 kernels are guarded by the following predicate:
#if __CUDA_ARCH__ >= 800
...
#endif
This makes the kernels available only when the CUDA arch targeted by the nvcc compiler is at least 8.0 (i.e. __CUDA_ARCH__ >= 800), which is likely not the case in your setup. It would be interesting to see which value __CUDA_ARCH__ has in your case, as well as the output of the nvidia-smi --query-gpu=compute_cap --format=csv command.
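To relate nvidia-smi's compute_cap values to that guard, here is a minimal shell sketch. The cap-to-arch conversion is illustrative only (it assumes a single-digit minor version, which holds for current GPUs) and is not candle's actual build logic: 6.1 maps to 610, 8.9 to 890, and the same >= 800 predicate decides whether the bf16 kernels exist.

```shell
# Illustrative only: map nvidia-smi's compute_cap string (e.g. "8.9") to the
# integer form __CUDA_ARCH__ uses (890), then apply the same >= 800 predicate
# that gates the bf16 kernels.
cap_to_arch() { echo "$(echo "$1" | tr -d '.')0"; }
has_bf16() { [ "$(cap_to_arch "$1")" -ge 800 ]; }

for cap in 6.1 8.9; do
  if has_bf16 "$cap"; then
    echo "compute cap $cap: bf16 kernels compiled in"
  else
    echo "compute cap $cap: bf16 kernels compiled out"
  fi
done
```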
This machine has 2 GPUs. When I run the Stable Diffusion example it uses the RTX 4000 Ada with its 8.9 compute cap.
$ nvidia-smi --query-gpu=compute_cap --format=csv
compute_cap
6.1
8.9
How do I see the value of __CUDA_ARCH__?
That first GPU (compute cap 6.1) is most likely causing the issue. Did you try using CUDA_VISIBLE_DEVICES so that candle can only see the second GPU? (If you're not familiar with it, it's not a candle-specific thing, so a quick search will show how to use it.)
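For reference, a minimal sketch of how that would look here, assuming the 8.9 card is enumerated as index 1 (CUDA's enumeration order can differ from nvidia-smi's, so this index is an assumption to verify):

```shell
# CUDA_VISIBLE_DEVICES filters which GPUs the CUDA runtime exposes to the
# process; the visible card is then renumbered as cuda:0 inside candle.
export CUDA_VISIBLE_DEVICES=1   # assumed index of the compute-cap 8.9 card
sh -c 'echo "child process sees CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"'

# Then rerun the example, e.g.:
#   CUDA_VISIBLE_DEVICES=1 cargo run --features cuda,cudnn --example flux -r -- \
#     --quantized --height 1024 --width 1024 --prompt "a rusty robot on a beach"
```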
When CUDA_VISIBLE_DEVICES is set to the correct device, and nvidia-smi -l 1 realtime monitoring confirms the correct GPU is being used, memory usage climbs to about 9GB and then the same error happens in the middle of the image generation:
Running `target/release/examples/flux --height 1024 --width 1024 --prompt 'a rusty robot walking on a beach holding a small torch, the robot has the word rust written on it, high quality, 4k' --quantized`
[[ 3, 9, 3, 9277, 63, 7567, 3214, 30, 3, 9, 2608,
3609, 3, 9, 422, 26037, 6, 8, 7567, 65, 8, 1448,
3, 9277, 1545, 30, 34, 6, 306, 463, 6, 314, 157,
    1,    0,    0,  ... (remaining entries are all 0 padding) ...,
    0,    0,    0]]
Tensor[[1, 256], u32, cuda:0]
Error: DriverError(CUDA_ERROR_NOT_FOUND, "named symbol not found") when loading is_u32_bf16
It's probably good to clean your target directory, in case some cached PTX files didn't get rebuilt after setting CUDA_VISIBLE_DEVICES.
I did a clean and it's the same error. I can monitor in nvidia-smi which card is being used; the older card simply runs out of memory right away.
Not sure where to look next.
Hmm, it seems weird that candle could use the older card if CUDA_VISIBLE_DEVICES points only at the new one; that's supposed to be handled by the CUDA framework itself, so it's not something candle could bypass. Maybe you're pointing at the wrong device somehow?
Another option would be to point at CUDA device 1 rather than CUDA device 0 in the code.
Actually it is pointing at the right card; CUDA_VISIBLE_DEVICES works as it should, and as I said above I can confirm in nvidia-smi that the correct GPU is in use. On the correct card it still crashes with the error:
Error: DriverError(CUDA_ERROR_NOT_FOUND, "named symbol not found") when loading is_u32_bf16
Just checking back in; I have no idea how to troubleshoot this. It works on an A100, but I still get the same error on my RTX 4000 Ada with 20GB and compute cap 8.9:
DriverError(CUDA_ERROR_NOT_FOUND, "named symbol not found") when loading is_u32_bf16
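One thing that may be worth trying on a mixed-GPU box (hedged: whether candle's kernel build honours a CUDA_COMPUTE_CAP environment variable depends on your candle version, so check its docs first): pin the kernel build to the 8.9 arch so the 6.1 card can't drag the compiled arch below the bf16 threshold, then do a clean rebuild.

```shell
# Sketch under the assumption that candle's CUDA kernel build reads the
# CUDA_COMPUTE_CAP environment variable (verify against your candle version).
export CUDA_COMPUTE_CAP=89      # compile kernels for the 8.9 Ada card
export CUDA_VISIBLE_DEVICES=1   # assumed index of that card
# Clean rebuild so no PTX compiled for the 6.1 arch is reused:
#   cargo clean
#   cargo run --features cuda,cudnn --example flux -r -- --quantized \
#     --height 1024 --width 1024 --prompt "a rusty robot on a beach"
```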
And on a Mac M1 we get the error:
Error while loading function: "Function 'cast_f32_bf16' does not exist"))