candle

Build fails on Maxwell GPU due to __dp4a undefined in quantized.cu

Open fishonamos opened this issue 4 months ago • 0 comments

I’m trying to build a Rust project that depends on candle-kernels on my laptop, which has an NVIDIA GeForce 940MX (Maxwell, compute capability 5.0). The build fails with errors like:


src/quantized.cu(1997): error: identifier "__dp4a" is undefined
...
18 errors detected in the compilation of "src/quantized.cu".

GPU: NVIDIA GeForce 940MX (GM107, compute capability 5.0)
OS: Kali Linux (rolling)
CUDA toolkit: 12.3
NVIDIA driver: 550.163.01
candle-kernels: v0.7.2

The error is caused by the CUDA intrinsic __dp4a, which is only available on GPUs with compute capability 6.1 or higher (most Pascal GPUs and newer). My GPU is compute capability 5.0, so the intrinsic is not defined when quantized.cu is compiled for it.
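To make concrete what a fallback would have to compute, here is a minimal sketch of the signed form of __dp4a in plain Rust: it treats two 32-bit values as four packed signed 8-bit lanes, multiplies them lane by lane, and adds the sum to an accumulator. The function name `dp4a_emulated` is hypothetical and is not part of candle-kernels; an actual fix would need the equivalent logic in CUDA behind a compute-capability check.

```rust
/// Hypothetical software emulation of the signed `__dp4a(a, b, c)` intrinsic.
/// `a` and `b` are each interpreted as four packed signed 8-bit lanes;
/// the lane-wise products are summed and added to the accumulator `c`.
fn dp4a_emulated(a: i32, b: i32, c: i32) -> i32 {
    // Split each 32-bit word into its four bytes. The byte order does not
    // affect the result as long as both operands are split the same way.
    let a_lanes = a.to_le_bytes();
    let b_lanes = b.to_le_bytes();

    let mut acc = c;
    for i in 0..4 {
        // Reinterpret each byte as a signed 8-bit value, then widen to i32.
        let x = a_lanes[i] as i8 as i32;
        let y = b_lanes[i] as i8 as i32;
        // Wrapping add mirrors 32-bit integer accumulation without panicking.
        acc = acc.wrapping_add(x * y);
    }
    acc
}
```

For example, with lanes [1, 2, 3, 4] and [5, 6, 7, 8] and an accumulator of 10, the result is 1*5 + 2*6 + 3*7 + 4*8 + 10 = 80.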

Questions: Is there a way to disable quantized kernels or the use of __dp4a for older GPUs? If not, could a feature flag or build option be added to support older hardware, or at least skip building quantized kernels on unsupported GPUs?
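As a rough sketch of the kind of build-time check this could involve, the helpers below parse a compute capability string and decide whether the __dp4a-based kernels can be compiled. Both function names are hypothetical, and where the string comes from (an environment variable, a query to the driver, a Cargo feature) is an assumption left open here; this is not how candle-kernels currently works.

```rust
/// Hypothetical helper: parse a compute capability written either as
/// "5.0" (major.minor) or "50" (major * 10 + minor) into the numeric form.
fn parse_compute_cap(raw: &str) -> Option<u32> {
    let raw = raw.trim();
    match raw.split_once('.') {
        Some((major, minor)) => {
            let major: u32 = major.parse().ok()?;
            let minor: u32 = minor.parse().ok()?;
            // Minor versions are a single digit in this encoding.
            if minor > 9 {
                return None;
            }
            major.checked_mul(10)?.checked_add(minor)
        }
        None => raw.parse().ok(),
    }
}

/// Hypothetical gate: `__dp4a` needs compute capability 6.1 or higher,
/// so kernels that rely on it would be skipped below that threshold.
fn quantized_kernels_supported(compute_cap: u32) -> bool {
    compute_cap >= 61
}
```

With these, a build script could skip or stub out quantized.cu when `quantized_kernels_supported` returns false, which would be the case for my GPU ("5.0" parses to 50).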

fishonamos • Jul 07 '25 14:07