Build fails on Maxwell GPU due to __dp4a undefined in quantized.cu
I’m trying to build a Rust project locally that depends on candle-kernels on my laptop with an NVIDIA GeForce 940MX (Maxwell, compute capability 5.0). The build fails with errors like:
src/quantized.cu(1997): error: identifier "__dp4a" is undefined
...
18 errors detected in the compilation of "src/quantized.cu".
Environment:
- GPU: NVIDIA GeForce 940MX (GM107, compute capability 5.0)
- OS: Kali Linux (rolling)
- CUDA toolkit: 12.3
- NVIDIA driver: 550.163.01
- candle-kernels: v0.7.2
The error is caused by the use of the CUDA intrinsic __dp4a, which is only available when compiling for compute capability 6.1 or higher (most Pascal GPUs and newer). My GPU is compute capability 5.0, so the intrinsic is not defined for its target architecture.
Questions:
1. Is there a way to disable quantized kernels or the use of __dp4a for older GPUs?
2. If not, could a feature flag or build option be added to support older hardware, or at least skip building quantized kernels on unsupported GPUs?
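To illustrate what support for older hardware could look like, here is a minimal sketch of an architecture-guarded wrapper. It is not candle's actual code: `dp4a_compat` and `pack4` are hypothetical names, and I'm assuming the call sites in quantized.cu use the signed `int` form of `__dp4a`. On compute capability 6.1+ it calls the intrinsic; otherwise it falls back to a plain 4-lane int8 dot product, which should be slower but would at least compile for compute capability 5.0.

```cpp
#include <cstdint>

// When this file is compiled as plain C++ (no nvcc), turn the CUDA
// qualifiers into no-ops so the fallback path can be checked on the host.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper (not part of candle): computes
// c + sum over i of a[i] * b[i], where a and b each hold four signed 8-bit lanes.
__host__ __device__ inline int dp4a_compat(int a, int b, int c) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 610
    // Hardware instruction, available from compute capability 6.1.
    return __dp4a(a, b, c);
#else
    // Software fallback for older targets (and for host code).
    const uint32_t ua = static_cast<uint32_t>(a);
    const uint32_t ub = static_cast<uint32_t>(b);
    int sum = c;
    for (int i = 0; i < 4; ++i) {
        // Extract lane i and reinterpret it as a signed 8-bit value.
        const int8_t ai = static_cast<int8_t>((ua >> (8 * i)) & 0xFFu);
        const int8_t bi = static_cast<int8_t>((ub >> (8 * i)) & 0xFFu);
        sum += static_cast<int>(ai) * static_cast<int>(bi);
    }
    return sum;
#endif
}

// Hypothetical host-side helper for trying the wrapper out:
// packs four signed bytes into one 32-bit word, lane 0 in the low byte.
inline int pack4(int8_t x0, int8_t x1, int8_t x2, int8_t x3) {
    const uint32_t packed =
        static_cast<uint32_t>(static_cast<uint8_t>(x0)) |
        (static_cast<uint32_t>(static_cast<uint8_t>(x1)) << 8) |
        (static_cast<uint32_t>(static_cast<uint8_t>(x2)) << 16) |
        (static_cast<uint32_t>(static_cast<uint8_t>(x3)) << 24);
    return static_cast<int>(packed);
}
```

If something along these lines were acceptable, the quantized kernels would call the wrapper instead of `__dp4a` directly. I have not checked whether all 18 errors come from `__dp4a`, so this may not be sufficient on its own.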