gpu4pyscf
[BUG] Execution fails with "failed in block_diag kernel" on Blackwell GPU (sm_120)
1. Summary
Hello gpu4pyscf team,
I am encountering a persistent CUDA kernel failure when running calculations on a new NVIDIA Blackwell series GPU (RTX 5070 Ti, Compute Capability 12.0).
The software correctly identifies and calls the gpu4pyscf modules, but consistently fails with the error "failed in block_diag kernel". It then falls back to a CPU calculation, which completes successfully. The issue occurs even when gpu4pyscf is compiled specifically for the sm_120 architecture, which suggests an incompatibility between the underlying kernel code and the new Blackwell architecture.
2. Environment Details
- GPU: NVIDIA GeForce RTX 5070 Ti
- GPU Architecture: Blackwell (sm_120)
- NVIDIA Driver Version: 581.08
- CUDA Toolkit Version: 12.8 (inside the nvidia/cuda:12.8.0-cudnn-devel-ubuntu22.04 Docker container)
- OS: Ubuntu 22.04 (inside Docker)
- Python Version: 3.11
- pyscf Version: 2.8.0
- gpu4pyscf Version: Built from the latest main branch on GitHub.
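The same environment details can be collected from inside the container, which makes it easier to compare reports. The following is a minimal sketch, assuming CuPy is installed (gpu4pyscf depends on it); the helper name `collect_gpu_env` is hypothetical and not part of gpu4pyscf.

```python
import platform


def collect_gpu_env():
    """Return a dict describing the Python/CUDA environment (hypothetical helper)."""
    info = {"python": platform.python_version(), "cupy": None}
    try:
        import cupy  # imported lazily so the script still runs without CuPy

        info["cupy"] = cupy.__version__
        runtime = cupy.cuda.runtime
        info["cuda_runtime"] = runtime.runtimeGetVersion()  # e.g. 12080 for CUDA 12.8
        info["cuda_driver"] = runtime.driverGetVersion()
        dev = cupy.cuda.Device(0)
        info["compute_capability"] = dev.compute_capability  # string such as "120"
        props = runtime.getDeviceProperties(dev.id)
        info["gpu_name"] = props["name"].decode()
    except Exception as exc:  # no CuPy, no driver, or no visible GPU
        info["error"] = f"{type(exc).__name__}: {exc}"
    return info


for key, value in collect_gpu_env().items():
    print(f"{key}: {value}")
```

On a machine without CuPy or without a visible GPU, the dict contains an `error` entry instead of the device fields.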
3. Steps to Reproduce
- Build gpu4pyscf from source:
  - Cloned the repository: git clone https://github.com/pyscf/gpu4pyscf.git
  - Set the build environment to target only the Blackwell architecture: export CMAKE_CUDA_ARCHITECTURES="120"
  - Built the Python wheel: python3 -m build --wheel
  - Installed the generated .whl file in the Docker container.
- Run a calculation:
  - Execute a standard geometry optimization and frequency calculation, for example, on an ethanol molecule (CCO). The error appears during the initial SCF energy calculation step.
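Since the error appears during the initial SCF step, the reproduction can likely be narrowed to a single energy calculation. The sketch below is a stand-in for the original script, which is not shown in this report. Its assumptions: the geometry is an approximate ethanol structure written for illustration, the B3LYP functional and def2-SVP basis are placeholder choices, and the function name `run_gpu_scf` is hypothetical. It uses the `rks.RKS(mol, xc=...)` interface shown in the gpu4pyscf README.

```python
# Approximate ethanol geometry in Angstrom (illustrative, not taken from the report).
ETHANOL_XYZ = """
C   1.1879  -0.3829   0.0000
C   0.0000   0.5526   0.0000
O  -1.1867  -0.2472   0.0000
H  -1.9237   0.3850   0.0000
H   2.0985   0.2306   0.0000
H   1.1184  -1.0093   0.8869
H   1.1184  -1.0093  -0.8869
H  -0.0227   1.1812   0.8852
H  -0.0227   1.1812  -0.8852
""".strip()


def run_gpu_scf(xc="b3lyp", basis="def2-svp"):
    """Run one RKS energy on the GPU; xc and basis are assumed, not from the report."""
    # Imports are local so that defining this function does not require a GPU.
    import pyscf
    from gpu4pyscf.dft import rks

    mol = pyscf.M(atom=ETHANOL_XYZ, basis=basis, verbose=4)
    mf = rks.RKS(mol, xc=xc)
    # The reported failure occurs during the SCF energy calculation.
    return mf.kernel()
```

Calling `run_gpu_scf()` on the affected machine without a surrounding try/except should surface the full Python traceback, which is more informative than a summarized failure message.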
4. Observed Behavior
The calculation starts and correctly invokes gpu4pyscf, but then immediately fails and falls back to pyscf (CPU). The key log output is as follows:
******** <class 'gpu4pyscf.dft.rks.RKS'> ********
method = RKS
...
XC library gpu4pyscf.dft.libxc version 7.0.0 (CUDA)
...
Method 1 failed: failed in block_diag kernel
Attempting hybrid CPU-GPU approach...
...
Method 2 failed: failed in block_diag kernel
Falling back to CPU calculation...
******** <class 'pyscf.dft.rks.RKS'> ********
...
(The calculation then proceeds to completion using only the CPU)
...
Structure optimization (CPU)...
...
Hessian calculation (CPU)...
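For longer logs, a small scanner can pull out the failure lines, including the "named symbol not found" message asked about in the follow-up comment. This is a minimal sketch using only the Python standard library; the function name `find_failures` and the pattern list are illustrative and would need adjusting to the actual log format.

```python
import re

# Patterns taken from the log excerpt and the follow-up question in this thread.
FAILURE_PATTERNS = [
    re.compile(r"Method \d+ failed: .+"),
    re.compile(r"named symbol not found"),
    re.compile(r"Falling back to CPU"),
]


def find_failures(log_text):
    """Return (line_number, line) pairs for lines matching a known failure pattern."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in FAILURE_PATTERNS):
            hits.append((number, line.strip()))
    return hits


# Sample built from the lines quoted in the report.
sample = "\n".join([
    "Method 1 failed: failed in block_diag kernel",
    "Attempting hybrid CPU-GPU approach...",
    "Method 2 failed: failed in block_diag kernel",
    "Falling back to CPU calculation...",
])

for number, line in find_failures(sample):
    print(number, line)
```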
5. Expected Behavior
The SCF calculation and subsequent geometry optimization should execute successfully on the GPU without any kernel failures or fallbacks to the CPU.
6. Additional Context
I also tried building with CMAKE_CUDA_ARCHITECTURES="90;120", and the result was identical. The failure seems to be specific to the execution on the sm_120 architecture itself, not the build process.
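One way to separate a build problem from a runtime problem is to check which GPU architectures are actually embedded in the compiled libraries. Whether an exported CMAKE_CUDA_ARCHITECTURES environment variable reaches the CMake configure step depends on the project's build scripts (CMake itself initializes that setting from the CUDAARCHS environment variable or a -D option), so this is worth confirming. The sketch below assumes the `cuobjdump` tool from the CUDA Toolkit is on the PATH; the function names are hypothetical.

```python
import re
import shutil
import subprocess


def parse_sm_archs(cuobjdump_output):
    """Extract the sorted set of sm_XX tags from `cuobjdump --list-elf` output."""
    return sorted(set(re.findall(r"sm_\d+[a-z]?", cuobjdump_output)))


def embedded_archs(library_path):
    """Return sm_XX tags embedded in a shared library, or None if cuobjdump is missing."""
    if shutil.which("cuobjdump") is None:
        return None
    result = subprocess.run(
        ["cuobjdump", "--list-elf", str(library_path)],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_sm_archs(result.stdout)


# Example usage (assumes the compiled .so files live under the installed package):
#   import pathlib, gpu4pyscf
#   for lib in pathlib.Path(gpu4pyscf.__file__).parent.rglob("*.so"):
#       print(lib.name, embedded_archs(lib))
```

If `sm_120` is missing from the list for the library that contains the block_diag kernel, the failure would point to the build configuration rather than to the kernel code. An empty list can also mean the library carries only PTX, which `cuobjdump --list-ptx` would show.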
Thank you for developing this great library. I'm happy to provide more logs or run any diagnostic tests if needed.
Are there any other errors in the output, such as "named symbol not found"?
I ran into the same problem while using gpu4pyscf-cuda12x with CUDA 13.0 (RTX 5090D). Is it because CUDA 13.x is not currently supported by gpu4pyscf?