[BUG] Execution fails with "failed in block_diag kernel" on Blackwell GPU (sm_120)

Open turnDeep opened this issue 3 months ago • 2 comments

1. Summary

Hello gpu4pyscf team,

I am encountering a persistent CUDA kernel failure when running calculations on a new NVIDIA Blackwell series GPU (RTX 5070 Ti, Compute Capability 12.0).

The software correctly identifies and calls the gpu4pyscf modules, but consistently fails with the error "failed in block_diag kernel". The run then falls back to a CPU calculation, which completes successfully. The failure occurs even when gpu4pyscf is compiled specifically for the sm_120 architecture, which suggests an incompatibility between the underlying kernel code and the new Blackwell architecture.

2. Environment Details

  • GPU: NVIDIA GeForce RTX 5070 Ti
  • GPU Architecture: Blackwell (sm_120)
  • NVIDIA Driver Version: 581.08
  • CUDA Toolkit Version: 12.8 (inside the nvidia/cuda:12.8.0-cudnn-devel-ubuntu22.04 Docker container)
  • OS: Ubuntu 22.04 (inside Docker)
  • Python Version: 3.11
  • pyscf Version: 2.8.0
  • gpu4pyscf Version: Built from the latest main branch on GitHub.

3. Steps to Reproduce

  1. Build gpu4pyscf from source:

    • Cloned the repository: git clone https://github.com/pyscf/gpu4pyscf.git
    • Set the build environment to target only the Blackwell architecture:
      export CMAKE_CUDA_ARCHITECTURES="120"
      
    • Built the Python wheel: python3 -m build --wheel
    • Installed the generated .whl file in the Docker container.
  2. Run a calculation:

    • Execute a standard geometry optimization and frequency calculation, for example, on an ethanol molecule (CCO). The error appears during the initial SCF energy calculation step.
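
A minimal script along the following lines should exercise the same initial SCF step in isolation. It is a sketch, not the exact script used for this report: the ethanol coordinates are approximate, and the B3LYP functional and def2-SVP basis are assumptions chosen only for illustration.

```python
# Minimal SCF reproduction sketch. Assumptions (not taken from the report):
# the geometry below, the B3LYP functional, and the def2-SVP basis set.

# Approximate ethanol (SMILES: CCO) geometry in Angstrom; illustrative only.
ETHANOL_XYZ = """
C   1.1879  -0.3829   0.0000
C   0.0000   0.5526   0.0000
O  -1.1867  -0.2472   0.0000
H  -1.9237   0.3850   0.0000
H   2.0985   0.2306   0.0000
H   1.1184  -1.0093   0.8869
H   1.1184  -1.0093  -0.8869
H  -0.0227   1.1812   0.8852
H  -0.0227   1.1812  -0.8852
"""


def run_scf(use_gpu=True, xc="b3lyp", basis="def2-svp"):
    """Run a single-point RKS calculation on ethanol and return the energy."""
    # Imported lazily so the module loads even where PySCF is not installed.
    import pyscf

    mol = pyscf.M(atom=ETHANOL_XYZ, basis=basis, verbose=4)
    if use_gpu:
        from gpu4pyscf.dft import rks  # GPU implementation
    else:
        from pyscf.dft import rks      # CPU reference implementation
    mf = rks.RKS(mol, xc=xc)
    return mf.kernel()


# Example (requires PySCF, plus gpu4pyscf and a CUDA GPU for use_gpu=True):
# energy = run_scf(use_gpu=True)
```

Calling `run_scf(use_gpu=False)` on the same machine gives a CPU reference, which helps separate a GPU kernel failure from a problem with the input itself.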

4. Observed Behavior

The calculation starts and correctly invokes gpu4pyscf, but then immediately fails and falls back to pyscf (CPU). The key log output is as follows:

******** <class 'gpu4pyscf.dft.rks.RKS'> ********
method = RKS
...
XC library gpu4pyscf.dft.libxc version 7.0.0 (CUDA)
...
   Method 1 failed: failed in block_diag kernel
   Attempting hybrid CPU-GPU approach...
...
   Method 2 failed: failed in block_diag kernel
   Falling back to CPU calculation...


******** <class 'pyscf.dft.rks.RKS'> ********
...
(The calculation then proceeds to completion using only the CPU)
...
Structure optimization (CPU)...
...
Hessian calculation (CPU)...
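
To check whether a longer log contains other CUDA errors alongside the one above, a small scan such as the following may help. It is only a sketch: the function name and the list of messages are illustrative, and the list is not exhaustive. The first entry is the error from this report; the other two are standard CUDA runtime error messages.

```python
# Substrings to search for (lowercase). The list is an assumption, not a
# complete catalogue of the errors gpu4pyscf or CUDA can emit.
KNOWN_ERRORS = (
    "failed in block_diag kernel",
    "named symbol not found",
    "no kernel image is available for execution on the device",
)


def find_cuda_errors(log_text):
    """Return (line_number, message) pairs for each known error in the log."""
    hits = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        lowered = line.lower()
        for message in KNOWN_ERRORS:
            if message in lowered:
                hits.append((lineno, message))
    return hits


sample = (
    "   Method 1 failed: failed in block_diag kernel\n"
    "   Attempting hybrid CPU-GPU approach...\n"
    "   Method 2 failed: failed in block_diag kernel\n"
)
print(find_cuda_errors(sample))
# -> [(1, 'failed in block_diag kernel'), (3, 'failed in block_diag kernel')]
```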

5. Expected Behavior

The SCF calculation and subsequent geometry optimization should execute successfully on the GPU without any kernel failures or fallbacks to the CPU.

6. Additional Context

I also tried building with CMAKE_CUDA_ARCHITECTURES="90;120", and the result was identical. The failure seems to be specific to the execution on the sm_120 architecture itself, not the build process.
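
One way to rule out a mismatch between the build targets and the device is to compare the GPU's compute capability with the architecture list used at build time. The helpers below are a hedged sketch: both function names are hypothetical, the comparison is an exact match only, and it ignores PTX JIT forward compatibility and special CMake values such as `native`. The device query assumes CuPy is installed.

```python
def device_arch_in_build(device_cc, cmake_archs):
    """Check whether a compute capability (e.g. "12.0" or "120") appears in a
    CMAKE_CUDA_ARCHITECTURES-style list (e.g. "90;120"). Exact match only."""
    wanted = device_cc.replace(".", "")
    built = set()
    for entry in cmake_archs.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        # Drop CMake's optional "-real" / "-virtual" suffixes.
        built.add(entry.split("-")[0])
    return wanted in built


def query_device_cc(device_id=0):
    """Read the device's compute capability via CuPy (needs a CUDA GPU)."""
    import cupy  # imported lazily; only needed for the live query

    return cupy.cuda.Device(device_id).compute_capability


print(device_arch_in_build("12.0", "90;120"))  # -> True
print(device_arch_in_build("12.0", "80;90"))   # -> False
```

On the affected machine, `device_arch_in_build(query_device_cc(), "90;120")` would confirm that the sm_120 target is in the list the wheel was built with.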

Thank you for developing this great library. I'm happy to provide more logs or run any diagnostic tests if needed.

Sep 03 '25 13:09 turnDeep

Are there any other errors in the output, such as "named symbol not found"?

Sep 30 '25 16:09 sunqm

I ran into the same problem using gpu4pyscf-cuda12x with CUDA 13.0 (RTX 5090D). Is it because CUDA 13.x is not currently supported by gpu4pyscf?

Nov 09 '25 05:11 SurpassIllusion