apex
Cannot compile/build cuda_ext on H100
Describe the Bug
When installing on HGX-H100 nodes, pip install does not build the CUDA extensions (amp_C, etc.).
Minimal Steps/Code to Reproduce the Bug
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
I also tried with -e, which did not help.
Expected Behavior
The CUDA extensions compile and build successfully.
Environment
CUDA 12.2, torch 2.2.1
My temporary fix
My temporary fix is to comment out the call to check_cuda_torch_binary_vs_bare_metal in setup.py, which forces the CUDA extensions to build.
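Before disabling the check, it may help to see which two versions disagree. The sketch below is a diagnostic only, and it rests on an assumption not confirmed in this report: that the check compares the CUDA release reported by the local `nvcc -V` against the CUDA version the installed torch binary was built with (`torch.version.cuda`). The helper names `parse_nvcc_release`, `parse_torch_cuda`, and `describe_mismatch` are hypothetical and are not part of apex.

```python
import re
import subprocess


def parse_nvcc_release(nvcc_output):
    """Extract (major, minor) from `nvcc -V` text, or None if not found."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def parse_torch_cuda(version_string):
    """Turn a string such as '12.1' (torch.version.cuda) into (major, minor)."""
    if not version_string:
        # torch.version.cuda is None on CPU-only builds.
        return None
    parts = version_string.split(".")
    return int(parts[0]), int(parts[1])


def describe_mismatch(nvcc_version, torch_version):
    """Classify how the two (major, minor) tuples relate."""
    if nvcc_version is None or torch_version is None:
        return "unknown"
    if nvcc_version == torch_version:
        return "match"
    if nvcc_version[0] == torch_version[0]:
        return "minor mismatch"
    return "major mismatch"


if __name__ == "__main__":
    try:
        import torch
        torch_cuda = parse_torch_cuda(torch.version.cuda)
    except ImportError:
        torch_cuda = None
    try:
        out = subprocess.check_output(["nvcc", "-V"], universal_newlines=True)
        nvcc_cuda = parse_nvcc_release(out)
    except (OSError, subprocess.CalledProcessError):
        nvcc_cuda = None
    print("nvcc:", nvcc_cuda, "torch:", torch_cuda,
          "->", describe_mismatch(nvcc_cuda, torch_cuda))
```

Running this on the affected node prints both versions side by side. If they differ only in the minor number, that would be consistent with the extensions building once the check is commented out, though this report does not say which CUDA version torch 2.2.1 was built against.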