
Cannot compile/build cuda_ext on H100

Open GuanhuaWang opened this issue 11 months ago • 0 comments

Describe the Bug

When installing on HGX-H100 nodes, pip install does not build the CUDA extensions such as amp_C.

Minimal Steps/Code to Reproduce the Bug

pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./

I also tried with -e, which did not help.

Expected Behavior

The CUDA extensions compile and build successfully.

Environment

CUDA 12.2, torch 2.2.1

My temporary fix

My temporary fix is to comment out check_cuda_torch_binary_vs_bare_metal in setup.py, which forces the CUDA extensions to build.
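
Before disabling the check, it may help to see whether the two CUDA versions actually differ. Judging by its name, check_cuda_torch_binary_vs_bare_metal presumably compares the CUDA version the torch binary was built with against the bare-metal toolkit (nvcc); that reading is an assumption, not something confirmed in this report. The sketch below is not apex code: the helper names (parse_nvcc_release, parse_torch_cuda, versions_differ, report_local_versions) are hypothetical, and the only real interfaces it relies on are `nvcc -V` and `torch.version.cuda`.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Return (major, minor) from `nvcc -V` text, or None if no release is found."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def parse_torch_cuda(version):
    """Return (major, minor) from a string like torch.version.cuda, or None."""
    if not version:
        # torch.version.cuda is None on CPU-only builds of torch.
        return None
    match = re.search(r"(\d+)\.(\d+)", version)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def versions_differ(bare_metal, torch_cuda):
    """True only when both versions are known and they are not identical."""
    if bare_metal is None or torch_cuda is None:
        return False
    return bare_metal != torch_cuda


def report_local_versions():
    """Print the toolkit and torch CUDA versions found on this machine."""
    import torch  # imported here so the helpers above work without torch

    bare_metal = None
    if shutil.which("nvcc") is not None:
        output = subprocess.run(
            ["nvcc", "-V"], capture_output=True, text=True, check=False
        ).stdout
        bare_metal = parse_nvcc_release(output)

    torch_cuda = parse_torch_cuda(torch.version.cuda)
    print("bare-metal CUDA (nvcc):", bare_metal)
    print("torch built with CUDA: ", torch_cuda)
    print("versions differ:       ", versions_differ(bare_metal, torch_cuda))
```

Calling report_local_versions() on the affected node prints both versions side by side. If they differ (for example a 12.2 toolkit next to a torch binary built against another 12.x release), that would be consistent with the check blocking the build, though this report does not establish the cause.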

GuanhuaWang avatar Mar 01 '24 23:03 GuanhuaWang