[BUG] Failing to build on MSVC due to call to _udiv128
Summary
https://github.com/pytorch/pytorch/pull/125204 is failing on Windows with:
2024-06-03T10:06:46.7711483Z C:/cb/pytorch_1000000000000/work/aten/src/ATen/../../../third_party/cutlass/include\cutlass/uint128.h(189): error: calling a __host__ function("_udiv128") from a __host__ __device__ function("cutlass::uint128_t::operator / const") is not allowed
2024-06-03T10:06:46.7712785Z
2024-06-03T10:06:46.7713366Z 1 error detected in the compilation of "C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/RowwiseScaledMM.cu".
I feel like there are a few options here to fix this, but I'm not sure which is best. Perhaps just not calling the intrinsic on Windows?
@mhoemmen
@drisspg Thanks for reporting the issue! Does changing the operator% body to the following fix it?
/// Computes the remainder of dividing this 128b value by a 64b divisor
CUTLASS_HOST_DEVICE
uint64_t operator%(uint64_t const& divisor) const
{
  uint64_t remainder{0};
#if defined(CUTLASS_UINT128_NATIVE)
  remainder = uint64_t(native % divisor);
#elif defined(CUTLASS_INT128_ARITHMETIC_DIV) && !defined(__CUDA_ARCH__)
  // Implemented using MSVC's host-only _udiv128 intrinsic; the
  // __CUDA_ARCH__ guard keeps this branch out of device compilation.
  (void)_udiv128(hilo_.hi, hilo_.lo, divisor, &remainder);
#else
  CUTLASS_UNUSED(divisor);
  exception();
#endif
  return remainder;
}
Hey @mhoemmen, sorry for such a late reply; trying a CI/CD run now with this fix:
https://github.com/pytorch/pytorch/pull/129723
@drisspg How did that turn out?
@mhoemmen We have a similar issue blocking the CUTLASS upgrade
@mhoemmen @hwu36 sounds like this is blocking PyTorch. Do we have a fix we can push out to unblock them?
@thakkarV I was hoping to hear back from them to see whether the proposed fix worked. I'll check the code and create a fix if it doesn't already exist.
@Skylion007 @drisspg can you please confirm if the above suggested fix works?
It did for that specific issue, but the linked PR hits a similar error in a different file:
2024-07-31T22:45:43.8464920Z C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\bin\randomtemp.exe C:/actions-runner/_work/pytorch/pytorch/build/win_tmp\bin\sccache.exe C:\PROGRA~1\NVIDIA~2\CUDA\v12.1\bin\nvcc.exe -forward-unknown-to-host-compiler -DAT_PER_OPERATOR_HEADERS -DFLASHATTENTION_DISABLE_ALIBI -DIDEEP_USE_MKL -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DNOMINMAX -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DTORCH_CUDA_BUILD_MAIN_LIB -DUSE_C10D_GLOO -DUSE_CUDA -DUSE_DISTRIBUTED -DUSE_EXTERNAL_MZCRC -DUSE_MEM_EFF_ATTENTION -DUSE_MIMALLOC -DWIN32_LEAN_AND_MEAN -D_CRT_SECURE_NO_DEPRECATE=1 -D_UCRT_LEGACY_INFINITY -Dtorch_cuda_EXPORTS -DTORCH_ASSERT_NO_OPERATORS -IC:\actions-runner\_work\pytorch\pytorch\build\aten\src -IC:\actions-runner\_work\pytorch\pytorch\aten\src -IC:\actions-runner\_work\pytorch\pytorch\build -IC:\actions-runner\_work\pytorch\pytorch -IC:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\benchmark\include -IC:\actions-runner\_work\pytorch\pytorch\third_party\onnx -IC:\actions-runner\_work\pytorch\pytorch\build\third_party\onnx -IC:\actions-runner\_work\pytorch\pytorch\third_party\foxi -IC:\actions-runner\_work\pytorch\pytorch\build\third_party\foxi -IC:\actions-runner\_work\pytorch\pytorch\nlohmann -IC:\actions-runner\_work\pytorch\pytorch\third_party\mimalloc\include -IC:\actions-runner\_work\pytorch\pytorch\aten\src\THC -IC:\actions-runner\_work\pytorch\pytorch\aten\src\ATen\cuda -IC:\actions-runner\_work\pytorch\pytorch\aten\src\ATen\..\..\..\third_party\cutlass\include -IC:\actions-runner\_work\pytorch\pytorch\aten\src\ATen\..\..\..\third_party\cutlass\tools\util\include -IC:\actions-runner\_work\pytorch\pytorch\build\caffe2\aten\src -IC:\actions-runner\_work\pytorch\pytorch\aten\src\ATen\.. -IC:\actions-runner\_work\pytorch\pytorch\c10\cuda\..\.. -IC:\actions-runner\_work\pytorch\pytorch\c10\.. 
-IC:\actions-runner\_work\pytorch\pytorch\torch\csrc\api -IC:\actions-runner\_work\pytorch\pytorch\torch\csrc\api\include -isystem=C:\actions-runner\_work\pytorch\pytorch\build\third_party\gloo -isystem=C:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\gloo -isystem=C:\actions-runner\_work\pytorch\pytorc
2024-07-31T22:45:43.8485761Z C:/actions-runner/_work/pytorch/pytorch/aten/src/ATen/../../../third_party/cutlass/include\cutlass/functional.h(572): error: calling a __host__ function from a __host__ __device__ function is not allowed
2024-07-31T22:45:43.8486718Z if (
2024-07-31T22:45:43.8486900Z ^
2024-07-31T22:45:43.8487012Z
2024-07-31T22:45:43.8487770Z C:/actions-runner/_work/pytorch/pytorch/aten/src/ATen/../../../third_party/cutlass/include\cutlass/functional.h(572): error: calling a __host__ function from a __host__ __device__ function is not allowed
2024-07-31T22:45:43.8488802Z if (
2024-07-31T22:45:43.8489056Z ^
2024-07-31T22:45:43.8489173Z
2024-07-31T22:45:43.8489933Z 2 errors detected in the compilation of "C:/actions-runner/_work/pytorch/pytorch/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_bf16_aligned_k64_dropout.cu".
2024-07-31T22:45:43.8490903Z cutlassB_bf16_aligned_k64_dropout.cu
2024-07-31T22:45:44.6942536Z [7842/8447] Building CUDA object caffe2\CMakeFiles\torch_cuda.dir\__\aten\src\ATen\native\transformers\cuda\mem_eff_attention\kernels\cutlassB_bf16_aligned_k128.cu.obj
2024-07-31T22:45:44.6943841Z FAILED: caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_bf16_aligned_k128.cu.obj
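The underlying problem in both failures is the same: a `__host__ __device__` function must not reach a host-only intrinsic during the device compilation pass. A minimal sketch of the fencing pattern follows, assuming a hypothetical helper `mod_u128_portable` (the MSVC branch mirrors the suggested operator% fix earlier in this thread; the bitwise fallback is illustrative, not CUTLASS's actual device path):

```cpp
#include <cassert>
#include <cstdint>

#if defined(__CUDACC__)
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical sketch: MSVC's _udiv128 is host-only, so it must be
// compiled out whenever __CUDA_ARCH__ is defined (i.e., during the
// device-code compilation pass).
HOST_DEVICE inline uint64_t mod_u128_portable(uint64_t hi, uint64_t lo,
                                              uint64_t divisor) {
#if defined(_MSC_VER) && !defined(__CUDA_ARCH__)
  // Host pass on MSVC: the intrinsic is legal here because this branch
  // never reaches the device compiler.
  uint64_t remainder = 0;
  (void)_udiv128(hi, lo, divisor, &remainder);
  return remainder;
#else
  // Portable fallback, valid on host and device: bitwise long division,
  // folding one bit of the 128b value into the remainder at a time.
  uint64_t rem = 0;
  for (int i = 127; i >= 0; --i) {
    uint64_t bit = (i >= 64) ? ((hi >> (i - 64)) & 1u) : ((lo >> i) & 1u);
    uint64_t carry = rem >> 63;  // bit shifted out of rem * 2
    rem = (rem << 1) | bit;      // rem = rem * 2 + bit (mod 2^64)
    if (carry || rem >= divisor) {
      rem -= divisor;            // wrapping subtract keeps the math exact
    }
  }
  return rem;
#endif
}
```

The key design point is that the guard tests `__CUDA_ARCH__`, not `__CUDACC__`: nvcc compiles a `__host__ __device__` function twice, and only the pass with `__CUDA_ARCH__` defined targets the device, so that is the pass the intrinsic must be hidden from.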
@Skylion007 It looks like PR https://github.com/NVIDIA/cutlass/pull/1679 might fix your newly reported issue.
FYI, PR #1679 has been merged into the main branch.