
[BUG] Failing to build on MSVC due to call to _udiv128

Open drisspg opened this issue 1 year ago • 3 comments

Summary

https://github.com/pytorch/pytorch/pull/125204 is failing on Windows with:

2024-06-03T10:06:46.7711483Z C:/cb/pytorch_1000000000000/work/aten/src/ATen/../../../third_party/cutlass/include\cutlass/uint128.h(189): error: calling a __host__ function("_udiv128") from a __host__ __device__ function("cutlass::uint128_t::operator / const") is not allowed
2024-06-03T10:06:46.7712785Z 
2024-06-03T10:06:46.7713366Z 1 error detected in the compilation of "C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/RowwiseScaledMM.cu".
2024-06-03T10:06:46.7715339Z C:/cb/pytorch_1000000000000/work/aten/src/ATen/../../../third_party/cutlass/include\cutlass/uint128.h(189): error: calling a __host__ function("_udiv128") from a __host__ __device__ function("cutlass::uint128_t::operator / const") is not allowed

I feel like there are a few options to fix this, but I'm not sure which is best. Perhaps just not calling the intrinsic on Windows?

drisspg avatar Jun 03 '24 16:06 drisspg

@mhoemmen

hwu36 avatar Jun 07 '24 14:06 hwu36

@drisspg Thanks for reporting the issue! Does changing the operator% body to the following fix it?

  /// Compute the remainder of dividing the 128b value by a 64b divisor
  CUTLASS_HOST_DEVICE
  uint64_t operator%(uint64_t const& divisor) const
  {
    uint64_t remainder{0};
#if defined(CUTLASS_UINT128_NATIVE)
    remainder = uint64_t(native % divisor);
#elif defined(CUTLASS_INT128_ARITHMETIC_DIV) && ! defined (__CUDA_ARCH__)
    // implemented using MSVC's arithmetic intrinsics
    (void)_udiv128(hilo_.hi, hilo_.lo, divisor, &remainder);
#else
    CUTLASS_UNUSED(divisor);
    exception();
#endif
    return remainder;
  }

mhoemmen avatar Jun 07 '24 18:06 mhoemmen

hey @mhoemmen sorry for such a late reply, trying a ci/cd run now with this fix:

https://github.com/pytorch/pytorch/pull/129723

drisspg avatar Jun 28 '24 00:06 drisspg

@drisspg How did that turn out?

mhoemmen avatar Jul 18 '24 19:07 mhoemmen

@mhoemmen We have a similar issue blocking the CUTLASS upgrade

Skylion007 avatar Aug 01 '24 14:08 Skylion007

@mhoemmen @hwu36 sounds like this is blocking PyTorch. Do we have a fix we can push out to unblock them?

thakkarV avatar Aug 01 '24 15:08 thakkarV

@thakkarV I was hoping to hear back from them to see whether the proposed fix worked. I'll check the code and create a fix if it doesn't already exist.

mhoemmen avatar Aug 02 '24 16:08 mhoemmen

@Skylion007 @drisspg can you please confirm if the above suggested fix works?

thakkarV avatar Aug 05 '24 14:08 thakkarV

It did for that specific issue, but we're hitting a similar error elsewhere in the linked PR:

Skylion007 avatar Aug 05 '24 17:08 Skylion007

2024-07-31T22:45:43.8464920Z C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\bin\randomtemp.exe C:/actions-runner/_work/pytorch/pytorch/build/win_tmp\bin\sccache.exe C:\PROGRA~1\NVIDIA~2\CUDA\v12.1\bin\nvcc.exe -forward-unknown-to-host-compiler -DAT_PER_OPERATOR_HEADERS -DFLASHATTENTION_DISABLE_ALIBI -DIDEEP_USE_MKL -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DNOMINMAX -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DTORCH_CUDA_BUILD_MAIN_LIB -DUSE_C10D_GLOO -DUSE_CUDA -DUSE_DISTRIBUTED -DUSE_EXTERNAL_MZCRC -DUSE_MEM_EFF_ATTENTION -DUSE_MIMALLOC -DWIN32_LEAN_AND_MEAN -D_CRT_SECURE_NO_DEPRECATE=1 -D_UCRT_LEGACY_INFINITY -Dtorch_cuda_EXPORTS -DTORCH_ASSERT_NO_OPERATORS -IC:\actions-runner\_work\pytorch\pytorch\build\aten\src -IC:\actions-runner\_work\pytorch\pytorch\aten\src -IC:\actions-runner\_work\pytorch\pytorch\build -IC:\actions-runner\_work\pytorch\pytorch -IC:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\benchmark\include -IC:\actions-runner\_work\pytorch\pytorch\third_party\onnx -IC:\actions-runner\_work\pytorch\pytorch\build\third_party\onnx -IC:\actions-runner\_work\pytorch\pytorch\third_party\foxi -IC:\actions-runner\_work\pytorch\pytorch\build\third_party\foxi -IC:\actions-runner\_work\pytorch\pytorch\nlohmann -IC:\actions-runner\_work\pytorch\pytorch\third_party\mimalloc\include -IC:\actions-runner\_work\pytorch\pytorch\aten\src\THC -IC:\actions-runner\_work\pytorch\pytorch\aten\src\ATen\cuda -IC:\actions-runner\_work\pytorch\pytorch\aten\src\ATen\..\..\..\third_party\cutlass\include -IC:\actions-runner\_work\pytorch\pytorch\aten\src\ATen\..\..\..\third_party\cutlass\tools\util\include -IC:\actions-runner\_work\pytorch\pytorch\build\caffe2\aten\src -IC:\actions-runner\_work\pytorch\pytorch\aten\src\ATen\.. -IC:\actions-runner\_work\pytorch\pytorch\c10\cuda\..\.. -IC:\actions-runner\_work\pytorch\pytorch\c10\.. 
-IC:\actions-runner\_work\pytorch\pytorch\torch\csrc\api -IC:\actions-runner\_work\pytorch\pytorch\torch\csrc\api\include -isystem=C:\actions-runner\_work\pytorch\pytorch\build\third_party\gloo -isystem=C:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\gloo -isystem=C:\actions-runner\_work\pytorch\pytorc
2024-07-31T22:45:43.8485761Z C:/actions-runner/_work/pytorch/pytorch/aten/src/ATen/../../../third_party/cutlass/include\cutlass/functional.h(572): error: calling a __host__ function from a __host__ __device__ function is not allowed
2024-07-31T22:45:43.8486718Z       if (
2024-07-31T22:45:43.8486900Z       ^
2024-07-31T22:45:43.8487012Z 
2024-07-31T22:45:43.8487770Z C:/actions-runner/_work/pytorch/pytorch/aten/src/ATen/../../../third_party/cutlass/include\cutlass/functional.h(572): error: calling a __host__ function from a __host__ __device__ function is not allowed
2024-07-31T22:45:43.8488802Z       if (
2024-07-31T22:45:43.8489056Z       ^
2024-07-31T22:45:43.8489173Z 
2024-07-31T22:45:43.8489933Z 2 errors detected in the compilation of "C:/actions-runner/_work/pytorch/pytorch/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_bf16_aligned_k64_dropout.cu".
2024-07-31T22:45:43.8490903Z cutlassB_bf16_aligned_k64_dropout.cu
2024-07-31T22:45:43.8491195Z Retry attempt: 1 (same 2 errors repeated on retry attempts 1-3; duplicate log output trimmed)
2024-07-31T22:45:44.6942536Z [7842/8447] Building CUDA object caffe2\CMakeFiles\torch_cuda.dir\__\aten\src\ATen\native\transformers\cuda\mem_eff_attention\kernels\cutlassB_bf16_aligned_k128.cu.obj
2024-07-31T22:45:44.6943841Z FAILED: caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_bf16_aligned_k128.cu.obj 

Skylion007 avatar Aug 05 '24 17:08 Skylion007

@Skylion007 It looks like PR https://github.com/NVIDIA/cutlass/pull/1679 might fix your newly reported issue.

mhoemmen avatar Aug 05 '24 18:08 mhoemmen

FYI, PR #1679 has been merged into the main branch.

mhoemmen avatar Aug 05 '24 21:08 mhoemmen