[BUG] Release 3.5.0 build failing on Windows using CUDA 12.6, and VS2022 17.11

Open levicki opened this issue 1 year ago • 22 comments

Describe the bug

I initially reported this issue to xformers because the xformers build was failing for me, without realizing the error was in the CUTLASS submodule. After some back and forth and more testing on my end, I realized the issue seems to be with CUTLASS 3.5.0.

Steps/Code to reproduce bug

  1. Install Visual Studio 2022 17.11.0 with C++ Desktop Development workload
  2. Install CUDA toolkit 12.6
  3. git clone https://github.com/NVIDIA/cutlass
  4. cd cutlass
  5. git checkout v3.5.0
  6. cmake-gui
  7. Select VS 2022
  8. Select x64
  9. Leave native compiler
  10. Click Configure
  11. Click Generate
  12. Click Open project
  13. Select Release
  14. Click Build

Expected behavior

The build should succeed. Instead it fails with the errors below (please ignore the C:/BUILD/xformers prefix; the same compilation errors occur when building from within Visual Studio):

C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(136): warning C4346: 'SharedStorage': dependent name is not a type
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(136): note: prefix the qualified-id with 'typename' to indicate a type
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(136): note: the template instantiation context (the oldest one first) is
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(60): note: while compiling class template partial specialization 'cutlass::gemm::kernel::GemmUniversal<ProblemShape_,CollectiveMainloop_,CollectiveEpilogue_,TileScheduler_,enable_if<std::is_base_of_v<cutlass::gemm::KernelTmaWarpSpecializedPingpong,CollectiveMainloop_::DispatchPolicy::Schedule>,void>::type>'
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(124): note: while compiling class 'cutlass::gemm::kernel::GemmUniversal<ProblemShape_,CollectiveMainloop_,CollectiveEpilogue_,TileScheduler_,enable_if<std::is_base_of_v<cutlass::gemm::KernelTmaWarpSpecializedPingpong,CollectiveMainloop_::DispatchPolicy::Schedule>,void>::type>::SharedStorage'
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(133): note: while compiling class 'cutlass::gemm::kernel::GemmUniversal<ProblemShape_,CollectiveMainloop_,CollectiveEpilogue_,TileScheduler_,enable_if<std::is_base_of_v<cutlass::gemm::KernelTmaWarpSpecializedPingpong,CollectiveMainloop_::DispatchPolicy::Schedule>,void>::type>::SharedStorage::PipelineStorage'
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(136): error C2061: syntax error: identifier 'SharedStorage'
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(140): error C3646: 'math_wg_order': unknown override specifier
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(140): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
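
For readers unfamiliar with warning C4346: inside a template, a nested name that depends on a template parameter is only treated as a type when it is prefixed with `typename`. The following is a minimal sketch of that rule using hypothetical names (`DemoBarrier`, `DemoKernel`). It is not the CUTLASS source, and it does not claim to reproduce the failure, which later comments in this thread attribute to the CUDA toolchain.

```cpp
#include <type_traits>

// Hypothetical stand-in for a class that exposes a nested storage type.
struct DemoBarrier {
  struct SharedStorage {
    int slots[4];
  };
};

// Hypothetical kernel-like template; all names here are illustrative only.
template <class Barrier>
struct DemoKernel {
  // Barrier::SharedStorage depends on the template parameter Barrier, so
  // `typename` is what tells the compiler that it names a type.
  using BarrierStorage = typename Barrier::SharedStorage;

  struct SharedStorage {
    alignas(16) BarrierStorage math_wg_order;
  };
};

// Instantiating the template resolves the dependent name to the nested type.
static_assert(
    std::is_same<DemoKernel<DemoBarrier>::BarrierStorage,
                 DemoBarrier::SharedStorage>::value,
    "dependent name should resolve to DemoBarrier::SharedStorage");
```

If a snippet of this shape compiles with cl.exe alone but the equivalent code fails when driven through nvcc, that would point toward the CUDA front end rather than the host compiler.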

Note that there might be other build errors as well, this was just the first place where building a project failed. It seems as if there might be some compiler issue with latest Visual Studio update?

Environment details (please complete the following information):

  • Environment location: Bare-metal

Additional context

cl.exe Version 19.41.34120 for x64
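
When CUTLASS is pulled in by another project's build, it can be hard to tell which compiler pair was actually used. As a rough sketch, the predefined macros `_MSC_FULL_VER` (MSVC) and `__CUDACC_VER_MAJOR__` / `__CUDACC_VER_MINOR__` (nvcc) can be captured at compile time. The constant and function names below are made up for illustration.

```cpp
// Compiler version macros captured as constants; 0 means "not this compiler".
#if defined(_MSC_FULL_VER)
// For cl.exe 19.41.34120 this macro expands to 194134120.
constexpr long kMsvcFullVersion = _MSC_FULL_VER;
#else
constexpr long kMsvcFullVersion = 0;
#endif

#if defined(__CUDACC_VER_MAJOR__)
// Defined only when the translation unit is compiled by nvcc.
constexpr int kNvccMajor = __CUDACC_VER_MAJOR__;
constexpr int kNvccMinor = __CUDACC_VER_MINOR__;
#else
constexpr int kNvccMajor = 0;
constexpr int kNvccMinor = 0;
#endif

// Packs the nvcc version into one number, e.g. 12.6 becomes 1206.
constexpr int nvcc_version_code() { return kNvccMajor * 100 + kNvccMinor; }
```

Printing these values from a small file compiled by the same build is one way to confirm the toolchain before filing a report.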

levicki avatar Aug 21 '24 12:08 levicki

Your setup.py is using -std=c++17 for CXX options — MSVC syntax is -std:c++17 or /std:c++17, using GNU syntax leads to a warning about unrecognized compiler option (and probably compilation without C++17 support). Also, -O3 doesn't exist for MSVC.
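
One way to check whether the standard flag actually took effect is to inspect the language-level macro at compile time. This is a minimal sketch, and the names `kCppStandard` and `has_cpp17` are made up for illustration. It relies on MSVC defining `_MSVC_LANG`, because MSVC keeps `__cplusplus` at 199711L unless `/Zc:__cplusplus` is passed.

```cpp
// Effective C++ language level of the current translation unit.
#if defined(_MSVC_LANG)
// MSVC: _MSVC_LANG tracks /std:, while __cplusplus may stay at 199711L.
constexpr long kCppStandard = _MSVC_LANG;
#else
// Other compilers report the language level through __cplusplus.
constexpr long kCppStandard = __cplusplus;
#endif

// True when the translation unit is compiled as C++17 or newer.
constexpr bool has_cpp17() { return kCppStandard >= 201703L; }
```

Adding `static_assert(has_cpp17(), "C++17 is not enabled");` to a source file in the affected build would turn a silently ignored `-std=c++17` into a hard error.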

levicki avatar Aug 21 '24 16:08 levicki

The culprit is CUDA 12.6 — I can build with CUDA 12.4.1 just fine.

levicki avatar Aug 21 '24 17:08 levicki

NVIDIA bug ID #4820029.

levicki avatar Aug 22 '24 16:08 levicki

tracking. Does 3.5.1 also fail with the same issue?

thakkarV avatar Aug 22 '24 16:08 thakkarV

tracking. Does 3.5.1 also fail with the same issue?

@thakkarV I don't see a tag for 3.5.1 and it's not in releases yet?

levicki avatar Aug 23 '24 01:08 levicki

Main is 3.5.1. We will tag soon

thakkarV avatar Aug 23 '24 01:08 thakkarV

Main is 3.5.1. We will tag soon

@thakkarV Hopefully not before this issue is root-caused and at least worked around?

levicki avatar Aug 23 '24 10:08 levicki

It appears to be a CUDA toolkit issue. If you could try out with main that would be great cause there were some MSVC fixes in 3.5.1 too

thakkarV avatar Aug 23 '24 10:08 thakkarV

If you could try out with main that would be great cause there were some MSVC fixes in 3.5.1 too

If you mean with CUDA 12.6, can you give repro steps for some minimal build that triggers it so I don't have to run the full build?

Even better if you can isolate just relevant code part which causes compiler errors so I can try to build just that from the developer command prompt.

EDIT: If I remember correctly I tried with main as well, didn't make any difference.

levicki avatar Aug 23 '24 10:08 levicki

Any update on it? It's blocking to build onnxruntime with CUDA 12.6 (microsoft/onnxruntime#21676)

egortech avatar Sep 21 '24 23:09 egortech

Any update on it? It's blocking to build onnxruntime with CUDA 12.6 (microsoft/onnxruntime#21676)

I asked on the ticket, no response yet from engineering team.

levicki avatar Sep 29 '24 10:09 levicki

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] avatar Oct 29 '24 11:10 github-actions[bot]

Closed because NVIDIA is apparently too lazy to fix it, what with resting on their laurels, it's a full time job.

levicki avatar Oct 29 '24 14:10 levicki

Just to update everyone: this seems to be a CUDA compiler issue, and no, it's not fixed in CUDA 12.8.

levicki avatar Jan 27 '25 10:01 levicki

Hi, Just wanted to follow-up on this. This is blocking the build of some components of xFormers on Windows - is there a way to do a workaround in the cutlass code maybe, until this is fixed on the compiler side? cc @hwu36 maybe?

danthe3rd avatar Jan 31 '25 17:01 danthe3rd

@d-k-b

hwu36 avatar Jan 31 '25 17:01 hwu36

Hi, Just wanted to follow-up on this. This is blocking the build of some components of xFormers on Windows - is there a way to do a workaround in the cutlass code maybe, until this is fixed on the compiler side? cc @hwu36 maybe?

Not just xformers, onnxruntime was blocked too. Not sure if they found a workaround.

levicki avatar Jan 31 '25 21:01 levicki

We have fixed a number of Windows / MSVC issues in the past couple of months. CUDA 12.8 + CUTLASS 3.7+ should now compile properly. The issue originally seen in this bug was fixed in v3.7. However, given the "flakiness" of Windows environments, I was not able to verify the exact reproduction flow shown above; cmake-gui has never worked properly in my VM environment. So I'm posting my simplified flow here for anyone to try out.

Start with Win 10/11 (with latest updates) or Win Server 2019+
Install Git, Python3, and CMake.
Install Visual Studio 2019+, and all updates.
Install CUDA 12.8
Open Visual Studio Development PowerShell Window
> Clone CUTLASS, checkout v3.7.0 or higher revision
> cmake . -Bbuild -DCMAKE_CUDA_COMPILER="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8\bin\nvcc.exe" -G "Visual Studio 16 2019" -A x64 -T host=x64 -DCMAKE_SUPPRESS_REGENERATION=ON
> cmake --build build --config Release -j $((Get-CimInstance -ClassName Win32_Processor).NumberOfCores) -- /verbosity:minimal

Note, at the time of posting, CUDA 12.8 installer hangs while trying to install to the latest VS 2022 extensions. If that happens to you, you need to roll back to an older version of Visual Studio 2022 temporarily until that is sorted out.

Update: It appears the installation issue is with VS 2022 17.13 Previews 3+, so rolling back to the last stable VS install should work.

d-k-b avatar Feb 20 '25 20:02 d-k-b

posting my simplified flow here for anyone to try out

Not to come across as argumentative, but:

  1. What can we do when we don't control the build process such as when compiling xformers or onnxruntime where CUTLASS is a dependency and is managed by their build process?
  2. What exactly is the "fix" here? 3.7.0? Something newer? Defines you're setting? I tried 3.7.0 like 3 weeks ago and it didn't compile for me using CUDA 12.8 and VS 2022

Finally, you aren't using my issue repro steps -- if they are wrong, broken, or unsupported you should outright say so and explain why the compile is failing at SharedStorage bit.

levicki avatar Feb 20 '25 22:02 levicki

Not to come across as argumentative, but:

Not at all, we are all just trying to get this to work 😄

What can we do when we don't control the build process such as when compiling xformers or onnxruntime where CUTLASS is a dependency and is managed by their build process?

There are a number of moving targets here. There was an actual code issue described in this bug, which was fixed in 3.5.1 iirc, but along the way there were some breakages due to code that was incompatible with various versions of MSVC. Which leads into your next question ...

What exactly is the "fix" here? 3.7.0? Something newer? Defines you're setting? I tried 3.7.0 like 3 weeks ago and it didn't compile for me using CUDA 12.8 and VS 2022

If the build processes are compiling CUDA code successfully and applying the correct CUTLASS flags, then v3.7 should have the necessary fixes for the bug mentioned in the description and for the MSVC syntax fixes to be able to build CUTLASS successfully.

However, I was not able to reproduce a successful build environment using the exact steps shown in the description. The cmake-gui flow did not produce a Visual Studio solution file that was configured correctly. If I get a moment soon I will try it again to remember exactly what the issue was. If that flow works for you typically, then stick with it, it is just very temperamental for me, alas.

With the steps I showed in my previous reply, I was able to successfully build v3.7.0 and main.

One note I'll mention, please review the building with msvc guide and ensure things like LongPath support is enabled. If any build errors occur that mention things like "file not found", it is likely a path length issue and you need to shorten the build folder path. I typically use c:\repos\cutlass for source and c:\repos\cutlass-build for the build. If none of that works, I've also had luck in the path using the 8.3 short path directory names for the build folder.
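
As a rough illustration of the path-length concern, the sketch below compares a path against the traditional Windows `MAX_PATH` limit of 260 characters (which includes the terminating null). The helper names are hypothetical, and the check only covers the build root. Generated object and intermediate file paths are considerably longer, which is why a short root helps.

```cpp
#include <cstddef>

// Traditional Windows MAX_PATH limit, counting the terminating null.
constexpr std::size_t kLegacyMaxPath = 260;

// Length of a null-terminated path, computed at compile time.
constexpr std::size_t path_length(const char* p) {
  return *p == '\0' ? 0 : 1 + path_length(p + 1);
}

// True when the path (plus its terminating null) fits in MAX_PATH.
constexpr bool fits_legacy_max_path(const char* p) {
  return path_length(p) < kLegacyMaxPath;
}

// Characters left for everything the build appends below the root.
constexpr std::size_t remaining_budget(const char* root) {
  return fits_legacy_max_path(root)
             ? kLegacyMaxPath - 1 - path_length(root)
             : 0;
}
```

For example, `remaining_budget("c:\\repos\\cutlass-build")` shows how much room is left for the nested directories and file names that the build generates.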

d-k-b avatar Feb 24 '25 20:02 d-k-b

did not produce a Visual Studio solution file that was configured correctly

What was incorrect about it?

If that flow works for you typically, then stick with it, it is just very temperamental for me, alas.

Let me be clear, I don't typically need to build CUTLASS on its own -- I am using it as part of xformers and onnxruntime, which I sometimes need to build. Needless to say, because of the unspecified issue with the code and later with the MSVC (or CUDA?) compiler, I am unable to build either of those two.

Also, I don't know if that flow works or not — I have assumed that like any other CMake project I can configure it using CMake-gui and that provided solution would work. When it didn't I reported it here.

As for the build pre-requisites:

Windows 10 or 11

Windows 11 Version 10.0.22631.4890 (that'd be 23H2 fully patched up)

Visual Studio 2019 version 16.11.27, or Visual Studio 2022

Visual Studio 2022 17.13.1 at the moment, 17.11.0 when this issue was submitted.

CUDA Toolkit (at least 12.2; earlier 12.x versions may work)

CUDA 12.4 and 12.8 installed side by side

12.4 build working fine, 12.6 and 12.8 not working as reported in the issue

CMake (at least 3.18)

3.31.5 at the moment (I always have the latest)

git

2.47.0.2

Python (at least 3.6)

3.11.9

Long paths

Enabled ever since I installed Windows. [Screenshot]

Create the build subdirectory in the CUTLASS clone directory

I did not do this step -- I was building out of tree assuming that it's supported.

Finally, I was still seeing the same error with SharedStorage on 3.7.0 with my build environment recently. I can try again but the build takes a lot of time before it fails on that even on my 12 core Xeon with 64 GB RAM and M.2 storage.

Hopefully that clarifies it a bit, let me know if there's anything else I can do.

levicki avatar Feb 24 '25 23:02 levicki

@levicki, all of those packages seem fine and are pretty much what I have installed.

Create the build subdirectory in the CUTLASS clone directory

Hmm... I think the Setting up the Build Environment portion might have been an early attempt at some documentation for Linux on Windows support. The build subdirectory does not need to be inside the cutlass repository. But I would ensure the build directory has a pretty short initial path like c:\builds\cutlass or something like that. It is likely that the default home directory location is going to have path length issues without using 8.3 filenames or some other shortening mechanism like a symbolic link to a shorter path.

d-k-b avatar Feb 25 '25 19:02 d-k-b

@d-k-b I have just now managed to build 3.8.0 with CUDA 12.8 and Visual Studio 2022 17.13.2. It seems that whatever the problem was has been fixed. I have also tried building xformers, which updated its third-party CUTLASS dependency to v3.8.0, and it now builds without errors.

levicki avatar Feb 28 '25 13:02 levicki