cutlass
[BUG] Release 3.5.0 build failing on Windows using CUDA 12.6, and VS2022 17.11
Describe the bug
I initially reported this issue to xformers because the xformers build was failing for me, without realizing the error was in the CUTLASS submodule. After some back and forth and more testing on my end, I realized the issue seems to be with CUTLASS 3.5.0.
Steps/Code to reproduce bug
- Install Visual Studio 2022 17.11.0 with C++ Desktop Development workload
- Install CUDA toolkit 12.6
git clone https://github.com/NVIDIA/cutlass
cd cutlass
git checkout v3.5.0
cmake-gui
- Select VS 2022
- Select x64
- Leave native compiler
- Click Configure
- Click Generate
- Click Open project
- Select Release
- Click Build
Expected behavior
The build should succeed. Instead, it fails with the errors below (please ignore the C:/BUILD/xformers prefix; the same compilation errors occur when building from within Visual Studio):
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(136): warning C4346: 'SharedStorage': dependent name is not a type
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(136): note: prefix the qualified-id with 'typename' to indicate a type
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(136): note: the template instantiation context (the oldest one first) is
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(60): note: while compiling class template partial specialization 'cutlass::gemm::kernel::GemmUniversal<ProblemShape_,CollectiveMainloop_,CollectiveEpilogue_,TileScheduler_,enable_if<std::is_base_of_v<cutlass::gemm::KernelTmaWarpSpecializedPingpong,CollectiveMainloop_::DispatchPolicy::Schedule>,void>::type>'
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(124): note: while compiling class 'cutlass::gemm::kernel::GemmUniversal<ProblemShape_,CollectiveMainloop_,CollectiveEpilogue_,TileScheduler_,enable_if<std::is_base_of_v<cutlass::gemm::KernelTmaWarpSpecializedPingpong,CollectiveMainloop_::DispatchPolicy::Schedule>,void>::type>::SharedStorage'
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(133): note: while compiling class 'cutlass::gemm::kernel::GemmUniversal<ProblemShape_,CollectiveMainloop_,CollectiveEpilogue_,TileScheduler_,enable_if<std::is_base_of_v<cutlass::gemm::KernelTmaWarpSpecializedPingpong,CollectiveMainloop_::DispatchPolicy::Schedule>,void>::type>::SharedStorage::PipelineStorage'
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(136): error C2061: syntax error: identifier 'SharedStorage'
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(140): error C3646: 'math_wg_order': unknown override specifier
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(140): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
Note that there might be other build errors as well, this was just the first place where building a project failed. It seems as if there might be some compiler issue with latest Visual Studio update?
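For readers unfamiliar with C4346, here is a minimal sketch of what "dependent name is not a type" means in general. It does not reproduce the CUTLASS failure, which later comments attribute to the CUDA toolkit. `DemoBarrier` and `DemoPipelineStorage` are hypothetical names made up for illustration; only the member name `math_wg_order` is borrowed from the log above.

```cpp
#include <type_traits>

// Hypothetical stand-in for a barrier type; NOT the real CUTLASS class.
struct DemoBarrier {
  struct SharedStorage {
    int arrival_count = 0;
  };
};

template <class Barrier>
struct DemoPipelineStorage {
  // Barrier::SharedStorage depends on the template parameter, so in C++17
  // the compiler must be told that it names a type. Omitting `typename`
  // here is the kind of code MSVC diagnoses with C4346 and C2061.
  using BarrierStorage = typename Barrier::SharedStorage;

  // If the alias above fails to parse, a member declared with it typically
  // produces follow-on errors such as "unknown override specifier".
  BarrierStorage math_wg_order;
};

// Instantiating the template confirms the dependent name resolves to a type.
static_assert(std::is_same_v<
                  decltype(DemoPipelineStorage<DemoBarrier>::math_wg_order),
                  DemoBarrier::SharedStorage>,
              "dependent name should resolve to DemoBarrier::SharedStorage");
```

If a source file already spells out `typename` and the compiler still reports C4346, the problem is more likely in the toolchain than in the code.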
Environment details (please complete the following information):
- Environment location: Bare-metal
Additional context
cl.exe Version 19.41.34120 for x64
Your setup.py is using -std=c++17 for CXX options — MSVC syntax is -std:c++17 or /std:c++17, using GNU syntax leads to a warning about unrecognized compiler option (and probably compilation without C++17 support). Also, -O3 doesn't exist for MSVC.
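One way to check whether C++17 actually took effect is a compile-time guard like the sketch below. It relies on the fact that MSVC defines `_MSVC_LANG` to track the `/std` setting, while `__cplusplus` only does so when `/Zc:__cplusplus` is passed. The `DEMO_CPLUSPLUS` macro and the `demo_` names are made up for this example and are not part of xformers or CUTLASS.

```cpp
// Pick the macro that reflects the language level the compiler is using.
// _MSVC_LANG is MSVC-specific; other compilers report it via __cplusplus.
#if defined(_MSVC_LANG)
#define DEMO_CPLUSPLUS _MSVC_LANG
#else
#define DEMO_CPLUSPLUS __cplusplus
#endif

constexpr long demo_language_version = DEMO_CPLUSPLUS;
constexpr bool demo_has_cpp17 = demo_language_version >= 201703L;

// Fails the build early if the C++17 flag was ignored or misspelled.
static_assert(demo_has_cpp17,
              "C++17 is not enabled: use /std:c++17 with MSVC or "
              "-std=c++17 with GCC/Clang");
```

Dropping a guard like this into one translation unit turns a silently ignored flag into an immediate, readable error.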
The culprit is CUDA 12.6 — I can build with CUDA 12.4.1 just fine.
NVIDIA bug ID #4820029.
tracking. Does 3.5.1 also fail with the same issue?
@thakkarV I don't see a tag for 3.5.1 and it's not in releases yet?
Main is 3.5.1. We will tag soon
@thakkarV Hopefully not before this issue is root-caused and at least worked around?
It appears to be a CUDA toolkit issue. If you could try out with main that would be great cause there were some MSVC fixes in 3.5.1 too
> If you could try out with main that would be great cause there were some MSVC fixes in 3.5.1 too
If you mean with CUDA 12.6, can you give repro steps for some minimal build that triggers it so I don't have to run the full build?
Even better if you can isolate just relevant code part which causes compiler errors so I can try to build just that from the developer command prompt.
EDIT: If I remember correctly I tried with main as well, didn't make any difference.
Any update on it? It's blocking to build onnxruntime with CUDA 12.6 (microsoft/onnxruntime#21676)
I asked on the ticket, no response yet from engineering team.
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
Closed because NVIDIA is apparently too lazy to fix it, what with resting on their laurels, it's a full time job.
Just to update everyone that this seems to be CUDA compiler issue, and no — it's not fixed in CUDA 12.8.
Hi, Just wanted to follow-up on this. This is blocking the build of some components of xFormers on Windows - is there a way to do a workaround in the cutlass code maybe, until this is fixed on the compiler side? cc @hwu36 maybe?
@d-k-b
Not just xformers, onnxruntime was blocked too. Not sure if they found a workaround.
We have fixed a number of Windows / MSVC issues in the past couple months. CUDA 12.8 + CUTLASS 3.7+ should now compile properly. The issue originally seen in this bug was fixed in v3.7. However, given the "flakiness" of Windows environments, I was not able to verify the exact reproduction flow as shown above, the cmake-gui has never worked in my VM environment properly. So I'm posting my simplified flow here for anyone to try out.
Start with Win 10/11 (with latest updates) or Win Server 2019+
Install Git, Python3, and CMake.
Install Visual Studio 2019+, and all updates.
Install CUDA 12.8
Open Visual Studio Development PowerShell Window
> Clone CUTLASS, checkout v3.7.0 or higher revision
> cmake . -Bbuild -DCMAKE_CUDA_COMPILER="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8\bin\nvcc.exe" -G "Visual Studio 16 2019" -A x64 -T host=x64 -DCMAKE_SUPPRESS_REGENERATION=ON
> cmake --build build --config Release -j $((Get-CimInstance -ClassName Win32_Processor).NumberOfCores) -- /verbosity:minimal
Note, at the time of posting, CUDA 12.8 installer hangs while trying to install to the latest VS 2022 extensions. If that happens to you, you need to roll back to an older version of Visual Studio 2022 temporarily until that is sorted out.
Update: It appears the installation issue is with VS 2022 17.13 Previews 3+, so rolling back to the last stable VS install should work.
> posting my simplified flow here for anyone to try out
Not to come across as argumentative, but:
- What can we do when we don't control the build process such as when compiling xformers or onnxruntime where CUTLASS is a dependency and is managed by their build process?
- What exactly is the "fix" here? 3.7.0? Something newer? Defines you're setting? I tried 3.7.0 about 3 weeks ago and it didn't compile for me using CUDA 12.8 and VS 2022.
Finally, you aren't using my issue repro steps -- if they are wrong, broken, or unsupported you should outright say so and explain why the compile is failing at SharedStorage bit.
> Not to come across as argumentative, but:
Not at all, we are all just trying to get this to work 😄
> What can we do when we don't control the build process such as when compiling xformers or onnxruntime where CUTLASS is a dependency and is managed by their build process?
There are a number of moving targets here. There was an actual code issue described in this bug, which was fixed in 3.5.1 iirc, but along the way there were some breakages due to code that was incompatible with various versions of MSVC. Which leads into your next question ...
> What exactly is the "fix" here? 3.7.0? Something newer? Defines you're setting? I tried 3.7.0 like 3 weeks ago and it didn't compile for me using CUDA 12.8 and VS 2022
If the build processes are compiling CUDA code successfully and applying the correct CUTLASS flags, then v3.7 should have the necessary fixes for the bug mentioned in the description and for the MSVC syntax fixes to be able to build CUTLASS successfully.
However, I was not able to reproduce a successful build environment using the exact steps shown in the description. The cmake-gui flow did not produce a Visual Studio solution file that was configured correctly. If I get a moment soon I will try it again to remember exactly what the issue was. If that flow works for you typically, then stick with it, it is just very temperamental for me, alas.
With the steps I showed in my previous reply, I was able to successfully build v3.7.0 and main.
One note I'll mention, please review the building with msvc guide and ensure things like LongPath support is enabled. If any build errors occur that mention things like "file not found", it is likely a path length issue and you need to shorten the build folder path. I typically use c:\repos\cutlass for source and c:\repos\cutlass-build for the build. If none of that works, I've also had luck in the path using the 8.3 short path directory names for the build folder.
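To get a rough feel for how much a short build root helps, here is a small sketch that computes the remaining path budget under the legacy Windows `MAX_PATH` limit of 260 characters (including the terminating null). `path_headroom` and `fits_legacy_limit` are hypothetical helpers written for this illustration; they say nothing about which tools in the build actually honor long-path support.

```cpp
#include <cstddef>
#include <string_view>

// Legacy Windows MAX_PATH: 260 characters including the terminating null.
constexpr std::size_t kLegacyMaxPath = 260;

// Hypothetical helper: characters left for a file path below `build_root`,
// after reserving one path separator and the terminating null.
constexpr std::size_t path_headroom(std::string_view build_root) {
  const std::size_t reserved = build_root.size() + 2;
  return reserved >= kLegacyMaxPath ? 0 : kLegacyMaxPath - reserved;
}

// True when `relative_file` under `build_root` stays within the legacy limit.
constexpr bool fits_legacy_limit(std::string_view build_root,
                                 std::string_view relative_file) {
  return relative_file.size() <= path_headroom(build_root);
}
```

For example, `path_headroom("c:\\repos\\cutlass-build")` leaves 236 characters for everything the build generates below that folder, while a build root deep inside a user profile leaves correspondingly less.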
> did not produce a Visual Studio solution file that was configured correctly
What was incorrect about it?
> If that flow works for you typically, then stick with it, it is just very temperamental for me, alas.
Let me be clear, I don't typically need to build CUTLASS on its own -- I am using it as part of xformers and onnxruntime which I sometimes need to build. Needless to say, because of the unspecified issue with the code and later with the MSVC (or CUDA?) compiler, I am unable to build either of those two.
Also, I don't know if that flow works or not — I have assumed that like any other CMake project I can configure it using CMake-gui and that provided solution would work. When it didn't I reported it here.
As for the build pre-requisites:
> Windows 10 or 11
Windows 11 Version 10.0.22631.4890 (that'd be 23H2 fully patched up)
> Visual Studio 2019 version 16.11.27, or Visual Studio 2022
Visual Studio 2022 17.13.1 at the moment, 17.11.0 when this issue was submitted.
> CUDA Toolkit (at least 12.2; earlier 12.x versions may work)
CUDA 12.4 and 12.8 installed side by side
12.4 build working fine, 12.6 and 12.8 not working as reported in the issue
> CMake (at least 3.18)
3.31.5 at the moment (I always have the latest)
> git
2.47.0.2
> Python (at least 3.6)
3.11.9
> Long paths
Enabled ever since I installed Windows.
> Create the build subdirectory in the CUTLASS clone directory
I did not do this step -- I was building out of tree assuming that it's supported.
Finally, I was still seeing the same error with SharedStorage on 3.7.0 with my build environment recently. I can try again but the build takes a lot of time before it fails on that even on my 12 core Xeon with 64 GB RAM and M.2 storage.
Hopefully that clarifies it a bit, let me know if there's anything else I can do.
@levicki, all of those packages seem fine and are pretty much what I have installed.
> Create the build subdirectory in the CUTLASS clone directory
Hmm... I think the Setting up the Build Environment portion might have been an early attempt at some documentation for Linux on Windows support. The build subdirectory does not need to be inside the cutlass repository. But I would ensure the build directory has a pretty short initial path like c:\builds\cutlass or something like that. It is likely that the default home directory location is going to have path length issues without using 8.3 filenames or some other shortening mechanism like a symbolic link to a shorter path.
@d-k-b I have just now managed to build 3.8.0 with CUDA 12.8 and Visual Studio 2022 17.13.2. It seems that whatever the problem was has been fixed. I have also tried building xformers, which updated its third-party CUTLASS dependency to v3.8.0, and it now builds without errors.