jitify icon indicating copy to clipboard operation
jitify copied to clipboard

Jitify2 Windows/Visual Studio 2022 fixes

Open ptheywood opened this issue 4 months ago • 5 comments

Compiling the current state of jitify2 (70783a3ad7b0cad2992a26a1ebf8fbe3d6b44e25) under windows / MSVC 2022 was resulting in a number of compilation errors and compilation warnings when included in a separate project.

  • Suppreses MSVC C4996 warnigns about the use of getenv and fopen via a compiler definition
  • Fix C4129 unrecognised character escape sequence warnings on windows
  • Fix MSVC error C2110: '+': cannot add two pointers
  • Fix MSVC error C2026: string too big
    • Fixed by splitting the string literal into two adjacent string literals. I chose an arbitrary point in the middle which seemed relatively sensible.
  • Use std::invoke_result_t for >= C++17 in place of std::result_of
    • std::result_of was deprecated in C++17 and removed in C++20
  • Suppress conversion from size_t to int warning with an explicit cast
  • Suppress deprecated method warnings for jtiify2::CompilerProgramData::nvvm in the test suite
  • Suppress NVCC warning #550-D set but never used variable pch_verbose on windows
    • This should not be neccessary, but (void)pch_verbose; does not prevent this being raised on windows

Additionally, a few fixes are included for the CMakeLists.txt project in this repository for use under Windows / with Visual Studio 2022 as the generator.

  • Fix CMake error on windows using TARGET_FILE to launch dynamically created binaries
  • Ensure the check target behaves with multi-config generators by passing -C / --build-config with the target genex.

I no longer / don't currently have the ability to check with VS2019 as github actions no longer provides an image with it, and I do not have a local install at the moment. We may also be dropping support as we cannot trivally use it via CI anymore.


There are still 2 outstanding issues I have not yet resolved, and some input would be helpful:

  1. The trivially copyable kernel launch argument static assertion is failing for TEST(Jitify2Test, ClassKernelArg) in jitify2_test.cu, where Arg is used. I'm unsure on the correct fix for this as the static assertion looks sound to me. The test could be built with JITIFY_IGNORE_NOT_TRIVIALLY_COPYABLE_ARGS=1 to avoid this, but that feels wrong.

    C:\Users\ptheywood\code\ptheywood\jitify\jitify2.hpp(2493): error C2338: static_assert failed: 'Kernel launch arguments must be trivially copyable' [C:\Users\ptheywood\code\ptheywood\jit
    ify\build\jitify2_test.vcxproj]
        C:\Users\ptheywood\code\ptheywood\jitify\jitify2.hpp(2493): note: the template instantiation context (the oldest one first) is
        C:\Users\ptheywood\code\ptheywood\jitify\jitify2_test.cu(1579): note: see reference to function template instantiation 'jitify2::ErrorMsg jitify2::ConfiguredKernelData::launch<int*,Arg
        >(const Arg &,const Arg &) const' being compiled
                with
                [
                    Arg=int *
                ]
    
  2. With the above test disabled, a number of tests are failing in the jitify2 test suite. I've not yet attempted to resolve these, but will when I have time (ctest log).

    1: [  FAILED  ] 6 tests, listed below:
    1: [  FAILED  ] Jitify2Test.GetDirectoryHelpers
    1: [  FAILED  ] Jitify2Test.ReadmeExampleCode
    1: [  FAILED  ] Jitify2Test.UserGuideExampleCode
    1: [  FAILED  ] Jitify2Test.NvccSimple
    1: [  FAILED  ] Jitify2Test.EncodedQuoteIncludes
    1: [  FAILED  ] Jitify2Test.LinkCurrentExecutable
    
    2: [  FAILED  ] 6 tests, listed below:
    2: [  FAILED  ] Jitify2Test.GetDirectoryHelpers
    2: [  FAILED  ] Jitify2Test.ReadmeExampleCode
    2: [  FAILED  ] Jitify2Test.UserGuideExampleCode
    2: [  FAILED  ] Jitify2Test.NvccSimple
    2: [  FAILED  ] Jitify2Test.EncodedQuoteIncludes
    2: [  FAILED  ] Jitify2Test.LinkCurrentExecutable
    2:
    

I have also only tested this on Windows, so need to ensure I have not caused issues under linux either.

Todo

  • Resolve trivially copyable error on MSVC
  • Resolve test failures
  • Test under linux once all windows fixes/suppressions are in-place.
    • Warnign free and ctest passes as of fd04575 with GCC 12.3.0 and CUDA 12.9.86

ptheywood avatar Aug 05 '25 19:08 ptheywood

I no longer / don't currently have the ability to check with VS2019 as github actions no longer provides an image with it, and I do not have a local install at the moment.

I have it at home, not sure about in the office. Could try it later this week/next

Robadob avatar Aug 05 '25 20:08 Robadob

Thanks so much for this!

I'll need to find time to take a closer look, but I quickly ran the tests on Linux and the only problem is the --build-config $<CONFIG> bit, which I had to change to --build-config "\"$<CONFIG>\"" to make it work (otherwise the value is empty and it thinks it's missing). (It may still need work; CMake is not my strong suit).

Regarding the trivially-copyable thing, it seems that trivial copyability of classes with deleted copy constructors is complicated, which may explain why the compilers have different opinions. I don't really even remember why the Arg class is made non-copyable, so you can probably just remove that.

benbarsdell avatar Aug 06 '25 12:08 benbarsdell

I've now resolved all but one of the test failures under windows (Visual Studio 2022 with CUDA 12.6), which originally were:

[  FAILED  ] Jitify2Test.GetDirectoryHelpers
[  FAILED  ] Jitify2Test.ReadmeExampleCode
[  FAILED  ] Jitify2Test.UserGuideExampleCode
[  FAILED  ] Jitify2Test.NvccSimple
[  FAILED  ] Jitify2Test.EncodedQuoteIncludes
[  FAILED  ] Jitify2Test.LinkCurrentExecutable
[  FAILED  ] Jitify2Test.SerializationGoldensProgram
[  FAILED  ] Jitify2Test.SerializationGoldensPreprocessedProgram
[  FAILED  ] Jitify2Test.SerializationGoldensCompiledProgram
[  FAILED  ] Jitify2Test.SerializationGoldensLinkedProgram

with equivalent failures for Jitify2TestStatic.

Most failures were related to the location of the cuda home / include directory and/or nvcc, and the presence of the host compiler on the path and could be mitigated with CUDA_HOME/JITIFY_CUDA_HOME and JITIFY_NVCC are used. Additionally cl.exe must be added to users paths for JITIFY_ENABLE_NVCC.

These were caused by / addressed by:

  • On Windows, there is no symlinked CUDA install as a fallback / default location as on Windows, so the default/fallback locaiton in guess_cuda_home was not valid.
    • Using the version from CUDA_VERSION defined in cuda.h seemed like a reasonable option, although this may differ from the version nvcc found at runtime which could cause issues (if env variables are not set), although this applies to the linux fallback option as well.
  • On Windows, CUDA_PATH is set by the cuda installer to point to the most recent CUDA installation. This could be checked if JITIFY_CUDA_HOME an CUDA_HOME are not set / valid.
  • On Windows, the default CUDA installation directory contains spaces, so the path to the binary must be quoted when invoked via _popen.
    • The same applies to the temporary include directory, which also includes single backslashes that need escaping or quoting too.
  • GetTempPath2A on windows was returning the shortened version of the path to the temporary folder, which (potenitally) modifies the username portion of the path to include a ~, i.e. C:\Users\PTHEYW~1\AppData\Local\Temp\ where my local username is actually ptheywood.
    • This fix may not be requried, as the single backslashes in the unquoted string were probably the real problem
  • A warning is emmitted on windows about the temporary directory not being deleted - as the detail::remove_all is not implemented for windows in C++14, and the __cplusplus value in MSVC tends to lie.
    • Switching to use JITIFY_CPLUSPLUS instead allows std::filesystem::remove_all to be used instead, and we do not require a c++14 version so I've not put the time in to address that case

Jitify2Test.NvccSimple continued to fail, as is_valid_nvcc was always returning false.

  • _pclose on windows (and therefore run_system_command), returns 0 when nvcc --version succeeds or 1 when nvcc could not be invoked. is_valid_nvcc would only return true when a positive non-zero integer was returned, which was not the case on success on windows.
    • The documentation for the return value from _pclose is poor, stating to look at the _cwait docs for info on the value returned, but _cwait's documentation does not contain this...
  • pclose on posix/linux returns a positive integer value on success, which may include more than just the return code which could be checked via WIFEXITED & WEXITSTATUS.
    • Use of run_system_command in is_valid_nvcc was always triggering a sigpipe under unix as nothing was being read from the pipe, leading to the pclose return value being a ~5 digit positive integer, and the underlying return code was 141 (128 + 13 for sigpipe).

I've resolved this by making run_system_command return:

  • -1 if somethingn went very wrong opening / closing the pipe (as before)
  • the exit code of the underlying command, 0 on success by convention, non zero otherwise

And where run_system_command is used the result is compared against 0 for truthy-ness. This is covered by the test suite for is_nvcc_valid, but does not appear to be covered for use of detail::Nvcc::operator() (in NvccProgram::compile)? So I'm not 100% I've not caused any problems there / know if that behaves or not under windows.

Does this seem like an OK solution?

Jitify2Test.EncodedQuoteIncludes was failing as the index of the cuda_fp16.h header was 3 not 2 as in the test (i.e. "__jitify_I3@" instead of "__jitify_I2@"). I've resovled this by expecting a different value on windows than on linux.

Finally, Jitify2Test.LinkCurrentExecutable is continuing to fail under windows:

unknown file: error: C++ exception with description "Jitify fatal error: Linking failed: NVJITLINK_ERROR_INVALID_INPUT
Linker options: "-l. -arch=sm_86"
Linker error: Failed to add file: C:\Users\ptheywood\code\ptheywood\jitify\build-cu126\Release\jitify2_test.exe" thrown in the test body.

I'm unsure on why this doesn't work under windows / am not familiar enough with nvjitlink to know if this should work under windows or not, or how to fix it. Any suggestions?

ptheywood avatar Aug 22 '25 16:08 ptheywood

With CUDA 13 (after rebasing on jitify2), additional errors occur under windows with VS2022:

When attempting to build the test suite(s), an error occurs in the MSVC customisations

C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\BuildCustomizations\CUDA 13.0.targets(803,9): error MSB6001: Invalid command line switch for "". The value [CUDA_INC_DIR="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v13.0/include] contains an odd number of double-quote characters. Only even numbers of literal double-quote characters are acceptable to command line tools. [C:\Users\ptheywood\code\ptheywood\jitify\build-cu130\jitify2_test_static.vcxproj]

From the .vcxproj files(s), CUDA_INC_DIR is a quoted string of 2 include directories (as cccl has moved), separated by a ;. I would have expected this should be fine, but ; is the separator used within .vcxproj <Defines>. This feels like it may be a bug in the cuda build customisations?.

<Defines>%(Defines);_WINDOWS;_CRT_SECURE_NO_WARNINGS;NDEBUG;JITIFY_ENABLE_NVTX=1;JITIFY_LINK_CUDA_STATIC=1;JITIFY_LINK_NVRTC_STATIC=1;JITIFY_LINK_NVJITLINK_STATIC=1;CUDA_INC_DIR="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v13.0/include;C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v13.0/include/cccl";CUB_DIR=;CMAKE_INTDIR="RelWithDebInfo"</Defines>
add_compile_definitions(
    CUDA_INC_DIR="${CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES}"
    CUB_DIR=${CUDA_INC_DIR})
C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v13.0/include;C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v13.0/include/cccl

This can be worked around by passing them separately (splitting in cmake and passing as separate variables), but libcudacxx / cuda/std is used within the cuda toolkit (i.e. in curand) so where these are not explicitly passed to tests failures also occur.

Passing the CUB_INC_DIR as an additional include directory to tests which did not previsouly provide it partially works, however failures still occur due to:

  • Jitify2Test.EncodedQuoteIncludes fails as the inclusion index has changed

  • Several tests fail due to CUDA 13's removal of pre-turing compute capabilities

    • I've bumped these to Turing (75) which was introduced with CUDA 10.0 which seems recent enough to be OK as a minimum the test suite will run on (4 major versions), with the exception of passing multiple arch flags which would require >= CUDA 11.0 (for sm_80 to include more than one arch).
  • Thrust requires --std=c++17 for CCCL 3.0

  • Jitify2Test.LibCudaCxxAndBuiltinLimits fails due to incompatible redefinition of macro warnings for INTPTR_MIN etc between cuda/std/cstdint and stdint.h.

    -cuda/std/cstdint(105): warning #47-D: incompatible redefinition of macro \"INTPTR_MIN\" (declared at line 16 of stdint.h)
    -  #  define INTPTR_MIN  INT64_MIN
    -            ^
    -
    -Remark: The warnings can be suppressed with \"-diag-suppress <warning-number>\"
    -
    -cuda/std/cstdint(106): warning #47-D: incompatible redefinition of macro \"INTPTR_MAX\" (declared at line 18 of stdint.h)
    -  #  define INTPTR_MAX  INT64_MAX
    -            ^
    -
    -cuda/std/cstdint(107): warning #47-D: incompatible redefinition of macro \"UINTPTR_MAX\" (declared at line 20 of stdint.h)
    -  #  define UINTPTR_MAX UINT64_MAX
    -            ^
    -
    -cuda/std/cstdint(109): warning #47-D: incompatible redefinition of macro \"INTMAX_MIN\" (declared at line 17 of stdint.h)
    -  #  define INTMAX_MIN  INT64_MIN
    -            ^
    -
    -cuda/std/cstdint(110): warning #47-D: incompatible redefinition of macro \"INTMAX_MAX\" (declared at line 19 of stdint.h)
    -  #  define INTMAX_MAX  INT64_MAX
    -            ^
    -
    -cuda/std/cstdint(111): warning #47-D: incompatible redefinition of macro \"UINTMAX_MAX\" (declared at line 21 of stdint.h)
    -  #  define UINTMAX_MAX UINT64_MAX
    -            ^
    -
    -cuda/std/cstdint(113): warning #47-D: incompatible redefinition of macro \"PTRDIFF_MIN\" (declared at line 22 of stdint.h)
    -  #  define PTRDIFF_MIN INT64_MIN
    -            ^
    -
    -cuda/std/cstdint(114): warning #47-D: incompatible redefinition of macro \"PTRDIFF_MAX\" (declared at line 23 of stdint.h)
    -  #  define PTRDIFF_MAX INT64_MAX
    

    In CCCL 3, these are defined in libcudacxx/include/cuda/std/cstdint#L105-L114, as

    #  define INTPTR_MIN  INT64_MIN
    #  define INTPTR_MAX  INT64_MAX
    #  define UINTPTR_MAX UINT64_MAX
    
    #  define INTMAX_MIN  INT64_MIN
    #  define INTMAX_MAX  INT64_MAX
    #  define UINTMAX_MAX UINT64_MAX
    
    #  define PTRDIFF_MIN INT64_MIN
    #  define PTRDIFF_MAX INT64_MAX
    

    Which is different than in jitsafe_header_stdint_h:

    #define INTPTR_MIN LONG_MIN
    #define INTMAX_MIN LLONG_MIN
    #define INTPTR_MAX LONG_MAX
    #define INTMAX_MAX LLONG_MAX
    #define UINTPTR_MAX ULONG_MAX
    #define UINTMAX_MAX ULLONG_MAX
    #define PTRDIFF_MIN INTPTR_MIN
    #define PTRDIFF_MAX INTPTR_MAX
    

    I'm unsure on the correct resolution for this.


This PR under Windows now has 2 errors under windows, which I'm not sure how to (best) resolve:

  • Jitify2Test.LinkCurrentExecutable fails as mentioned in the previous comment. I do not know how to resolve this.
  • Jitify2Test.LibCudaCxxAndBuiltinLimits fails with CUDA 13.0, due to the `incompatible redefinition warnings. I'm not sure on how to best fix this.

~I also need to re-test my MSVC fixes under linux once again prior to this being ready for full review/merging.~ Edit: Slight fix required for difference in CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES values for CUDA 13 between unix/windows, but otherwise CUDA 12.9 passes all tests, CUDA 13.0 fails Jitify2Test.LibCudaCxxAndBuiltinLimits which was expected.

Any suggestions on the 2 outstanding test failures @benbarsdell?

ptheywood avatar Sep 03 '25 17:09 ptheywood

@benbarsdell Any chance that you could give @ptheywood some insight into the two failing tests? We are wanting to get a release of FLAME GPU out very soon but are stuck needing this PR to be merged. Could the tests even be skipped on windows to get these changes merged in?

mondus avatar Sep 23 '25 10:09 mondus