
Unable to compile with ZFP_WITH_CUDA=TRUE on Windows

Status: Open (kminemur opened this issue 1 year ago, 21 comments)

Hi team,

I'm trying to compile zfp with CUDA on Windows, but I get the link error "LINK : fatal error LNK1104: cannot open file 'stdc++.lib'". I was able to compile it in an Ubuntu 22.04 environment.

Is zfp with CUDA supported only on Linux?

Steps:
- Install CUDA Toolkit 12.5
- git clone zfp
- cd zfp; mkdir zfp; cd zfp
- cmake .. -DZFP_WITH_CUDA=TRUE
- cmake --build .

kminemur, Nov 14 '24

There's no reason I'm aware of why zfp cannot be built with CUDA support on Windows. We unfortunately do not have a CUDA-capable Windows machine that would allow us to reproduce the issue, but it seems unlikely that the issue is related to zfp.

Perhaps you can include the full CMake output to see if it gives any other hints? Also, what version of CMake are you using? What host compiler and version? Is the compiler compatible with CUDA 12.5? See https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html for requirements. Can you successfully build any of the CUDA code samples?

lindstro, Nov 14 '24

The CMake output log is as follows:

cmake --version
cmake version 3.29.5-msvc4

cmake .. -DZFP_WITH_CUDA=TRUE --log-level=VERBOSE
-- Building for: Visual Studio 17 2022
-- Selecting Windows SDK version 10.0.22621.0 to target Windows 10.0.22631.
-- The C compiler identification is MSVC 19.42.34433.0
-- The CXX compiler identification is MSVC 19.42.34433.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files/Microsoft Visual Studio/2022/Professional/VC/Tools/MSVC/14.42.34433/bin/Hostx64/x64/cl.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files/Microsoft Visual Studio/2022/Professional/VC/Tools/MSVC/14.42.34433/bin/Hostx64/x64/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Compiling with C standard: 90
-- Compiling with C++ standard: 98
-- Found OpenMP_C: -openmp (found version "2.0")
-- Found OpenMP: TRUE (found version "2.0") found components: C
CMake Warning (dev) at CMakeLists.txt:205 (find_package):
  Policy CMP0146 is not set: The FindCUDA module is removed. Run "cmake --help-policy CMP0146" for policy details. Use the cmake_policy command to set the policy and suppress this warning.

This warning is for project developers. Use -Wno-dev to suppress it.

-- Found CUDA: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.5 (found version "12.5")
-- Performing Test HAVE_MATH
-- Performing Test HAVE_MATH - Success
-- Configuring done (18.2s)
-- Generating done (0.1s)
-- Build files have been written to: C:/Users/MTL/kazuki/zfp/build

kminemur, Nov 15 '24

So the CMake warning is related to #232, which we will fix in the next release. Still, CUDA is found, so presumably the warning is not the cause of the link error.
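If the CMP0146 dev warning is a nuisance until #232 lands, one interim option (a sketch, not something taken from the zfp repository) is to pin the policy to its OLD behavior before the CUDA lookup. This keeps the deprecated FindCUDA module usable and silences the warning; it does not change how anything is built, so it would not fix the link error. The snippet assumes the call at CMakeLists.txt:205 is find_package(CUDA).

```cmake
# Sketch: place before the find_package(CUDA) call (assumed to be the one at line 205).
# CMP0146 only exists in CMake 3.27 and later, so guard the call for older versions.
if(POLICY CMP0146)
  # OLD keeps FindCUDA available and suppresses the "FindCUDA module is removed" warning.
  cmake_policy(SET CMP0146 OLD)
endif()

find_package(CUDA)
```

The if(POLICY ...) guard matters because cmake_policy(SET ...) reports an error for a policy ID that the running CMake version does not know.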

According to the NVIDIA docs, your compiler should be compatible with CUDA 12.5. But please do attempt to build one or more of the CUDA code samples to verify that this is not a zfp-specific issue. It would also be instructive to test whether zfp builds with CUDA disabled (-DZFP_WITH_CUDA=OFF).

lindstro, Nov 15 '24

Hi, sorry for the late response.

The CUDA code samples compile without any issues in my environment, and zfp also builds with CUDA disabled.

Thank you for sharing https://github.com/LLNL/zfp/pull/232; I will keep an eye on it.

Thanks.

kminemur, Nov 25 '24

I just noticed these CMake lines: https://github.com/LLNL/zfp/blob/a46fa8b91bf2d69f4ffcf04af4f908383828ba79/src/CMakeLists.txt#L44-L46
What happens if you comment them out?

lindstro, Nov 25 '24

That doesn't work; I get linking errors instead, e.g.:

error LNK2019: unresolved external symbol cudaGetLastError referenced in function "bool __cdecl cuZFP::is_gpu_ptr(void const *)" (?is_gpu_ptr@cuZFP@@YA_NPEBX@Z)

Btw, I think you can check this on your Windows system just by installing the CUDA Toolkit.

kminemur, Nov 26 '24

cudaGetLastError is quite clearly part of the CUDA API, so there's something more fundamental going on here. It may be useful to run make VERBOSE=1 (not sure what the Windows equivalent is) to capture the compiler flags used for zfp and for the CUDA examples and see how they differ.

If that reveals nothing, perhaps we just need to wait for #232 to be merged and see if that resolves the issue. However, we're not at a very good point to merge that PR (see discussion there).
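An unresolved cudaGetLastError typically means the CUDA runtime library is missing from the link line. One way to make that dependency explicit, assuming CMake 3.17 or newer, is the FindCUDAToolkit module and its imported CUDA::cudart target. The sketch below is a hypothesis, not zfp's actual build logic: it assumes the library target is named zfp (as in the zfp.vcxproj paths in the logs) and that the keyword form of target_link_libraries is used for it.

```cmake
# Sketch: link the CUDA runtime explicitly so symbols such as cudaGetLastError resolve.
# Assumptions: CMake >= 3.17 (FindCUDAToolkit), a library target named "zfp",
# and keyword-style target_link_libraries calls elsewhere for that target.
if(ZFP_WITH_CUDA)
  find_package(CUDAToolkit REQUIRED)
  # CUDA::cudart is the shared runtime; CUDA::cudart_static is the static variant.
  target_link_libraries(zfp PRIVATE CUDA::cudart)
endif()
```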

> Btw, I think you can check this on your Windows system just by installing the CUDA Toolkit.

We don't have access to a Windows machine with an NVIDIA GPU, so that is unfortunately not an option.

lindstro, Nov 26 '24

I mean just compiling, without an NVIDIA GPU; the issue here is with code generation. I will wait for the fix, then.

kminemur, Nov 26 '24

@kminemur Can you please check to see if the latest changes on the staging branch address your issue?

lindstro, Jan 01 '25

Hi, the staging branch produces a different error:

nvcc fatal : A single input file is required for a non-link phase when an outputfile is specified

C:\Program Files\Microsoft Visual Studio\2022\Professional\MSBuild\Microsoft\VC\v170\BuildCustomizations\CUDA 12.5.targets(799,9): error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\bin\nvcc.exe" --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.42.34433\bin\HostX64\x64" -x cu -IC:\Users\MTL\kazuki\zfp\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include" --keep-dir zfp\x64\Debug -maxrregcount=0 --machine 64 --compile -cudart static --generate-code=arch=compute_52,code=[compute_52,sm_52] /wd4146 /wd4305 -Xcompiler="/EHsc -Zi -Ob0" -g -D_WINDOWS -DZFP_WITH_OPENMP -DZFP_WITH_CUDA -DZFP_ROUNDING_MODE=ZFP_ROUND_NEVER -DZFP_SOURCE -DZFP_SHARED_LIBS -D_CRT_SECURE_NO_WARNINGS -D_SCL_SECURE_NO_WARNINGS -D"CMAKE_INTDIR="Debug"" -Dzfp_EXPORTS -D_WINDLL -D_MBCS -DWIN32 -D_WINDOWS -DZFP_WITH_OPENMP -DZFP_WITH_CUDA -DZFP_ROUNDING_MODE=ZFP_ROUND_NEVER -DZFP_SOURCE -DZFP_SHARED_LIBS -D_CRT_SECURE_NO_WARNINGS -D_SCL_SECURE_NO_WARNINGS -D"CMAKE_INTDIR="Debug"" -Dzfp_EXPORTS -Xcompiler "/EHsc /W1 /nologo /Od /FS /Zi /RTC1 /MDd " -Xcompiler "/Fdzfp.dir\Debug\vc143.pdb" -o zfp.dir\Debug\interface.obj "C:\Users\MTL\kazuki\zfp\src\cuda\interface.cu"" exited with code 1. [C:\Users\MTL\kazuki\zfp\build\src\zfp.vcxproj]

kminemur, Jan 06 '25

I wonder if the MSVC flags /wd4146 and /wd4305 are tripping nvcc up and making it think those are source files. What if you comment those out here: https://github.com/LLNL/zfp/blob/3d2d67a37545c40bc693f7c0e0ba614b54272396/CMakeLists.txt#L80-L81

lindstro, Jan 08 '25

Update: I can confirm that commenting out those two CMake lines fixes the above nvcc issue (on the staging branch). Ideally those warning suppressions would be passed only to the host compiler. We'll look into a permanent fix.
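One way to pass the two suppressions only to the host compiler is to guard them with COMPILE_LANGUAGE generator expressions and forward them through -Xcompiler for CUDA sources. This is an untested sketch, not the permanent fix mentioned above: it assumes the flags are currently added globally with add_compile_options, that CUDA is enabled as a CMake language (as the MSBuild CudaCompile log above suggests), and CMake 3.15 or newer for the multi-language COMPILE_LANGUAGE:C,CXX form.

```cmake
# Sketch: replace the global /wd4146 /wd4305 flags with language-specific ones.
if(MSVC)
  # C and C++ sources go straight to cl.exe, which understands /wd<n>.
  add_compile_options(
    "$<$<COMPILE_LANGUAGE:C,CXX>:/wd4146>"
    "$<$<COMPILE_LANGUAGE:C,CXX>:/wd4305>"
  )
  # nvcc apparently treats a bare /wd<n> as an extra input file,
  # so forward the suppressions to the host compiler instead.
  add_compile_options(
    "$<$<COMPILE_LANGUAGE:CUDA>:-Xcompiler=/wd4146>"
    "$<$<COMPILE_LANGUAGE:CUDA>:-Xcompiler=/wd4305>"
  )
endif()
```

With the Visual Studio generators, C and C++ share target-wide flags, which is why both are listed in one COMPILE_LANGUAGE check rather than split.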

The next issue has to do with stdc++.lib not being found. At least on Windows, simply removing stdc++ from this line addresses that issue: https://github.com/LLNL/zfp/blob/3d2d67a37545c40bc693f7c0e0ba614b54272396/src/CMakeLists.txt#L37
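Rather than deleting stdc++ outright, it could be linked only for non-MSVC toolchains, since it was presumably added for GCC-based builds, where libstdc++ exists, and MSVC has no stdc++.lib. The linked line is not reproduced here, so the target name zfp and the PRIVATE keyword in this sketch are assumptions.

```cmake
# Sketch: libstdc++ is a GCC/Clang runtime; MSVC ships no stdc++.lib to link against.
# Assumes a library target named "zfp" that uses keyword-style target_link_libraries.
if(NOT MSVC)
  target_link_libraries(zfp PRIVATE stdc++)
endif()
```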

There is at least one more issue with duplicate bitstream symbols that we need to look into. I suspect that these are not being compiled as static functions by nvcc.

lindstro, Jan 09 '25

This last issue is fixed by substituting #define inline_ inline with #define inline_ static inline here: https://github.com/LLNL/zfp/blob/3d2d67a37545c40bc693f7c0e0ba614b54272396/src/cuda/shared.cuh#L18 These changes allow me to build zfp on Windows with -DZFP_WITH_CUDA=ON, though without an NVIDIA GPU on a Windows machine, I cannot test for correctness.

We'll push some changes that address these build issues over the coming days.

lindstro, Jan 09 '25

@kminemur I believe we've now addressed the CUDA build issues on Windows on the staging branch. Can you please test it and see if it works for you?

lindstro, Jan 14 '25

Hi @lindstro

Commit 6e825731efd88d90f28c06ab4aa25a270569385 solves this compilation issue, so we can close this issue. Thanks.

kminemur, Jan 14 '25

Great to hear. Since we don't have a Windows box with an NVIDIA GPU, would you mind running the tests also (by building with -DBUILD_TESTING_FULL=ON and then running ctest) to make sure all is well before I close this issue?

lindstro, Jan 14 '25

Hi,

Unfortunately, the ctest tests are not passing.

cmake .. -DZFP_WITH_CUDA=TRUE -DBUILD_TESTING_FULL=ON
cmake --build . --config Release > build.log

Attachments: build.log, ctest.log

Update: without the CUDA option, all tests also fail (none are run).

kminemur, Jan 14 '25

@kminemur Thanks for trying. I believe there are a few issues at hand.

First, I saw similar Google Test linkage warnings on my machine, which, from a cursory look, may have something to do with gtest.h being included more than once in the same translation unit (which you'd think should have no impact).

Second, CMocka seems to cause some symbolic link issues on Windows, which evidently thinks symlinks are harmful:

CUSTOMBUILD : CMake error : failed to create symbolic link 'C:/Users/MTL/kazuki/zfp/build/cmocka-src/compile_commands.json': A required privilege is not held by the client. [C:\Users\MTL\kazuki\zfp\build\tests\cmocka_cloned-build.vcxproj]

The "correct" solution cannot be to require admin privileges just to use CMocka; let me look further into this.

Your build log suggests that most (all?) test executables are built, yet no tests are being run. Is this another Windows issue, where ctest must be instructed which configuration to use (Release vs. Debug) via --config? It makes no sense to me that no tests are being run here.

lindstro, Jan 14 '25

@kminemur Can you please confirm whether or not you ran ctest -C Release and not just ctest? On Windows, you have to specify which configuration to test. I see no other reason for why no tests were run.

lindstro, Jan 15 '25

@lindstro

With "ctest -C Release": 42% tests passed, 129 tests failed out of 221.

Attachment: ctest_release.log

kminemur, Jan 16 '25

@kminemur Thanks for letting us know. The "failing" tests are the C tests that depend on CMocka and which are not being built, as that requires permission to make symlinks on Windows (see, e.g., https://stackoverflow.com/questions/61243174/replacement-of-create-symlink-in-windows). These tests pass on the develop branch, which uses CMocka 1.1.5, while the staging branch uses the most recent version, CMocka 1.1.7. As I can reproduce this issue on my Windows machine, I will look into what needs to be done to get these tests built on Windows.

lindstro, Jan 16 '25