Add CMake build system
Adds a CMake build system that may be used to build the feltor executable and all dg benchmarks and tests. The project can be built with first a 'configure' step:
cmake -Bbuild . # Set up to build in the ./build directory
and all targets (libraries, executables, tests, benchmarks) can then be build with a 'build' step:
cmake --build build
To avoid building everything, individual targets can be built with:
cmake --build build --target feltor
which would build only the feltor executable.
- Configurable with the options
-DFELTOR_USE_OMP=ON,-DFELTOR_USE_GPU=ONand-DFELTOR_USE_MPI=ONat the configure step. - Includes utilities to find the dependencies
thrust,cusp,vcl, anddrawon the user's system. They can also be downloaded and linked automatically by supplying options such as-DFELTOR_FETCH_THRUST=ON.DFELTOR_FETCH_DEPS=ONcan be used to download/link all dependencies. - Additionally finds libraries such as NetCDF, JSONC++, and GLFW.
- Adds a CI job to test everything compiles correctly.
Known Issues
- I can only get everything to reliably compile in OpenMP mode. For errors, see this failing CI job.
- When running with plain C++, everything that links to CUSP fails to compile due to a template specialisation error, as the specialisations
cusp::blas::blas_policy<ValueType, cusp::device_memory>andcusp::blas::blas_policy<ValueType, cusp::host_memory>are identical when settingTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CPP. I get the same issue when compiling with the provided makefiles. - When compiling for GPU, there seems to be a version mismatched between Thrust and CUSP, as CUSP tries to include the file
thrust/system/cuda/detail/detail/launch_calculator.hwhich doesn't exist. I've pinned the Thrust version to 1.9.3 as recommended in the README, but I can't figure out which versions of CUSP are compatible with this.
- When running with plain C++, everything that links to CUSP fails to compile due to a template specialisation error, as the specialisations
inc/dg/topology/filter_mpit.cudoesn't compile.- Tests are compiled in such a way that they can be run with
ctest --test-dir build, but some of the tests require user input, so this command will hang indefinitely.
Further work
- Could be easily expanded to include the other executables in the
srcdirectory. - The
dgtarget could be exported, which would allow external projects to link against it. - A rework of the tests to avoid user input and return simple 'pass' or 'fail' would allow tests to be run using
ctest. Testing could then be easily automated in a CI job.
Hi guys,
I had an initial look at this pull request and before I continue I want to say that I am still learning cmake and I have not yet a complete picture of what is going on, but I already have a few comments
- I can see that you wrote CMakeFile targets not only for the 3d feltor code but also for all the test and benchmark programs in the dg/inc folder. That is nice of you but I am currently working on and finishing up a branch that basically rewrites all of those tests in the Catch2 unit-test framework, which essentially deprecates all that work immediately.
- I can see that there are a couple of compile errors that you encounter. Since with the Makefile system I have no issues on my computer I am trying to see exactly what compile commands cmake is producing and compare those to what make does but cmake seems a bit convoluted and has some indirections which make it hard to see what it does. Is there a "just show me the plain compile command you are using" option?
- The cusp library is no longer maintained and is now preventing us to adopt newer and maybe even faster and more stable versions of thrust and its better Cmake integration. My immediate idea would be to get rid of the cusp dependence entirely. We use cusp for sparse matrix-vector and matrix-matrix multiplication. I can use the cusparse library for these operations on the GPU end. Would you have a recommendation for a fast and easy library for the CPU and OPenMP sparse matrix operations?
Hi @mwiesenberger, sorry for the delay, I'll work on getting this branch caught up to the latest changes shortly. To address your points:
I can see that you wrote CMakeFile targets not only for the 3d feltor code but also for all the test and benchmark programs in the dg/inc folder. That is nice of you but I am currently working on and finishing up a branch that basically rewrites all of those tests in the Catch2 unit-test framework, which essentially deprecates all that work immediately.
That shouldn't be a major problem, as the work I've done already should be adaptable to the changes. I also consider the move to Catch2 to be a significant improvement!
I can see that there are a couple of compile errors that you encounter. Since with the Makefile system I have no issues on my computer I am trying to see exactly what compile commands cmake is producing and compare those to what make does but cmake seems a bit convoluted and has some indirections which make it hard to see what it does. Is there a "just show me the plain compile command you are using" option?
At the build stage, the --verbose flag prints the full compile commands to screen:
cmake -Bbuild . # Configure the project
cmake --build build --verbose # Build all targets and print each compilation command
The compile errors I found were all due to cusp, so hopefully this will no longer be a problem after I update the branch.
The cusp library is no longer maintained and is now preventing us to adopt newer and maybe even faster and more stable versions of thrust and its better Cmake integration. My immediate idea would be to get rid of the cusp dependence entirely. We use cusp for sparse matrix-vector and matrix-matrix multiplication. I can use the cusparse library for these operations on the GPU end. Would you have a recommendation for a fast and easy library for the CPU and OPenMP sparse matrix operations?
I'm afraid I'm not terribly well versed in the GPU library ecosystem. Did you manage to find something during your own upgrades?
The issue with the GPU library was solved by using -lcusparse, which is shipped with CUDA, on the CPU and OpenMP end we use a trivial self-made matrix-vector kernel until something better comes along
Hi @mwiesenberger , sorry if your notifications have been getting spammed, I've been repeatedly pushing changes while trying to get my CI job to pass (in hindsight, I should have made a new branch for that).
I've mostly finished updating this PR to the latest standards. It's successfully building and passing tests for all combinations of C++/OpenMP/MPI, but not yet for GPUs (see this GitHub Action). A couple of bug fixes were needed to get the MPI to compile. I'll see if I can get it working with GPUs over the next few days.
Hi @LiamPattinson, nice work. I had a look at the github Action you pointed out and for some reason the test runner for GPUs skips loading the nvcc compiler and uses /usr/bin/c++, instead of nvcc? Not sure what the fix is though
Hi @LiamPattinson, nice work. I had a look at the github Action you pointed out and for some reason the test runner for GPUs skips loading the nvcc compiler and uses /usr/bin/c++, instead of nvcc? Not sure what the fix is though
I think it may be as simple as telling CMake to treat cpp files as if they were cu files when FELTOR_USE_GPU=ON -- I was doing the reverse before your recent updates. I'll look into it when I'm fixing up the other GPU issues.
I've now moved the definition of the dg target into inc/dg/CMakeLists.txt, and set up a new CMakeLists.txt in inc/dg/file. I'm also now running the file tests, and one of them related to netcdfs is failing (action). I also found that inc/file/probes_parser_t.cpp doesn't compile when using nlohmann_json.
Hi @LiamPattinson
In the failed test output in the runner I cannot see which test exactly failed, only that it is in the "Easy attributes" test case. Normally, catch2 has more output when a test fails, i.e. it prints the test log and the values that failed, etc. Is it possible to configure the runner to show that?
Apart from that I highly suspect that it is to do with the GIT_HASH or COMPILE_TIME macro. Not sure what exactly is wrong without more info though
The bug in probes_parser_t.cpp can be fixed by replacing line 94 with
dg::file::WrappedJsonValue js_format = dg::file::string2Json( fromfile.format);
You can leave this for me to fix though
Do I understand this correctly and you succesfully created the inc/dg/file/CMakeLists.txt without the need to move the file folder?
I've managed to get the Cuda stuff building: https://github.com/PlasmaFAIR/feltor/actions/runs/14405876835/job/40402162991
It's failing the tests simply because the GitHub runners don't have GPUs. I'll disable CI tests for GPU jobs now.
I'm not sure what's going wrong in the CUDA + MPI case. It looks like it compiled quite a lot of tests before suddenly realising there wasn't a local Cuda device.
Just as a heads up, I'll be away on leave for the next week, so it might take me a while to get back to you about any follow up questions. Hopefully it won't take much more work to get this finished.
Edit: Updated run showing GPU build passing https://github.com/PlasmaFAIR/feltor/actions/runs/14406458714/job/40404090246
I had a practical question (sorry for spamming your mail with so many individual comments), is there or do you have a prefered way of storing individual cmake build configurations. For example if I go on any HPC cluster and optimise some build flags for their specific architecture, is there a good way to make this configuration available to other users?
I had a practical question (sorry for spamming your mail with so many individual comments), is there or do you have a prefered way of storing individual cmake build configurations. For example if I go on any HPC cluster and optimise some build flags for their specific architecture, is there a good way to make this configuration available to other users?
Yep, you can use presets, which is basically just a json file with the settings in. The official docs are not great for actually learning how to implement them, but there's some good guides elsewhere, for example this one.
I've finally gotten MPI + CUDA to compile! https://github.com/PlasmaFAIR/feltor/actions/runs/14712161450. Unfortunately I haven't been able to check if all the tests pass, as I ran into Nvidia driver issues on my local GPU cluster.
I've also removed the Navier-Stokes bits from src/CMakeLists.txt.
@mwiesenberger, are there any other tasks you would like me to finish for this PR? There are a couple of outstanding issues I'm aware of, but I don't know if they should be fixed here or in a later PR:
- Not yet supporting the
geometriesormatrixlibraries in full - Should rethink the way
find_package/FetchContentare being used together, perhaps going to CPM instead (I don't know how well it would handle VCL).
I've finally gotten MPI + CUDA to compile! https://github.com/PlasmaFAIR/feltor/actions/runs/14712161450. Unfortunately I haven't been able to check if all the tests pass, as I ran into Nvidia driver issues on my local GPU cluster.
Perfect!
I've also removed the Navier-Stokes bits from
src/CMakeLists.txt.
Thanks
@mwiesenberger, are there any other tasks you would like me to finish for this PR? There are a couple of outstanding issues I'm aware of, but I don't know if they should be fixed here or in a later PR:
* Not yet supporting the `geometries` or `matrix` libraries in full * Should rethink the way `find_package`/`FetchContent` are being used together, perhaps going to [CPM](https://github.com/cpm-cmake/CPM.cmake) instead (I don't know how well it would handle VCL).
Yeah, this PR is maybe a bit overgown now, so I will test a bit more and then merge. The other (smaller) issues can be fixed in separate PRs
Update: As per our e-mail correspondence I will wait for a small update on the geometries header before merging
Hi @mwiesenberger, apologies for the lack of updates here, I have a lot of other projects competing for my attention at the moment. I'll try to find an hour this afternoon to work on this.
I'm running into some issues building the geometries library (failing action):
- Build failure of
conformal_elliptic_b.cppwhen running without OpenMP/GPU - I'm not sure I'm setting up Boost correctly. I tried linking directly against
Boost::math, but I get the error thatBoost_math_DIRis missing. Linking against plainBoost::boostseems to work, even though the missingmathcomponent is reported in the logs. @ZedThree have you had experience getting Boost to play nicely with CMake?
As the tests aren't set up for Catch2 yet, I've put the infrastructure in place but haven't added them to the dg_tests target.
I also don't know when I'll next have time to look at this, as my schedule for the next two weeks is completely filled. If the build error can't be easily resolved, it might make more sense to revert the last two commits and merge the PR from that point, as otherwise this PR risks hanging for a further month or so.
Hi @LiamPattinson,
thanks a lot for the update. The only boost dependence is in the taylor.h file because it calls special mathematical functions. With their integration into the C++-17 standard we actually do not need boost any more, so the dependency can be removed (simply replacing boost::math with std in taylor.h).
I also don't know when I'll next have time to look at this, as my schedule for the next two weeks is completely filled. If the build error can't be easily resolved, it might make more sense to revert the last two commits and merge the PR from that point, as otherwise this PR risks hanging for a further month or so.
Don't worry about it I will take it from here you have helped a lot.