feltor Add CMake build system

Adds a CMake build system that may be used to build the feltor executable and all dg benchmarks and tests. The project can be built with first a 'configure' step:

cmake -Bbuild .  # Set up to build in the ./build directory

and all targets (libraries, executables, tests, benchmarks) can then be build with a 'build' step:

cmake --build build

To avoid building everything, individual targets can be built with:

cmake --build build --target feltor

which would build only the feltor executable.

Configurable with the options -DFELTOR_USE_OMP=ON, -DFELTOR_USE_GPU=ON and -DFELTOR_USE_MPI=ON at the configure step.
Includes utilities to find the dependencies thrust, cusp, vcl, and draw on the user's system. They can also be downloaded and linked automatically by supplying options such as -DFELTOR_FETCH_THRUST=ON. DFELTOR_FETCH_DEPS=ON can be used to download/link all dependencies.
Additionally finds libraries such as NetCDF, JSONC++, and GLFW.
Adds a CI job to test everything compiles correctly.

Known Issues

I can only get everything to reliably compile in OpenMP mode. For errors, see this failing CI job.
- When running with plain C++, everything that links to CUSP fails to compile due to a template specialisation error, as the specialisations cusp::blas::blas_policy<ValueType, cusp::device_memory> and cusp::blas::blas_policy<ValueType, cusp::host_memory> are identical when setting THRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CPP. I get the same issue when compiling with the provided makefiles.
- When compiling for GPU, there seems to be a version mismatched between Thrust and CUSP, as CUSP tries to include the file thrust/system/cuda/detail/detail/launch_calculator.h which doesn't exist. I've pinned the Thrust version to 1.9.3 as recommended in the README, but I can't figure out which versions of CUSP are compatible with this.
inc/dg/topology/filter_mpit.cu doesn't compile.
Tests are compiled in such a way that they can be run with ctest --test-dir build, but some of the tests require user input, so this command will hang indefinitely.

Further work

Could be easily expanded to include the other executables in the src directory.
The dg target could be exported, which would allow external projects to link against it.
A rework of the tests to avoid user input and return simple 'pass' or 'fail' would allow tests to be run using ctest. Testing could then be easily automated in a CI job.

Feb 24 '25 10:02 LiamPattinson

Hi guys,

I had an initial look at this pull request and before I continue I want to say that I am still learning cmake and I have not yet a complete picture of what is going on, but I already have a few comments

I can see that you wrote CMakeFile targets not only for the 3d feltor code but also for all the test and benchmark programs in the dg/inc folder. That is nice of you but I am currently working on and finishing up a branch that basically rewrites all of those tests in the Catch2 unit-test framework, which essentially deprecates all that work immediately.
I can see that there are a couple of compile errors that you encounter. Since with the Makefile system I have no issues on my computer I am trying to see exactly what compile commands cmake is producing and compare those to what make does but cmake seems a bit convoluted and has some indirections which make it hard to see what it does. Is there a "just show me the plain compile command you are using" option?
The cusp library is no longer maintained and is now preventing us to adopt newer and maybe even faster and more stable versions of thrust and its better Cmake integration. My immediate idea would be to get rid of the cusp dependence entirely. We use cusp for sparse matrix-vector and matrix-matrix multiplication. I can use the cusparse library for these operations on the GPU end. Would you have a recommendation for a fast and easy library for the CPU and OPenMP sparse matrix operations?

Feb 27 '25 12:02 mwiesenberger

Hi @mwiesenberger, sorry for the delay, I'll work on getting this branch caught up to the latest changes shortly. To address your points:

I can see that you wrote CMakeFile targets not only for the 3d feltor code but also for all the test and benchmark programs in the dg/inc folder. That is nice of you but I am currently working on and finishing up a branch that basically rewrites all of those tests in the Catch2 unit-test framework, which essentially deprecates all that work immediately.

That shouldn't be a major problem, as the work I've done already should be adaptable to the changes. I also consider the move to Catch2 to be a significant improvement!

I can see that there are a couple of compile errors that you encounter. Since with the Makefile system I have no issues on my computer I am trying to see exactly what compile commands cmake is producing and compare those to what make does but cmake seems a bit convoluted and has some indirections which make it hard to see what it does. Is there a "just show me the plain compile command you are using" option?

At the build stage, the --verbose flag prints the full compile commands to screen:

cmake -Bbuild .  # Configure the project
cmake --build build --verbose  # Build all targets and print each compilation command

The compile errors I found were all due to cusp, so hopefully this will no longer be a problem after I update the branch.

The cusp library is no longer maintained and is now preventing us to adopt newer and maybe even faster and more stable versions of thrust and its better Cmake integration. My immediate idea would be to get rid of the cusp dependence entirely. We use cusp for sparse matrix-vector and matrix-matrix multiplication. I can use the cusparse library for these operations on the GPU end. Would you have a recommendation for a fast and easy library for the CPU and OPenMP sparse matrix operations?

I'm afraid I'm not terribly well versed in the GPU library ecosystem. Did you manage to find something during your own upgrades?

Mar 27 '25 10:03 LiamPattinson

The issue with the GPU library was solved by using -lcusparse, which is shipped with CUDA, on the CPU and OpenMP end we use a trivial self-made matrix-vector kernel until something better comes along

Mar 27 '25 11:03 mwiesenberger

Hi @mwiesenberger , sorry if your notifications have been getting spammed, I've been repeatedly pushing changes while trying to get my CI job to pass (in hindsight, I should have made a new branch for that).

I've mostly finished updating this PR to the latest standards. It's successfully building and passing tests for all combinations of C++/OpenMP/MPI, but not yet for GPUs (see this GitHub Action). A couple of bug fixes were needed to get the MPI to compile. I'll see if I can get it working with GPUs over the next few days.

Apr 08 '25 16:04 LiamPattinson

Hi @LiamPattinson, nice work. I had a look at the github Action you pointed out and for some reason the test runner for GPUs skips loading the nvcc compiler and uses /usr/bin/c++, instead of nvcc? Not sure what the fix is though

Apr 09 '25 08:04 mwiesenberger

Hi @LiamPattinson, nice work. I had a look at the github Action you pointed out and for some reason the test runner for GPUs skips loading the nvcc compiler and uses /usr/bin/c++, instead of nvcc? Not sure what the fix is though

I think it may be as simple as telling CMake to treat cpp files as if they were cu files when FELTOR_USE_GPU=ON -- I was doing the reverse before your recent updates. I'll look into it when I'm fixing up the other GPU issues.

Apr 09 '25 09:04 LiamPattinson

I've now moved the definition of the dg target into inc/dg/CMakeLists.txt, and set up a new CMakeLists.txt in inc/dg/file. I'm also now running the file tests, and one of them related to netcdfs is failing (action). I also found that inc/file/probes_parser_t.cpp doesn't compile when using nlohmann_json.

Apr 09 '25 18:04 LiamPattinson

Hi @LiamPattinson In the failed test output in the runner I cannot see which test exactly failed, only that it is in the "Easy attributes" test case. Normally, catch2 has more output when a test fails, i.e. it prints the test log and the values that failed, etc. Is it possible to configure the runner to show that? Apart from that I highly suspect that it is to do with the GIT_HASH or COMPILE_TIME macro. Not sure what exactly is wrong without more info though The bug in probes_parser_t.cpp can be fixed by replacing line 94 with

dg::file::WrappedJsonValue js_format = dg::file::string2Json( fromfile.format);

You can leave this for me to fix though

Do I understand this correctly and you succesfully created the inc/dg/file/CMakeLists.txt without the need to move the file folder?

Apr 10 '25 09:04 mwiesenberger

I've managed to get the Cuda stuff building: https://github.com/PlasmaFAIR/feltor/actions/runs/14405876835/job/40402162991

It's failing the tests simply because the GitHub runners don't have GPUs. I'll disable CI tests for GPU jobs now.

I'm not sure what's going wrong in the CUDA + MPI case. It looks like it compiled quite a lot of tests before suddenly realising there wasn't a local Cuda device.

Just as a heads up, I'll be away on leave for the next week, so it might take me a while to get back to you about any follow up questions. Hopefully it won't take much more work to get this finished.

Edit: Updated run showing GPU build passing https://github.com/PlasmaFAIR/feltor/actions/runs/14406458714/job/40404090246

Apr 11 '25 15:04 LiamPattinson

I had a practical question (sorry for spamming your mail with so many individual comments), is there or do you have a prefered way of storing individual cmake build configurations. For example if I go on any HPC cluster and optimise some build flags for their specific architecture, is there a good way to make this configuration available to other users?

Apr 23 '25 08:04 mwiesenberger

I had a practical question (sorry for spamming your mail with so many individual comments), is there or do you have a prefered way of storing individual cmake build configurations. For example if I go on any HPC cluster and optimise some build flags for their specific architecture, is there a good way to make this configuration available to other users?

Yep, you can use presets, which is basically just a json file with the settings in. The official docs are not great for actually learning how to implement them, but there's some good guides elsewhere, for example this one.

Apr 28 '25 10:04 ZedThree

I've finally gotten MPI + CUDA to compile! https://github.com/PlasmaFAIR/feltor/actions/runs/14712161450. Unfortunately I haven't been able to check if all the tests pass, as I ran into Nvidia driver issues on my local GPU cluster.

I've also removed the Navier-Stokes bits from src/CMakeLists.txt.

@mwiesenberger, are there any other tasks you would like me to finish for this PR? There are a couple of outstanding issues I'm aware of, but I don't know if they should be fixed here or in a later PR:

Not yet supporting the geometries or matrix libraries in full
Should rethink the way find_package/FetchContent are being used together, perhaps going to CPM instead (I don't know how well it would handle VCL).

Apr 28 '25 16:04 LiamPattinson

I've finally gotten MPI + CUDA to compile! https://github.com/PlasmaFAIR/feltor/actions/runs/14712161450. Unfortunately I haven't been able to check if all the tests pass, as I ran into Nvidia driver issues on my local GPU cluster.

Perfect!

I've also removed the Navier-Stokes bits from src/CMakeLists.txt.

Thanks

@mwiesenberger, are there any other tasks you would like me to finish for this PR? There are a couple of outstanding issues I'm aware of, but I don't know if they should be fixed here or in a later PR:
* Not yet supporting the `geometries` or `matrix` libraries in full

* Should rethink the way `find_package`/`FetchContent` are being used together, perhaps going to [CPM](https://github.com/cpm-cmake/CPM.cmake) instead (I don't know how well it would handle VCL).

Yeah, this PR is maybe a bit overgown now, so I will test a bit more and then merge. The other (smaller) issues can be fixed in separate PRs

Update: As per our e-mail correspondence I will wait for a small update on the geometries header before merging

Apr 29 '25 08:04 mwiesenberger

Hi @mwiesenberger, apologies for the lack of updates here, I have a lot of other projects competing for my attention at the moment. I'll try to find an hour this afternoon to work on this.

May 16 '25 10:05 LiamPattinson

I'm running into some issues building the geometries library (failing action):

Build failure of conformal_elliptic_b.cpp when running without OpenMP/GPU
I'm not sure I'm setting up Boost correctly. I tried linking directly against Boost::math, but I get the error that Boost_math_DIR is missing. Linking against plain Boost::boost seems to work, even though the missing math component is reported in the logs. @ZedThree have you had experience getting Boost to play nicely with CMake?

As the tests aren't set up for Catch2 yet, I've put the infrastructure in place but haven't added them to the dg_tests target.

I also don't know when I'll next have time to look at this, as my schedule for the next two weeks is completely filled. If the build error can't be easily resolved, it might make more sense to revert the last two commits and merge the PR from that point, as otherwise this PR risks hanging for a further month or so.

May 16 '25 17:05 LiamPattinson

Hi @LiamPattinson,

thanks a lot for the update. The only boost dependence is in the taylor.h file because it calls special mathematical functions. With their integration into the C++-17 standard we actually do not need boost any more, so the dependency can be removed (simply replacing boost::math with std in taylor.h).

I also don't know when I'll next have time to look at this, as my schedule for the next two weeks is completely filled. If the build error can't be easily resolved, it might make more sense to revert the last two commits and merge the PR from that point, as otherwise this PR risks hanging for a further month or so.

Don't worry about it I will take it from here you have helped a lot.

May 19 '25 09:05 mwiesenberger