Use Adapt.jl to change storage and element type
In order to eventually support GPU computation we need
to use Adapt.jl to allow GPU backend packages to swap
out host-array types like CuArray with device-side types
like CuDeviceArray.
Additionally, this will allow us to change the element type
of a simulation by using `adapt(Array{Float32}, ...)`.
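A minimal sketch of this mechanism (the `Cache` container below is hypothetical, not Trixi's actual data layout): Adapt rebuilds a struct recursively, converting its array fields to the requested storage and element type.

```julia
# Hedged sketch of the Adapt.jl mechanism; `Cache` is a hypothetical
# container type, not Trixi's actual cache layout.
using Adapt

struct Cache{A<:AbstractArray}
    u::A
end
# Teach Adapt how to rebuild a Cache from its adapted fields:
Adapt.@adapt_structure Cache

c = Cache(rand(Float64, 4))

# Change the element type on the host:
c32 = adapt(Array{Float32}, c)
@show eltype(c32.u)   # Float32

# With a GPU backend loaded, the same call would swap the storage type,
# e.g. `adapt(CuArray, c)` producing device-side arrays (needs CUDA.jl).
```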
Review checklist
This checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging.
Purpose and scope
- [ ] The PR has a single goal that is clear from the PR title and/or description.
- [ ] All code changes represent a single set of modifications that logically belong together.
- [ ] No more than 500 lines of code are changed, or there is no obvious way to split the PR into multiple PRs.
Code quality
- [ ] The code can be understood easily.
- [ ] Newly introduced names for variables etc. are self-descriptive and consistent with existing naming conventions.
- [ ] There are no redundancies that can be removed by simple modularization/refactoring.
- [ ] There are no leftover debug statements or commented code sections.
- [ ] The code adheres to our conventions and style guide, and to the Julia guidelines.
Documentation
- [ ] New functions and types are documented with a docstring or top-level comment.
- [ ] Relevant publications are referenced in docstrings (see example for formatting).
- [ ] Inline comments are used to document longer or unusual code sections.
- [ ] Comments describe intent ("why?") and not just functionality ("what?").
- [ ] If the PR introduces a significant change or new feature, it is documented in `NEWS.md` with its PR number.
Testing
- [ ] The PR passes all tests.
- [ ] New or modified lines of code are covered by tests.
- [ ] New or modified tests run in less than 10 seconds.
Performance
- [ ] There are no type instabilities or memory allocations in performance-critical parts.
- [ ] If the PR intent is to improve performance, before/after time measurements are posted in the PR.
Verification
- [ ] The correctness of the code was verified using appropriate tests.
- [ ] If new equations/methods are added, a convergence test has been run and the results are posted in the PR.
Created with :heart: by the Trixi.jl community.
Codecov Report
Attention: Patch coverage is 68.44262% with 77 lines in your changes missing coverage. Please review.
Project coverage is 96.67%. Comparing base (43cd98d) to head (1acf007). Report is 1 commit behind head on main.
Additional details and impacted files
```
@@            Coverage Diff             @@
##             main    #2212      +/-   ##
==========================================
- Coverage   96.83%   96.67%   -0.16%
==========================================
  Files         504      507       +3
  Lines       41816    42010     +194
==========================================
+ Hits        40490    40610     +120
- Misses       1326     1400      +74
```
| Flag | Coverage Δ | |
|---|---|---|
| unittests | 96.67% <68.44%> (-0.16%) | :arrow_down: |
What's the status of this PR?
Okay, current status:
- Removed `AbstractHeterogenousContainer` and replaced it with a `storage_type` function that uses Adapt infrastructure for wrapped arrays.
- Started writing some docs on heterogeneous computing & Trixi.
- Added a minimal extension on CUDA.jl.
Open questions:
- [x] Should I untangle the whole VecOfArrays business into a separate PR? I am not all that happy with that implementation, but it is something we need to address for full CUDA support.
- [x] Can I add CUDA.jl as a test dependency, or should I pursue more the strategy in #2367 where I add it on-demand?
- [ ] Right now we are not adapting the mesh, similar to `remake`, is that okay?
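As a rough illustration of the `storage_type` idea mentioned above (this is an assumption about the approach, not the PR's actual implementation, which builds on Adapt's wrapped-array infrastructure): such a query can peel away array wrappers via `parent` until the underlying storage is reached.

```julia
# Hypothetical sketch of a `storage_type` query; the real implementation
# in the PR uses Adapt's wrapped-array machinery instead of `parent`.
function storage_type(x::AbstractArray)
    p = parent(x)
    # `parent` returns the array itself once no wrapper remains.
    p === x ? typeof(x) : storage_type(p)
end

storage_type(rand(4, 4))                    # Matrix{Float64}
storage_type(view(rand(4, 4), 1:2, :))      # Matrix{Float64} (unwraps the SubArray)
storage_type(transpose(rand(Float32, 3)))   # Vector{Float32} (unwraps the Transpose)
```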
> Started writing some docs on heterogeneous computing & Trixi
That's great news, excellent!
> Should I untangle the whole VecOfArrays business into a separate PR? I am not all that happy with that implementation, but it is something we need to address for full CUDA support.

If it's only with a reasonable overhead, I'd prefer to have a separate PR to make it easier to understand and review the code.
> Can I add CUDA.jl as a test dependency, or should I pursue more the strategy in add minimal CUDA pipeline #2367 where I add it on-demand?
Puh. Do you have a rough idea about how much overhead (downloading + precompilation) this will add to the non-CUDA CI runs? That is, are we talking about tens of seconds or rather minutes?
For the time being, my take would be to hold off adding it as a test dependency for now, since it will also impact people testing locally.
> Right now we are not adapting the mesh, similar to `remake`, is that okay?
It should be OK. The mesh in Trixi.jl is only used as an "infrastructure provider", i.e., it allows the solvers to figure out which cells exist, who their neighbors are, where their neighbors live (when running with MPI parallelism) etc. during the initialization phase. Thus in a usual simulation run there are no performance-critical loops working directly on mesh data.
One thought I had: It might make sense to create a meta issue for upcoming (and deferred) changes for GPU/heterogeneous computing support. That is, a single issue to keep track of the various PRs, other issues, and maybe open questions/TODOs that are not living in their own issue.
> Puh. Do you have a rough idea about how much overhead (downloading + precompilation) this will add to the non-CUDA CI runs? That is, are we talking about tens of seconds or rather minutes?
The most annoying thing is likely downloading the `CUDA_Runtime_jll`; other than that, we are talking seconds.
The biggest issue is that without it testing the storage_type=X change is almost impossible and would require significant mocking.
> If it's only with a reasonable overhead, I'd prefer to have a separate PR to make it easier to understand and review the code.
That's somewhat related to the issue above, it is a bit of code that doesn't impact anything until you are actually trying to move the model to the GPU.
I could only do the `real_type` part of this PR, but that excludes quite a bit of complexity.
> Thus in a usual simulation run there are no performance-critical loops working directly on mesh data.
Yeah, it's more a question if we are going to access data within it from the GPU, but I guess we don't know until we actually have a model running. As far as I can tell right now Lars didn't move it either.
> Puh. Do you have a rough idea about how much overhead (downloading + precompilation) this will add to the non-CUDA CI runs? That is, are we talking about tens of seconds or rather minutes?
>
> The most annoying thing is likely downloading the `CUDA_Runtime_jll`; other than that, we are talking seconds.
> The biggest issue is that without it, testing the `storage_type=X` change is almost impossible and would require significant mocking.
OK, then I'd say let's give it a try and see where it takes us.
> If it's only with a reasonable overhead, I'd prefer to have a separate PR to make it easier to understand and review the code.
That's somewhat related to the issue above, it is a bit of code that doesn't impact anything until you are actually trying to move the model to the GPU.
I could only do the
real_typepart of this PR , but that excludes quite a bit of complexity.
Hm, ok. I understand you're saying that we'd be able to split it, but then not really review the impact that it will have later on in the subsequent PRs? In that case, it might be best to keep it in one PR. In that case, please try to keep the PR as focused as possible (such that it does not blow up in size even more).
> Thus in a usual simulation run there are no performance-critical loops working directly on mesh data.
Yeah, it's more a question if we are going to access data within it from the GPU, but I guess we don't know until we actually have a model running. As far as I can tell right now Lars didn't move it either.
Right. No, there shouldn't be any access of mesh data from the GPU during the regular rhs! invocation. All possible mesh accesses would occur in one of the callbacks.
This looks like `test_trixi_include` is passing through an expression without resolving it in the parent scope.
```
/var/lib/buildkite-agent/builds/gpuci-4/julialang/trixi-dot-jl/examples/p4est_2d_dgsem/elixir_advection_basic.jl
elixir_advection_basic.jl (Float32): Error During Test at /var/lib/buildkite-agent/builds/gpuci-4/julialang/trixi-dot-jl/test/test_trixi.jl:186
  Got exception outside of a @test
  LoadError: UndefVarError: `CuArray` not defined
```
> This looks like `test_trixi_include` is passing through an expression without resolving it in the parent scope.
I have almost no idea what exactly this means, but we are actually working on `test_trixi_include`.
https://github.com/trixi-framework/TrixiTest.jl/pull/6
Yeah, the issue seems to be that the symbol `CuArray` is "copy-pasted" into the module instead of resolving it ahead of time to `Main.CUDA.CuArray`. So I would need to put `using CUDA` into the recipe, which is not ideal.
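The scoping difference can be illustrated with plain Julia metaprogramming (a stand-in type is used here so no CUDA is required): a bare symbol in an expression is looked up in the module that evaluates it, while an interpolated value is embedded directly into the AST.

```julia
# Sketch of the scoping issue; `MyArray` stands in for `CuArray` so
# this example runs without CUDA.jl installed.
struct MyArray end  # defined only in the current module

m = Module(:Sandbox)

# Bare symbol: `MyArray` must resolve inside `m`, where it is undefined.
ex_symbol = :(T = MyArray)
# Interpolated value: the type object itself is spliced into the AST,
# so no name lookup happens at evaluation time.
ex_value = :(T = $MyArray)

try
    Core.eval(m, ex_symbol)
catch err
    @show err   # UndefVarError, analogous to `CuArray` not defined
end

Core.eval(m, ex_value)   # succeeds: the value is already embedded
```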
Next steps:
- Symbol support for AMDGPU
- Noisy loading failure for AMDGPU https://github.com/JuliaGPU/AMDGPU.jl/issues/794
- `elixir_advection_basic.jl (Float32)`
- Invalidations?
- Downgrade?
- Aqua
```
Aqua.jl: Error During Test at /home/runner/work/Trixi.jl/Trixi.jl/test/test_aqua.jl:20
  Test threw exception
  Expression: isnothing(check_no_implicit_imports(Trixi, skip = (Core, Base, Trixi.P4est, Trixi.T8code, Trixi.EllipsisNotation)))
  ImplicitImportsException
  Module `Trixi` is relying on the following implicit imports:
  * `KernelAbstractions` which is exported by `KernelAbstractions`
  * `@index` which is exported by `KernelAbstractions`
  * `@kernel` which is exported by `KernelAbstractions`
  * `Backend` which is exported by `KernelAbstractions`
  * `get_backend` which is exported by `KernelAbstractions`
```
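Warnings like these are typically silenced by switching to explicit imports. A sketch of the fix (the KernelAbstractions name list is inferred from the error message above; the runnable part uses a standard library as a stand-in):

```julia
# For Trixi this would presumably become (assumption, based on the
# names listed in the error message):
#
#   using KernelAbstractions: KernelAbstractions, Backend, get_backend,
#                             @index, @kernel
#
# The same explicit-import pattern, shown runnable with a stdlib:
using LinearAlgebra: LinearAlgebra, dot, norm

@show dot([1, 2], [3, 4])   # every used name is now imported explicitly
@show norm([3.0, 4.0])
```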
Ok, I seem to have won the battle against the downgrade CI. From my perspective this is now done.
Code coverage is tricky since we now have multiple CI services. I can look at that, but I would prefer not holding this PR up for that.
@ranocha added a news entry as well.
In the future, should I squash the PR beforehand? I normally adjust the commit message before I squash-merge so that it is useful (instead of my messy development history).
We usually don't do that.