Oceananigans.jl
AMD GPU Support via an extension for `AMDGPU`
This PR replaces #3468 - editing is allowed by maintainers.
I'll try to convert this to an extension. I'll do it in a single commit so that it's easily revertible. How does that sound @fluidnumerics-joe?
@fluidnumerics-joe, is `GPUArrays` a dependency only for `allowscalar`? If so, I think `GPUArraysCore` is much lighter and also includes `allowscalar`.
(saw this from @vchuravy's attempts over at https://github.com/CliMA/Oceananigans.jl/pull/3066)
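For illustration, here's a minimal sketch of what that lighter dependency buys; nothing here is assumed beyond `GPUArraysCore` itself:

```julia
# GPUArraysCore provides @allowscalar without pulling in all of GPUArrays.
using GPUArraysCore: @allowscalar

# Read a single element from a (possibly GPU-resident) array, permitting
# scalar indexing for this call only:
first_element(a::AbstractArray) = @allowscalar a[1]
```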
OK, with 9916af8 I think I moved (almost) all the AMDGPU-related methods into an extension.
@fluidnumerics-joe now when you do `using Oceananigans` you don't have access to the AMDGPU methods you added. But if you do `using Oceananigans, AMDGPU` then the extension loads and everything is available!
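For readers unfamiliar with package extensions, here is a rough, hypothetical skeleton of the mechanism (the module name and the specific method are illustrative, not necessarily what 9916af8 does):

```julia
# ext/OceananigansAMDGPUExt.jl -- hypothetical skeleton. Julia loads this
# module only when both Oceananigans and AMDGPU are in the environment; the
# pairing is declared in Oceananigans' Project.toml under [weakdeps] (AMDGPU)
# and [extensions] (OceananigansAMDGPUExt = "AMDGPU").
module OceananigansAMDGPUExt

using Oceananigans
using AMDGPU

import Oceananigans.Architectures: architecture

# Example of an AMDGPU-specific method that can now live outside the main
# package: ROCArrays belong to the ROCBackend-backed GPU architecture.
architecture(::ROCArray) = GPU(AMDGPU.ROCBackend())

end # module
```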
> I'll try to convert this to an extension. I'll do it in a single commit so that it's easily revertible. How does that sound @fluidnumerics-joe?
Sounds good.
@fluidnumerics-joe can you confirm that all works OK for you now with the extension?
The only thing that I couldn't manage to do is to export the alias `ROCmGPU`. I don't know if you are able to export things from an extension... If things work then we can, possibly, discuss how to get a machine for some AMD-enabled CI?
@glwagner, 72b12c8 seems OK, right? (just wanted another set of eyes to have a look)
> @fluidnumerics-joe can you confirm that all works OK for you now with the extension? The only thing that I couldn't manage to do is to export the alias `ROCmGPU`. I don't know if you are able to export things from an extension... If things work then we can, possibly, discuss how to get a machine for some AMD-enabled CI?
I'll give this a go today and let you know where we're at.
@christophernhill
> I don't know if you are able to export things from the extension...
No, that is one of the limitations: extensions can't export new names.
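For what it's worth, the usual workaround is to declare (and export) the name in the parent package and attach its method in the extension. A sketch, with hypothetical placement:

```julia
# In Oceananigans itself: declare an empty function so the name exists
# and can be exported even before AMDGPU is loaded.
function ROCmGPU end
export ROCmGPU

# In the extension, once AMDGPU is available, give it a method:
Oceananigans.ROCmGPU() = GPU(AMDGPU.ROCBackend())
```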
Here's where we're at.
I've made the following modifications to `baroclinic_adjustment.jl`: the script now begins with `using Oceananigans, AMDGPU`, and the grid construction specifies the GPU architecture with `GPU(AMDGPU.ROCBackend())`, i.e.,

```julia
grid = RectilinearGrid(GPU(AMDGPU.ROCBackend());
                       size = (48, 48, 8),
                       x = (0, Lx),
                       y = (-Ly/2, Ly/2),
                       z = (-Lz, 0),
                       topology = (Periodic, Bounded, Bounded))
```
When running this, we hit a runtime issue at `plan_forward_transform`:
```
$ julia --project=. baroclinic_adjustment.jl
ERROR: LoadError: MethodError: no method matching plan_forward_transform(::ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, ::Periodic, ::Vector{Int64}, ::UInt32)
Closest candidates are:
plan_forward_transform(::CUDA.CuArray, ::Union{Bounded, Periodic}, ::Any, ::Any)
@ Oceananigans ~/.julia/packages/Oceananigans/DPfYS/src/Solvers/plan_transforms.jl:36
plan_forward_transform(::Array, ::Periodic, ::Any, ::Any)
@ Oceananigans ~/.julia/packages/Oceananigans/DPfYS/src/Solvers/plan_transforms.jl:16
plan_forward_transform(::Union{CUDA.CuArray, Array}, ::Flat, ::Any...)
@ Oceananigans ~/.julia/packages/Oceananigans/DPfYS/src/Solvers/plan_transforms.jl:47
...
Stacktrace:
[1] plan_transforms(grid::RectilinearGrid{Float64, Periodic, Bounded, Flat, Float64, Float64, Float64, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, GPU{ROCBackend}}, storage::ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, planner_flag::UInt32)
@ Oceananigans.Solvers ~/.julia/packages/Oceananigans/DPfYS/src/Solvers/plan_transforms.jl:93
[2] Oceananigans.Solvers.FFTBasedPoissonSolver(grid::RectilinearGrid{Float64, Periodic, Bounded, Flat, Float64, Float64, Float64, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, GPU{ROCBackend}}, planner_flag::UInt32)
@ Oceananigans.Solvers ~/.julia/packages/Oceananigans/DPfYS/src/Solvers/fft_based_poisson_solver.jl:65
[3] Oceananigans.Solvers.FFTBasedPoissonSolver(grid::RectilinearGrid{Float64, Periodic, Bounded, Flat, Float64, Float64, Float64, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, GPU{ROCBackend}})
@ Oceananigans.Solvers ~/.julia/packages/Oceananigans/DPfYS/src/Solvers/fft_based_poisson_solver.jl:51
[4] Oceananigans.Models.HydrostaticFreeSurfaceModels.FFTImplicitFreeSurfaceSolver(grid::RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, Float64, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, GPU{ROCBackend}}, settings::@Kwargs{}, gravitational_acceleration::Float64)
@ Oceananigans.Models.HydrostaticFreeSurfaceModels ~/.julia/packages/Oceananigans/DPfYS/src/Models/HydrostaticFreeSurfaceModels/fft_based_implicit_free_surface_solver.jl:67
[5] build_implicit_step_solver
@ ~/.julia/packages/Oceananigans/DPfYS/src/Models/HydrostaticFreeSurfaceModels/fft_based_implicit_free_surface_solver.jl:73 [inlined]
[6] build_implicit_step_solver(::Val{:Default}, grid::RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, Float64, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, GPU{ROCBackend}}, settings::@Kwargs{}, gravitational_acceleration::Float64)
@ Oceananigans.Models.HydrostaticFreeSurfaceModels ~/.julia/packages/Oceananigans/DPfYS/src/Models/HydrostaticFreeSurfaceModels/implicit_free_surface.jl:111
[7] FreeSurface(free_surface::ImplicitFreeSurface{Nothing, Float64, Nothing, Nothing, Symbol, @Kwargs{}}, velocities::@NamedTuple{u::Field{Face, Center, Center, Nothing, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, Float64, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, GPU{ROCBackend}}, Tuple{Colon, Colon, Colon}, OffsetArrays.OffsetArray{Float64, 3, ROCArray{Float64, 3, AMDGPU.Runtime.Mem.HIPBuffer}}, Float64, FieldBoundaryConditions{BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}}, Nothing, Oceananigans.Fields.FieldBoundaryBuffers{Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing}}, v::Field{Center, Face, Center, Nothing, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, Float64, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, GPU{ROCBackend}}, Tuple{Colon, Colon, Colon}, OffsetArrays.OffsetArray{Float64, 3, ROCArray{Float64, 3, AMDGPU.Runtime.Mem.HIPBuffer}}, Float64, FieldBoundaryConditions{BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Open, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Open, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}}, Nothing, Oceananigans.Fields.FieldBoundaryBuffers{Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing}}, w::Field{Center, Center, Face, Nothing, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, Float64, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, GPU{ROCBackend}}, Tuple{Colon, Colon, Colon}, OffsetArrays.OffsetArray{Float64, 3, ROCArray{Float64, 3, AMDGPU.Runtime.Mem.HIPBuffer}}, Float64, FieldBoundaryConditions{BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, 
BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, Nothing, Nothing, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}}, Nothing, Oceananigans.Fields.FieldBoundaryBuffers{Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing}}}, grid::RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, Float64, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, GPU{ROCBackend}})
@ Oceananigans.Models.HydrostaticFreeSurfaceModels ~/.julia/packages/Oceananigans/DPfYS/src/Models/HydrostaticFreeSurfaceModels/implicit_free_surface.jl:95
[8] HydrostaticFreeSurfaceModel(; grid::RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, Float64, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, GPU{ROCBackend}}, clock::Clock{Float64}, momentum_advection::WENO{3, Float64, Nothing, Nothing, Nothing, true, Nothing, WENO{2, Float64, Nothing, Nothing, Nothing, true, Nothing, UpwindBiased{1, Float64, Nothing, Nothing, Nothing, Nothing, Centered{1, Float64, Nothing, Nothing, Nothing, Nothing}}, Centered{1, Float64, Nothing, Nothing, Nothing, Nothing}}, Centered{2, Float64, Nothing, Nothing, Nothing, Centered{1, Float64, Nothing, Nothing, Nothing, Nothing}}}, tracer_advection::WENO{3, Float64, Nothing, Nothing, Nothing, true, Nothing, WENO{2, Float64, Nothing, Nothing, Nothing, true, Nothing, UpwindBiased{1, Float64, Nothing, Nothing, Nothing, Nothing, Centered{1, Float64, Nothing, Nothing, Nothing, Nothing}}, Centered{1, Float64, Nothing, Nothing, Nothing, Nothing}}, Centered{2, Float64, Nothing, Nothing, Nothing, Centered{1, Float64, Nothing, Nothing, Nothing, Nothing}}}, buoyancy::BuoyancyTracer, coriolis::BetaPlane{Float64}, free_surface::ImplicitFreeSurface{Nothing, Float64, Nothing, Nothing, Symbol, @Kwargs{}}, forcing::@NamedTuple{}, closure::Nothing, boundary_conditions::@NamedTuple{}, tracers::Symbol, particles::Nothing, biogeochemistry::Nothing, velocities::Nothing, pressure::Nothing, diffusivity_fields::Nothing, auxiliary_fields::@NamedTuple{})
@ Oceananigans.Models.HydrostaticFreeSurfaceModels ~/.julia/packages/Oceananigans/DPfYS/src/Models/HydrostaticFreeSurfaceModels/hydrostatic_free_surface_model.jl:167
[9] top-level scope
@ ~/fluidnumerics-joe/Oceananigans.jl/bench/baroclinic_adjustment.jl:44
in expression starting at /home/joe/fluidnumerics-joe/Oceananigans.jl/bench/baroclinic_adjustment.jl:44
```
Seems that we need to add methods for planning FFTs.
> Seems that we need to add methods for planning FFTs.
I'll take a crack at this with an extension for Solvers.
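A minimal sketch of what those methods might look like, mirroring the `CuArray` candidates listed in the `MethodError` above and assuming AMDGPU.jl's rocFFT wrappers implement the `AbstractFFTs` planning interface (so `plan_fft!` works on a `ROCArray`; the `planner_flag` argument is ignored here, as it is for CUFFT):

```julia
using AMDGPU
using AbstractFFTs: plan_fft!, plan_ifft!
using Oceananigans.Grids: Periodic, Bounded, Flat

import Oceananigans.Solvers: plan_forward_transform, plan_backward_transform

# Mirror the CuArray methods in src/Solvers/plan_transforms.jl:
function plan_forward_transform(A::ROCArray, ::Union{Bounded, Periodic}, dims, planner_flag)
    length(dims) == 0 && return nothing
    return plan_fft!(A, dims)
end

function plan_backward_transform(A::ROCArray, ::Union{Bounded, Periodic}, dims, planner_flag)
    length(dims) == 0 && return nothing
    return plan_ifft!(A, dims)
end

# Flat dimensions need no transform:
plan_forward_transform(A::ROCArray, ::Flat, args...) = nothing
plan_backward_transform(A::ROCArray, ::Flat, args...) = nothing
```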
I've gotten the transforms taken care of, but now the baroclinic_adjustment example fails with:

```
$ julia --project=. baroclinic_adjustment.jl
ERROR: LoadError: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] assertscalar(op::String)
@ GPUArraysCore ~/.julia/packages/GPUArraysCore/uOYfN/src/GPUArraysCore.jl:103
[3] getindex
@ ~/.julia/packages/GPUArrays/dAUOE/src/host/indexing.jl:48 [inlined]
[4] scalar_getindex(::ROCArray{Float64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, ::Int64, ::Vararg{Int64})
@ GPUArrays ~/.julia/packages/GPUArrays/dAUOE/src/host/indexing.jl:34
[5] _getindex
@ ~/.julia/packages/GPUArrays/dAUOE/src/host/indexing.jl:17 [inlined]
[6] getindex
@ ~/.julia/packages/GPUArrays/dAUOE/src/host/indexing.jl:15 [inlined]
[7] getindex
@ ./subarray.jl:288 [inlined]
[8] macro expansion
@ ./multidimensional.jl:917 [inlined]
[9] macro expansion
@ ./cartesian.jl:64 [inlined]
[10] macro expansion
@ ./multidimensional.jl:912 [inlined]
[11] _unsafe_getindex!
@ ./multidimensional.jl:925 [inlined]
[12] _unsafe_getindex(::IndexCartesian, ::SubArray{Float64, 3, ROCArray{Float64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, Tuple{UnitRange{Int64}, UnitRange{Int64}, UnitRange{Int64}}, false}, ::Int64, ::Base.Slice{Base.OneTo{Int64}}, ::Base.Slice{Base.OneTo{Int64}})
@ Base ./multidimensional.jl:903
[13] _getindex
@ ./multidimensional.jl:889 [inlined]
[14] getindex(::SubArray{Float64, 3, ROCArray{Float64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, Tuple{UnitRange{Int64}, UnitRange{Int64}, UnitRange{Int64}}, false}, ::Int64, ::Function, ::Function)
@ Base ./abstractarray.jl:1291
[15] top-level scope
@ ~/fluidnumerics-joe/Oceananigans.jl/bench/baroclinic_adjustment.jl:84
```
Note that a similar error occurs with the CUDA backend on Nvidia GPUs, on both the main branch and my branch, suggesting this error originates on main.
I commented out the first plot of the buoyancy and was able to get past this. However, there seems to be a correctness bug: the run below hits NaNs on the AMD GPU, while it works fine with the CPU backend. I'll test it out on an Nvidia GPU tomorrow morning.
```
[ Info: Running the simulation...
[ Info: Initializing simulation...
[00.00%] i: 0, t: 0 seconds, wall time: 20.432 seconds, max(u): (0.000e+00, 0.000e+00, 0.000e+00) m/s, next Δt: 20 minutes
[ Info: ... simulation initialization complete (23.694 seconds)
[ Info: Executing initial time step...
[ Info: ... initial time step complete (12.989 seconds).
[ Info: time = NaN, iteration = 100: NaN found in field u. Stopping simulation.
[00NaN%] i: 100, t: NaN days, wall time: 27.465 seconds, max(u): ( NaN, 0.000e+00, 0.000e+00) m/s, next Δt: NaN days
[ Info: Simulation completed in 41.653 seconds
```
@fluidnumerics-joe @simone-silvestri should we try to get this running with split explicit free surface before tackling FFTs?
I'm game to try. Should we modify the baroclinic adjustment problem or is there another benchmark you have in mind?
> I'm game to try. Should we modify the baroclinic adjustment problem or is there another benchmark you have in mind?
I think it makes sense to keep going with the baroclinic adjustment case!
To change the free surface you'll pass `free_surface = SplitExplicitFreeSurface(grid)` as a keyword argument to the model constructor. I think the default parameters for it make sense, but @simone-silvestri can confirm. We can also try `ExplicitFreeSurface()`, which is even simpler, but in that case we'll have to modify `gravitational_acceleration` and the time step to get something that can complete in a reasonable amount of time.
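For concreteness, a minimal sketch of the suggested configuration; the domain extents `Lx`, `Ly`, `Lz` are placeholder values standing in for those in `baroclinic_adjustment.jl`, and the remaining model keywords are left at their defaults:

```julia
using Oceananigans, AMDGPU

# Placeholder domain extents (meters); use the script's actual values.
Lx, Ly, Lz = 1000e3, 1000e3, 1e3

grid = RectilinearGrid(GPU(AMDGPU.ROCBackend());
                       size = (48, 48, 8),
                       x = (0, Lx),
                       y = (-Ly/2, Ly/2),
                       z = (-Lz, 0),
                       topology = (Periodic, Bounded, Bounded))

model = HydrostaticFreeSurfaceModel(; grid,
                                    free_surface = SplitExplicitFreeSurface(grid),
                                    buoyancy = BuoyancyTracer(),
                                    tracers = :b)
```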
Btw if you paste the baroclinic adjustment script you are working with we can also check to make sure it's GPU compatible and possibly help simplify it further.
`SplitExplicitFreeSurface` works well here. For reference, the script I'm using is here: https://github.com/FluidNumerics/oceananigans-on-amd-gpus/blob/main/benchmarks/baroclinic_adjustment/baroclinic_adjustment.jl

I'll get profiling results posted soon.
> `SplitExplicitFreeSurface` works well here. For reference, the script I'm using is here: https://github.com/FluidNumerics/oceananigans-on-amd-gpus/blob/main/benchmarks/baroclinic_adjustment/baroclinic_adjustment.jl
> I'll get profiling results posted soon.
Nice! Yeah, since https://github.com/FluidNumerics/oceananigans-on-amd-gpus/blob/9a0c6fa5e3400949d0bb14b3f22b033b64f2d124/benchmarks/baroclinic_adjustment/baroclinic_adjustment.jl#L85 is commented out, I think this whole script will run on GPUs! The animation at the end will, I think, be generated on the CPU by default. You can also omit that (unless you want a pretty movie).
Just want to confirm some final steps with @navidcy and @glwagner here to wrap up this PR. At the moment, I believe we just need to add a method that throws an error for `validate_free_surface` when the architecture is an AMD GPU and the free surface is an `ImplicitFreeSurface`. I'm working on putting this in through the extension (I believe this is the correct spot) and testing it out. Is there anything else you want to see to get this merged into main?
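As a sketch of that guard (the argument order and the `GPU{ROCBackend}` parameterization are assumptions based on the types visible in the stack traces above):

```julia
using AMDGPU: ROCBackend
using Oceananigans: GPU, ImplicitFreeSurface

import Oceananigans.Models.HydrostaticFreeSurfaceModels: validate_free_surface

# The implicit free surface produced NaNs on AMD GPUs earlier in this thread,
# so fail loudly at model construction:
validate_free_surface(::GPU{ROCBackend}, free_surface::ImplicitFreeSurface) =
    error("ImplicitFreeSurface is not yet supported on AMD GPUs. ",
          "Try SplitExplicitFreeSurface or ExplicitFreeSurface instead.")
```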
I think we need to have an AMD-enabled CI?
> I think we need to have an AMD-enabled CI?
Is this something that is handled on the MIT side? The only way I can help is through system procurement (we're a Supermicro reseller) or through an allocation on our systems.
We could ask the Julia Lab; they have some AMD GPUs dedicated to CI there. Not sure if it is possible to use them, though.
> Is this something that is handled on the MIT side?
Yeah, it’s something the Oceananigans dev team should sort out! :)
> Is this something that is handled on the MIT side?
>
> Yeah, it’s something the Oceananigans dev team should sort out! :)
Curious to know if there's any movement on getting this resolved. I can offer some help in getting an allocation request in to Pawsey Supercomputing Centre; I mentioned to @navidcy that I have a solution for doing CI on systems with job schedulers (like Pawsey's Setonix).

If existing hardware systems at MIT are not available for this, I can also help with procurement, if needed. If you go this route, I can look into providing some time on our systems to get testing rolling.
Just some comments at this point:
- We have the HydrostaticFreeSurfaceModel working with the split-explicit free surface. It would be great to find some time later on to figure out what was going wrong with the implicit free surface on AMD GPUs (is the issue isolated to that architecture?) and get it resolved.
- Moving everything over to KernelAbstractions would constitute a rather large change, something I think @glwagner expressed an interest in avoiding. I'd vote in favor of pushing this off to future PRs.
- I'm wrapping up a profiling report that includes MI210 and A100 GPU performance; it will include some recommendations should we be interested in performance improvements on GPU hardware (AMD and Nvidia). That kind of work could also constitute PRs further down the road.
- The main outstanding issue seems to be that we need a platform for testing on AMD GPUs.
It appears the CliMA fork's `Project.toml` and `Manifest.toml` have diverged; I'll take a look to see if I can fix that.
> Is this something that is handled on the MIT side?
>
> Yeah, it’s something the Oceananigans dev team should sort out! :)
>
> Curious to know if there's any movement on getting this resolved. I can offer some help in getting an allocation request in to Pawsey Supercomputing Centre; I mentioned to @navidcy that I have a solution for doing CI on systems with job schedulers (like Pawsey's Setonix). If existing hardware systems at MIT are not available for this, I can also help with procurement, if needed. If you go this route, I can look into providing some time on our systems to get testing rolling.
@simone-silvestri can you please help with this? I agree it's critical to get this PR merged ASAP; it's already getting stale. I think we should contact the Satori folks first, directly or via @christophernhill. @Sbozzolo might be able to help if there are AMD machines on the Caltech cluster.
@fluidnumerics-joe let us know if you want help resolving conflicts
> @fluidnumerics-joe let us know if you want help resolving conflicts
I think the `Project.toml` and `Manifest.toml` would be best addressed on your side. I'll take a look at the `src/Architectures.jl` conflict.