Oceananigans.jl
AMD GPU Support via an extension for `AMDGPU`
This PR replaces #3468 - editing is allowed by maintainers.
I'll try to convert this to an extension. I'll do it in a single commit so that it's easily revertible. How does that sound @fluidnumerics-joe?
@fluidnumerics-joe, is `GPUArrays` a dependency only for `allowscalar`? If so, I think `GPUArraysCore` is much lighter and also includes `allowscalar`.
(saw this from @vchuravy's attempts over at https://github.com/CliMA/Oceananigans.jl/pull/3066)
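For illustration, here's a minimal sketch of what that lighter dependency buys; nothing here is assumed beyond `GPUArraysCore` itself:

```julia
# GPUArraysCore provides @allowscalar without pulling in all of GPUArrays.
using GPUArraysCore: @allowscalar

# Read a single element from a (possibly GPU-resident) array, permitting
# scalar indexing for this call only:
first_element(a::AbstractArray) = @allowscalar a[1]
```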
OK, with 9916af8 I think I moved (almost) all the AMDGPU-related methods into an extension.
@fluidnumerics-joe now when you do `using Oceananigans` you don't have access to the AMDGPU methods you added. But if you do `using Oceananigans, AMDGPU` then the extension loads and everything is available!
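For readers unfamiliar with package extensions, here is a rough, hypothetical skeleton of the mechanism (the module name and the specific method are illustrative, not necessarily what 9916af8 does):

```julia
# ext/OceananigansAMDGPUExt.jl -- hypothetical skeleton. Julia loads this
# module only when both Oceananigans and AMDGPU are in the environment; the
# pairing is declared in Oceananigans' Project.toml under [weakdeps] (AMDGPU)
# and [extensions] (OceananigansAMDGPUExt = "AMDGPU").
module OceananigansAMDGPUExt

using Oceananigans
using AMDGPU

import Oceananigans.Architectures: architecture

# Example of an AMDGPU-specific method that can now live outside the main
# package: ROCArrays belong to the ROCBackend-backed GPU architecture.
architecture(::ROCArray) = GPU(AMDGPU.ROCBackend())

end # module
```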
> I'll try to convert this to an extension. I'll do it in a single commit so that it's easily revertible. How does that sound @fluidnumerics-joe?
Sounds good.
@fluidnumerics-joe can you confirm that all works OK for you now with the extension?
The only thing that I couldn't manage to do is to export the alias `ROCmGPU`. I don't know if you are able to export things from an extension... If things work then we can, possibly, discuss how to get a machine for some AMD-enabled CI?
@glwagner, 72b12c8 seems OK, right? (just wanted another set of eyes to have a look)
> @fluidnumerics-joe can you confirm that all works OK for you now with the extension? The only thing that I couldn't manage to do is to export the alias `ROCmGPU`. I don't know if you are able to export things from an extension... If things work then we can, possibly, discuss how to get a machine for some AMD-enabled CI?
I'll give this a go today and let you know where we're at.
@christophernhill
> I don't know if you are able to export things from the extension...
No, that is one of the limitations: extensions can't export new names.
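For what it's worth, the usual workaround is to declare (and export) the name in the parent package and attach its method in the extension. A sketch, with hypothetical placement:

```julia
# In Oceananigans itself: declare an empty function so the name exists
# and can be exported even before AMDGPU is loaded.
function ROCmGPU end
export ROCmGPU

# In the extension, once AMDGPU is available, give it a method:
Oceananigans.ROCmGPU() = GPU(AMDGPU.ROCBackend())
```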
Here's where we're at.
I've made the following modifications to `baroclinic_adjustment.jl`: the script now begins with `using Oceananigans, AMDGPU`, and the grid construction specifies the GPU architecture with `GPU(AMDGPU.ROCBackend())`, i.e.,

```julia
grid = RectilinearGrid(GPU(AMDGPU.ROCBackend());
                       size = (48, 48, 8),
                       x = (0, Lx),
                       y = (-Ly/2, Ly/2),
                       z = (-Lz, 0),
                       topology = (Periodic, Bounded, Bounded))
```
When running this, we hit a runtime issue at `plan_forward_transform`:
```
$ julia --project=. baroclinic_adjustment.jl
ERROR: LoadError: MethodError: no method matching plan_forward_transform(::ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, ::Periodic, ::Vector{Int64}, ::UInt32)
Closest candidates are:
plan_forward_transform(::CUDA.CuArray, ::Union{Bounded, Periodic}, ::Any, ::Any)
@ Oceananigans ~/.julia/packages/Oceananigans/DPfYS/src/Solvers/plan_transforms.jl:36
plan_forward_transform(::Array, ::Periodic, ::Any, ::Any)
@ Oceananigans ~/.julia/packages/Oceananigans/DPfYS/src/Solvers/plan_transforms.jl:16
plan_forward_transform(::Union{CUDA.CuArray, Array}, ::Flat, ::Any...)
@ Oceananigans ~/.julia/packages/Oceananigans/DPfYS/src/Solvers/plan_transforms.jl:47
...
Stacktrace:
[1] plan_transforms(grid::RectilinearGrid{Float64, Periodic, Bounded, Flat, Float64, Float64, Float64, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, GPU{ROCBackend}}, storage::ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, planner_flag::UInt32)
@ Oceananigans.Solvers ~/.julia/packages/Oceananigans/DPfYS/src/Solvers/plan_transforms.jl:93
[2] Oceananigans.Solvers.FFTBasedPoissonSolver(grid::RectilinearGrid{Float64, Periodic, Bounded, Flat, Float64, Float64, Float64, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, GPU{ROCBackend}}, planner_flag::UInt32)
@ Oceananigans.Solvers ~/.julia/packages/Oceananigans/DPfYS/src/Solvers/fft_based_poisson_solver.jl:65
[3] Oceananigans.Solvers.FFTBasedPoissonSolver(grid::RectilinearGrid{Float64, Periodic, Bounded, Flat, Float64, Float64, Float64, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, GPU{ROCBackend}})
@ Oceananigans.Solvers ~/.julia/packages/Oceananigans/DPfYS/src/Solvers/fft_based_poisson_solver.jl:51
[4] Oceananigans.Models.HydrostaticFreeSurfaceModels.FFTImplicitFreeSurfaceSolver(grid::RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, Float64, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, GPU{ROCBackend}}, settings::@Kwargs{}, gravitational_acceleration::Float64)
@ Oceananigans.Models.HydrostaticFreeSurfaceModels ~/.julia/packages/Oceananigans/DPfYS/src/Models/HydrostaticFreeSurfaceModels/fft_based_implicit_free_surface_solver.jl:67
[5] build_implicit_step_solver
@ ~/.julia/packages/Oceananigans/DPfYS/src/Models/HydrostaticFreeSurfaceModels/fft_based_implicit_free_surface_solver.jl:73 [inlined]
[6] build_implicit_step_solver(::Val{:Default}, grid::RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, Float64, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, GPU{ROCBackend}}, settings::@Kwargs{}, gravitational_acceleration::Float64)
@ Oceananigans.Models.HydrostaticFreeSurfaceModels ~/.julia/packages/Oceananigans/DPfYS/src/Models/HydrostaticFreeSurfaceModels/implicit_free_surface.jl:111
[7] FreeSurface(free_surface::ImplicitFreeSurface{Nothing, Float64, Nothing, Nothing, Symbol, @Kwargs{}}, velocities::@NamedTuple{u::Field{Face, Center, Center, Nothing, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, Float64, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, GPU{ROCBackend}}, Tuple{Colon, Colon, Colon}, OffsetArrays.OffsetArray{Float64, 3, ROCArray{Float64, 3, AMDGPU.Runtime.Mem.HIPBuffer}}, Float64, FieldBoundaryConditions{BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}}, Nothing, Oceananigans.Fields.FieldBoundaryBuffers{Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing}}, v::Field{Center, Face, Center, Nothing, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, Float64, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, GPU{ROCBackend}}, Tuple{Colon, Colon, Colon}, OffsetArrays.OffsetArray{Float64, 3, ROCArray{Float64, 3, AMDGPU.Runtime.Mem.HIPBuffer}}, Float64, FieldBoundaryConditions{BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Open, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Open, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}}, Nothing, Oceananigans.Fields.FieldBoundaryBuffers{Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing}}, w::Field{Center, Center, Face, Nothing, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, Float64, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, GPU{ROCBackend}}, Tuple{Colon, Colon, Colon}, OffsetArrays.OffsetArray{Float64, 3, ROCArray{Float64, 3, AMDGPU.Runtime.Mem.HIPBuffer}}, Float64, FieldBoundaryConditions{BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, 
BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, Nothing, Nothing, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}}, Nothing, Oceananigans.Fields.FieldBoundaryBuffers{Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing}}}, grid::RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, Float64, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, GPU{ROCBackend}})
@ Oceananigans.Models.HydrostaticFreeSurfaceModels ~/.julia/packages/Oceananigans/DPfYS/src/Models/HydrostaticFreeSurfaceModels/implicit_free_surface.jl:95
[8] HydrostaticFreeSurfaceModel(; grid::RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, Float64, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, GPU{ROCBackend}}, clock::Clock{Float64}, momentum_advection::WENO{3, Float64, Nothing, Nothing, Nothing, true, Nothing, WENO{2, Float64, Nothing, Nothing, Nothing, true, Nothing, UpwindBiased{1, Float64, Nothing, Nothing, Nothing, Nothing, Centered{1, Float64, Nothing, Nothing, Nothing, Nothing}}, Centered{1, Float64, Nothing, Nothing, Nothing, Nothing}}, Centered{2, Float64, Nothing, Nothing, Nothing, Centered{1, Float64, Nothing, Nothing, Nothing, Nothing}}}, tracer_advection::WENO{3, Float64, Nothing, Nothing, Nothing, true, Nothing, WENO{2, Float64, Nothing, Nothing, Nothing, true, Nothing, UpwindBiased{1, Float64, Nothing, Nothing, Nothing, Nothing, Centered{1, Float64, Nothing, Nothing, Nothing, Nothing}}, Centered{1, Float64, Nothing, Nothing, Nothing, Nothing}}, Centered{2, Float64, Nothing, Nothing, Nothing, Centered{1, Float64, Nothing, Nothing, Nothing, Nothing}}}, buoyancy::BuoyancyTracer, coriolis::BetaPlane{Float64}, free_surface::ImplicitFreeSurface{Nothing, Float64, Nothing, Nothing, Symbol, @Kwargs{}}, forcing::@NamedTuple{}, closure::Nothing, boundary_conditions::@NamedTuple{}, tracers::Symbol, particles::Nothing, biogeochemistry::Nothing, velocities::Nothing, pressure::Nothing, diffusivity_fields::Nothing, auxiliary_fields::@NamedTuple{})
@ Oceananigans.Models.HydrostaticFreeSurfaceModels ~/.julia/packages/Oceananigans/DPfYS/src/Models/HydrostaticFreeSurfaceModels/hydrostatic_free_surface_model.jl:167
[9] top-level scope
@ ~/fluidnumerics-joe/Oceananigans.jl/bench/baroclinic_adjustment.jl:44
in expression starting at /home/joe/fluidnumerics-joe/Oceananigans.jl/bench/baroclinic_adjustment.jl:44
```
Seems that we need to add methods for planning FFTs.
> Seems that we need to add methods for planning FFTs.
I'll take a crack at this with an extension for Solvers.
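A minimal sketch of what those methods might look like, mirroring the `CuArray` candidates listed in the `MethodError` above and assuming AMDGPU.jl's rocFFT wrappers implement the `AbstractFFTs` planning interface (so `plan_fft!` works on a `ROCArray`; the `planner_flag` argument is ignored here, as it is for CUFFT):

```julia
using AMDGPU
using AbstractFFTs: plan_fft!, plan_ifft!
using Oceananigans.Grids: Periodic, Bounded, Flat

import Oceananigans.Solvers: plan_forward_transform, plan_backward_transform

# Mirror the CuArray methods in src/Solvers/plan_transforms.jl:
function plan_forward_transform(A::ROCArray, ::Union{Bounded, Periodic}, dims, planner_flag)
    length(dims) == 0 && return nothing
    return plan_fft!(A, dims)
end

function plan_backward_transform(A::ROCArray, ::Union{Bounded, Periodic}, dims, planner_flag)
    length(dims) == 0 && return nothing
    return plan_ifft!(A, dims)
end

# Flat dimensions need no transform:
plan_forward_transform(A::ROCArray, ::Flat, args...) = nothing
plan_backward_transform(A::ROCArray, ::Flat, args...) = nothing
```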
I've gotten the transforms taken care of, but now the baroclinic_adjustment example fails with:

```
$ julia --project=. baroclinic_adjustment.jl
ERROR: LoadError: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] assertscalar(op::String)
@ GPUArraysCore ~/.julia/packages/GPUArraysCore/uOYfN/src/GPUArraysCore.jl:103
[3] getindex
@ ~/.julia/packages/GPUArrays/dAUOE/src/host/indexing.jl:48 [inlined]
[4] scalar_getindex(::ROCArray{Float64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, ::Int64, ::Vararg{Int64})
@ GPUArrays ~/.julia/packages/GPUArrays/dAUOE/src/host/indexing.jl:34
[5] _getindex
@ ~/.julia/packages/GPUArrays/dAUOE/src/host/indexing.jl:17 [inlined]
[6] getindex
@ ~/.julia/packages/GPUArrays/dAUOE/src/host/indexing.jl:15 [inlined]
[7] getindex
@ ./subarray.jl:288 [inlined]
[8] macro expansion
@ ./multidimensional.jl:917 [inlined]
[9] macro expansion
@ ./cartesian.jl:64 [inlined]
[10] macro expansion
@ ./multidimensional.jl:912 [inlined]
[11] _unsafe_getindex!
@ ./multidimensional.jl:925 [inlined]
[12] _unsafe_getindex(::IndexCartesian, ::SubArray{Float64, 3, ROCArray{Float64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, Tuple{UnitRange{Int64}, UnitRange{Int64}, UnitRange{Int64}}, false}, ::Int64, ::Base.Slice{Base.OneTo{Int64}}, ::Base.Slice{Base.OneTo{Int64}})
@ Base ./multidimensional.jl:903
[13] _getindex
@ ./multidimensional.jl:889 [inlined]
[14] getindex(::SubArray{Float64, 3, ROCArray{Float64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, Tuple{UnitRange{Int64}, UnitRange{Int64}, UnitRange{Int64}}, false}, ::Int64, ::Function, ::Function)
@ Base ./abstractarray.jl:1291
[15] top-level scope
@ ~/fluidnumerics-joe/Oceananigans.jl/bench/baroclinic_adjustment.jl:84
```
Note that a similar error occurs with the CUDA backend on Nvidia GPUs, on both the main branch and my branch, suggesting this error originates on main.
I commented out the first plot of the buoyancy and was able to get past this. However, there seems to be a correctness bug: the run below hits NaNs on the AMD GPU, while it works fine with the CPU backend. I'll test it out on an Nvidia GPU tomorrow morning.
```
[ Info: Running the simulation...
[ Info: Initializing simulation...
[00.00%] i: 0, t: 0 seconds, wall time: 20.432 seconds, max(u): (0.000e+00, 0.000e+00, 0.000e+00) m/s, next Δt: 20 minutes
[ Info: ... simulation initialization complete (23.694 seconds)
[ Info: Executing initial time step...
[ Info: ... initial time step complete (12.989 seconds).
[ Info: time = NaN, iteration = 100: NaN found in field u. Stopping simulation.
[00NaN%] i: 100, t: NaN days, wall time: 27.465 seconds, max(u): ( NaN, 0.000e+00, 0.000e+00) m/s, next Δt: NaN days
[ Info: Simulation completed in 41.653 seconds
```
@fluidnumerics-joe @simone-silvestri should we try to get this running with split explicit free surface before tackling FFTs?
I'm game to try. Should we modify the baroclinic adjustment problem or is there another benchmark you have in mind?
> I'm game to try. Should we modify the baroclinic adjustment problem or is there another benchmark you have in mind?
I think it makes sense to keep going with the baroclinic adjustment case!
To change the free surface you'll pass `free_surface = SplitExplicitFreeSurface(grid)` as a keyword argument to the model constructor. I think the default parameters for it make sense, but @simone-silvestri can confirm. We can also try `ExplicitFreeSurface()`, which is even simpler, but in that case we'll have to modify `gravitational_acceleration` and the time step to get something that can complete in a reasonable amount of time.
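For concreteness, a minimal sketch of the suggested configuration; the domain extents `Lx`, `Ly`, `Lz` are placeholder values standing in for those in `baroclinic_adjustment.jl`, and the remaining model keywords are left at their defaults:

```julia
using Oceananigans, AMDGPU

# Placeholder domain extents (meters); use the script's actual values.
Lx, Ly, Lz = 1000e3, 1000e3, 1e3

grid = RectilinearGrid(GPU(AMDGPU.ROCBackend());
                       size = (48, 48, 8),
                       x = (0, Lx),
                       y = (-Ly/2, Ly/2),
                       z = (-Lz, 0),
                       topology = (Periodic, Bounded, Bounded))

model = HydrostaticFreeSurfaceModel(; grid,
                                    free_surface = SplitExplicitFreeSurface(grid),
                                    buoyancy = BuoyancyTracer(),
                                    tracers = :b)
```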
Btw if you paste the baroclinic adjustment script you are working with we can also check to make sure it's GPU compatible and possibly help simplify it further.
`SplitExplicitFreeSurface` works well here. For reference, the script I'm using is here: https://github.com/FluidNumerics/oceananigans-on-amd-gpus/blob/main/benchmarks/baroclinic_adjustment/baroclinic_adjustment.jl

I'll get profiling results posted soon.
> `SplitExplicitFreeSurface` works well here. For reference, the script I'm using is here: https://github.com/FluidNumerics/oceananigans-on-amd-gpus/blob/main/benchmarks/baroclinic_adjustment/baroclinic_adjustment.jl
> I'll get profiling results posted soon.
Nice! Yeah, since https://github.com/FluidNumerics/oceananigans-on-amd-gpus/blob/9a0c6fa5e3400949d0bb14b3f22b033b64f2d124/benchmarks/baroclinic_adjustment/baroclinic_adjustment.jl#L85 is commented out, I think this whole script will run on GPUs! The animation at the end will, I think, be generated on the CPU by default. You can also omit that (unless you want a pretty movie).
Just want to confirm some final steps with @navidcy and @glwagner here to wrap up this PR. At the moment, I believe we just need to add a method that throws an error for `validate_free_surface` when the architecture is an AMD GPU and the free surface is an `ImplicitFreeSurface`. I'm working on putting this in through the extension (I believe this is the correct spot) and testing it out. Is there anything else you want to see to get this merged into main?
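As a sketch of that guard (the argument order and the `GPU{ROCBackend}` parameterization are assumptions based on the types visible in the stack traces above):

```julia
using AMDGPU: ROCBackend
using Oceananigans: GPU, ImplicitFreeSurface

import Oceananigans.Models.HydrostaticFreeSurfaceModels: validate_free_surface

# The implicit free surface produced NaNs on AMD GPUs earlier in this thread,
# so fail loudly at model construction:
validate_free_surface(::GPU{ROCBackend}, free_surface::ImplicitFreeSurface) =
    error("ImplicitFreeSurface is not yet supported on AMD GPUs. ",
          "Try SplitExplicitFreeSurface or ExplicitFreeSurface instead.")
```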
I think we need to have an AMD-enabled CI?
> I think we need to have an AMD-enabled CI?
Is this something that is handled on the MIT side? The only way I can help is through system procurement (we're a Supermicro reseller) or through an allocation on our systems.
We could ask the Julia Lab; they have some AMD GPUs dedicated to CI there. Not sure if it is possible to use them, though.
> Is this something that is handled on the MIT side?
Yeah, it’s something the Oceananigans dev team should sort out! :)
> Is this something that is handled on the MIT side?
>
> Yeah, it’s something the Oceananigans dev team should sort out! :)
Curious to know if there's any movement on getting this resolved. I can offer some help in getting an allocation request in to Pawsey Supercomputing Centre; I mentioned to @navidcy that I have a solution for doing CI on systems with job schedulers (like Pawsey's Setonix).

If existing hardware systems at MIT are not available for this, I can also help with procurement, if needed. If you go this route, I can look into providing some time on our systems to get testing rolling.
Just some comments at this point:
- We have the HydrostaticFreeSurfaceModel working with the split-explicit free surface. It would be great to find some time later on to figure out what was going wrong with the implicit free surface on AMD GPUs (is the issue isolated to that architecture?) and get it resolved.
- Moving everything over to KernelAbstractions would constitute a rather large change, something I think @glwagner expressed an interest in avoiding. I'd vote in favor of pushing this off to future PRs.
- I'm wrapping up a profiling report that includes MI210 and A100 GPU performance; it will include some recommendations should we be interested in performance improvements on GPU hardware (AMD and Nvidia). That kind of work could also constitute PRs further down the road.
- The main outstanding issue seems to be that we need a platform for testing on AMD GPUs.
It appears the CliMA fork's `Project.toml` and `Manifest.toml` have diverged; I'll take a look to see if I can fix that.
> Is this something that is handled on the MIT side?
>
> Yeah, it’s something the Oceananigans dev team should sort out! :)
>
> Curious to know if there's any movement on getting this resolved. I can offer some help in getting an allocation request in to Pawsey Supercomputing Centre; I mentioned to @navidcy that I have a solution for doing CI on systems with job schedulers (like Pawsey's Setonix). If existing hardware systems at MIT are not available for this, I can also help with procurement, if needed. If you go this route, I can look into providing some time on our systems to get testing rolling.
@simone-silvestri can you please help with this? I agree it's critical to get this PR merged ASAP; it's already getting stale. I think we should contact the Satori folks first, directly or via @christophernhill. @Sbozzolo might be able to help if there are AMD machines on the Caltech cluster.
@fluidnumerics-joe let us know if you want help resolving conflicts
> @fluidnumerics-joe let us know if you want help resolving conflicts
I think the `Project.toml` and `Manifest.toml` would be best addressed on your side. I'll take a look at the `src/Architectures.jl` conflict.