
Adding Metal support

Open jagoosw opened this issue 2 years ago • 17 comments

Closes #2618

jagoosw avatar Sep 23 '23 15:09 jagoosw

Great work.

What's the error with TimeInterval?

glwagner avatar Sep 23 '23 23:09 glwagner

How will we test code on Metal GPUs? Is there anything available through GitHub Actions, or will we have to hook something up via Buildkite?

glwagner avatar Sep 24 '23 19:09 glwagner

Copy-pasting this error from #2618:

model = HydrostaticFreeSurfaceModel(; grid)
ERROR: Metal does not support Float64 values, try using Float32 instead
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] check_eltype(T::Type)
    @ Metal ~/.julia/packages/Metal/lnkVP/src/array.jl:32
  [3] Metal.MtlArray{Float64, 3, Metal.MTL.MTLResourceStorageModePrivate}(#unused#::UndefInitializer, dims::Tuple{Int64, Int64, Int64})
    @ Metal ~/.julia/packages/Metal/lnkVP/src/array.jl:50
  [4] (Metal.MtlArray{Float64, 3})(#unused#::UndefInitializer, dims::Tuple{Int64, Int64, Int64})
    @ Metal ~/.julia/packages/Metal/lnkVP/src/array.jl:98
  [5] MtlArray
    @ ~/.julia/packages/Metal/lnkVP/src/array.jl:157 [inlined]
  [6] Metal.MtlArray(A::Array{Float64, 3})
    @ Metal ~/.julia/packages/Metal/lnkVP/src/array.jl:173
  [7] arch_array(#unused#::Oceananigans.Architectures.MetalBackend, a::Array{Float64, 3})
    @ Oceananigans.Architectures ~/Documents/Projects/Oceananigans.jl/src/Architectures.jl:75
  [8] Oceananigans.Solvers.FFTBasedPoissonSolver(grid::RectilinearGrid{Float64, Periodic, Periodic, Flat, Float64, Float64, Float64, OffsetArrays.OffsetVector{Float64, Vector{Float64}}, OffsetArrays.OffsetVector{Float64, Vector{Float64}}, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Oceananigans.Architectures.MetalBackend}, planner_flag::UInt32)
    @ Oceananigans.Solvers ~/Documents/Projects/Oceananigans.jl/src/Solvers/fft_based_poisson_solver.jl:61
  [9] Oceananigans.Solvers.FFTBasedPoissonSolver(grid::RectilinearGrid{Float64, Periodic, Periodic, Flat, Float64, Float64, Float64, OffsetArrays.OffsetVector{Float64, Vector{Float64}}, OffsetArrays.OffsetVector{Float64, Vector{Float64}}, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Oceananigans.Architectures.MetalBackend})
    @ Oceananigans.Solvers ~/Documents/Projects/Oceananigans.jl/src/Solvers/fft_based_poisson_solver.jl:51
 [10] Oceananigans.Models.HydrostaticFreeSurfaceModels.FFTImplicitFreeSurfaceSolver(grid::RectilinearGrid{Float32, Periodic, Periodic, Bounded, Float32, Float32, Float32, OffsetArrays.OffsetVector{Float32, Vector{Float32}}, OffsetArrays.OffsetVector{Float32, Vector{Float32}}, OffsetArrays.OffsetVector{Float32, Vector{Float32}}, Oceananigans.Architectures.MetalBackend}, settings::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, gravitational_acceleration::Float32)
    @ Oceananigans.Models.HydrostaticFreeSurfaceModels ~/Documents/Projects/Oceananigans.jl/src/Models/HydrostaticFreeSurfaceModels/fft_based_implicit_free_surface_solver.jl:67
 [11] build_implicit_step_solver
    @ ~/Documents/Projects/Oceananigans.jl/src/Models/HydrostaticFreeSurfaceModels/fft_based_implicit_free_surface_solver.jl:73 [inlined]
 [12] build_implicit_step_solver(#unused#::Val{:Default}, grid::RectilinearGrid{Float32, Periodic, Periodic, Bounded, Float32, Float32, Float32, OffsetArrays.OffsetVector{Float32, Vector{Float32}}, OffsetArrays.OffsetVector{Float32, Vector{Float32}}, OffsetArrays.OffsetVector{Float32, Vector{Float32}}, Oceananigans.Architectures.MetalBackend}, settings::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, gravitational_acceleration::Float32)
    @ Oceananigans.Models.HydrostaticFreeSurfaceModels ~/Documents/Projects/Oceananigans.jl/src/Models/HydrostaticFreeSurfaceModels/implicit_free_surface.jl:111
 [13] FreeSurface(free_surface::ImplicitFreeSurface{Nothing, Float64, Nothing, Nothing, Symbol, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}}, velocities::NamedTuple{(:u, :v, :w), Tuple{Field{Face, Center, Center, Nothing, RectilinearGrid{Float32, Periodic, Periodic, Bounded, Float32, Float32, Float32, OffsetArrays.OffsetVector{Float32, Vector{Float32}}, OffsetArrays.OffsetVector{Float32, Vector{Float32}}, OffsetArrays.OffsetVector{Float32, Vector{Float32}}, Oceananigans.Architectures.MetalBackend}, Tuple{Colon, Colon, Colon}, OffsetArrays.OffsetArray{Float32, 3, Metal.MtlArray{Float32, 3, Metal.MTL.MTLResourceStorageModePrivate}}, Float32, FieldBoundaryConditions{BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}}, Nothing, Oceananigans.Fields.FieldBoundaryBuffers{Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing}}, Field{Center, Face, Center, Nothing, RectilinearGrid{Float32, Periodic, Periodic, Bounded, Float32, Float32, Float32, OffsetArrays.OffsetVector{Float32, Vector{Float32}}, OffsetArrays.OffsetVector{Float32, Vector{Float32}}, OffsetArrays.OffsetVector{Float32, Vector{Float32}}, Oceananigans.Architectures.MetalBackend}, Tuple{Colon, Colon, Colon}, OffsetArrays.OffsetArray{Float32, 3, Metal.MtlArray{Float32, 3, Metal.MTL.MTLResourceStorageModePrivate}}, Float32, FieldBoundaryConditions{BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, 
BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}}, Nothing, Oceananigans.Fields.FieldBoundaryBuffers{Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing}}, Field{Center, Center, Face, Nothing, RectilinearGrid{Float32, Periodic, Periodic, Bounded, Float32, Float32, Float32, OffsetArrays.OffsetVector{Float32, Vector{Float32}}, OffsetArrays.OffsetVector{Float32, Vector{Float32}}, OffsetArrays.OffsetVector{Float32, Vector{Float32}}, Oceananigans.Architectures.MetalBackend}, Tuple{Colon, Colon, Colon}, OffsetArrays.OffsetArray{Float32, 3, Metal.MtlArray{Float32, 3, Metal.MTL.MTLResourceStorageModePrivate}}, Float32, FieldBoundaryConditions{BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, Nothing, Nothing, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}}, Nothing, Oceananigans.Fields.FieldBoundaryBuffers{Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing}}}}, grid::RectilinearGrid{Float32, Periodic, Periodic, Bounded, Float32, Float32, Float32, OffsetArrays.OffsetVector{Float32, Vector{Float32}}, OffsetArrays.OffsetVector{Float32, Vector{Float32}}, OffsetArrays.OffsetVector{Float32, Vector{Float32}}, Oceananigans.Architectures.MetalBackend})
    @ Oceananigans.Models.HydrostaticFreeSurfaceModels ~/Documents/Projects/Oceananigans.jl/src/Models/HydrostaticFreeSurfaceModels/implicit_free_surface.jl:95
 [14] HydrostaticFreeSurfaceModel(; grid::RectilinearGrid{Float32, Periodic, Periodic, Bounded, Float32, Float32, Float32, OffsetArrays.OffsetVector{Float32, Vector{Float32}}, OffsetArrays.OffsetVector{Float32, Vector{Float32}}, OffsetArrays.OffsetVector{Float32, Vector{Float32}}, Oceananigans.Architectures.MetalBackend}, clock::Clock{Float32}, momentum_advection::Centered{1, Float64, Nothing, Nothing, Nothing, Nothing}, tracer_advection::Centered{1, Float64, Nothing, Nothing, Nothing, Nothing}, buoyancy::SeawaterBuoyancy{Float32, LinearEquationOfState{Float32}, Nothing, Nothing}, coriolis::Nothing, free_surface::ImplicitFreeSurface{Nothing, Float64, Nothing, Nothing, Symbol, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}}, forcing::NamedTuple{(), Tuple{}}, closure::Nothing, boundary_conditions::NamedTuple{(), Tuple{}}, tracers::Tuple{Symbol, Symbol}, particles::Nothing, biogeochemistry::Nothing, velocities::Nothing, pressure::Nothing, diffusivity_fields::Nothing, auxiliary_fields::NamedTuple{(), Tuple{}})
    @ Oceananigans.Models.HydrostaticFreeSurfaceModels ~/Documents/Projects/Oceananigans.jl/src/Models/HydrostaticFreeSurfaceModels/hydrostatic_free_surface_model.jl:169
 [15] top-level scope
    @ REPL[5]:1
 [16] top-level scope
    @ ~/.julia/packages/Metal/lnkVP/src/initialization.jl:57

The eltype of the grid is not being propagated here:

https://github.com/CliMA/Oceananigans.jl/blob/e16cdc6cfb67703df9e29368017868331f68b1c0/src/Models/HydrostaticFreeSurfaceModels/fft_based_implicit_free_surface_solver.jl#L61-L64

So that needs to be fixed.
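
A minimal sketch of the propagation problem, using toy stand-in types rather than the real Oceananigans ones. In the trace above, frame [10] receives a `Float32` grid while frames [8] and [9] see a `Float64` one, which suggests that an auxiliary grid built inside the solver falls back to a default float type. `ToyGrid`, `horizontal_grid_default`, and `horizontal_grid_propagated` are hypothetical names used only for illustration:

```julia
# Toy stand-in for a grid that carries its floating-point type as a parameter.
# This is NOT the Oceananigans RectilinearGrid.
struct ToyGrid{FT}
    Nx::Int
    Ny::Int
end

Base.eltype(::ToyGrid{FT}) where FT = FT

# Problematic pattern: the auxiliary grid silently falls back to Float64.
horizontal_grid_default(grid::ToyGrid) = ToyGrid{Float64}(grid.Nx, grid.Ny)

# Fixed pattern: the auxiliary grid inherits the parent's element type.
horizontal_grid_propagated(grid::ToyGrid) = ToyGrid{eltype(grid)}(grid.Nx, grid.Ny)

grid = ToyGrid{Float32}(16, 16)

# Storage allocated from the propagated grid stays Float32, which is what a
# Float32-only backend such as Metal requires.
storage = zeros(eltype(horizontal_grid_propagated(grid)), grid.Nx, grid.Ny)
```

With the default pattern, the same allocation would produce a `Float64` array, which is the kind of array that `check_eltype` rejects in the trace.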

However, are FFTs available on Metal? If not, then no FFT code can be ported there.

glwagner avatar Sep 24 '23 19:09 glwagner

Thanks, I'll go through all the comments tomorrow!

~~As far as I can tell there are no Julia FFT libraries yet based on this discussion https://discourse.julialang.org/t/metal-jl-does-not-speed-up-fft/95528/3 so~~ (see below) I think that is going to be the main barrier at the moment.

I don't think any Metal runners are available for GitHub Actions.

jagoosw avatar Sep 24 '23 19:09 jagoosw

> What's the error with TimeInterval?

I think the issue was the interval being stored as a Float64, but I'm not sure why that would ever end up in a kernel.

jagoosw avatar Sep 24 '23 19:09 jagoosw

I've just realised that the AppleAccelerate library is actually for M1 CPU acceleration, so it isn't the cuFFT equivalent we need here.

jagoosw avatar Sep 25 '23 17:09 jagoosw

Perhaps we could convert to this: https://github.com/DTolm/VkFFT which supports hardware-accelerated FFT on CUDA, Metal and lots of others. It looks like that library is more performant than cuFFT as well.

jagoosw avatar Sep 25 '23 17:09 jagoosw

> > What's the error with TimeInterval?
>
> I think the issue was the interval being stored as a Float64, but I'm not sure why that would ever end up in a kernel.

(copy-pasting from #2618)

One possibility is that there is a type promotion occurring within aligned_time_step:

https://github.com/CliMA/Oceananigans.jl/blob/e16cdc6cfb67703df9e29368017868331f68b1c0/src/Simulations/run.jl#L24-L33

and

https://github.com/CliMA/Oceananigans.jl/blob/e16cdc6cfb67703df9e29368017868331f68b1c0/src/Simulations/run.jl#L133-L134

This would cause widespread issues in switching to single precision, so @simone-silvestri may want to take note.

The quick fix is to convert after calculating the aligned time-step:

Δt = aligned_time_step(sim, sim.Δt)
Δt = convert(eltype(sim.model), Δt)

I think we need a better-defined interface for this in the long run though. We really need both an eltype (the floating-point type used by state variables, grid metrics, etc.) and a timetype (the type of model.clock.time).

(Also I'm not sure eltype(model) is defined yet, but it should be...)
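
A minimal sketch of the promotion and the quick fix, again with toy stand-ins. `ToyClock`, `ToyModel`, `timetype`, and `toy_aligned_time_step` are hypothetical names, and the alignment logic is a simplification, not the code in `run.jl`. The only assumption is that a schedule interval is stored as a `Float64` while the model runs in `Float32`:

```julia
# Toy stand-ins; not the Oceananigans types.
struct ToyClock{TT}
    time::TT
end

struct ToyModel{FT, TT}
    clock::ToyClock{TT}
end

# The two type queries discussed above (both hypothetical for this sketch).
Base.eltype(::ToyModel{FT}) where FT = FT
timetype(::ToyModel{FT, TT}) where {FT, TT} = TT

# Simplified alignment: clip Δt so the step does not overshoot the next
# multiple of `interval`. Mixing Float32 and Float64 promotes to Float64.
function toy_aligned_time_step(model, Δt, interval::Float64)
    time_left = interval - model.clock.time % interval
    return min(Δt, time_left)
end

model = ToyModel{Float32, Float32}(ToyClock(0f0))

Δt_promoted = toy_aligned_time_step(model, 0.1f0, 10.0)   # promoted to Float64
Δt = convert(eltype(model), Δt_promoted)                  # back to Float32
```

Keeping `eltype` and `timetype` separate would let the clock use a wider type than the state variables without the time step leaking that type into kernels.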

glwagner avatar Sep 25 '23 22:09 glwagner

> Perhaps we could convert to this: https://github.com/DTolm/VkFFT which supports hardware-accelerated FFT on CUDA, Metal and lots of others. It looks like that library is more performant than cuFFT as well.

Why do we have to convert? Can we use that only for Metal?
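
One way to picture a Metal-only choice is to select the FFT backend by dispatching on the architecture type, so the CPU and CUDA paths stay untouched. This is a sketch with hypothetical toy types; `fft_backend` is not an Oceananigans function, and the `:VkFFT` entry assumes a Julia wrapper that does not exist in this discussion:

```julia
# Toy architecture types; not the Oceananigans ones.
abstract type ToyArchitecture end
struct ToyCPU      <: ToyArchitecture end
struct ToyCUDAGPU  <: ToyArchitecture end
struct ToyMetalGPU <: ToyArchitecture end

# Existing paths are left unchanged...
fft_backend(::ToyCPU)     = :FFTW
fft_backend(::ToyCUDAGPU) = :cuFFT

# ...and only the new architecture opts in to the new library
# (hypothetical: assumes a VkFFT wrapper package were available).
fft_backend(::ToyMetalGPU) = :VkFFT

backends = map(fft_backend, (ToyCPU(), ToyCUDAGPU(), ToyMetalGPU()))
```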

glwagner avatar Sep 25 '23 22:09 glwagner

Yeah, we could just do it for Metal. I was just thinking it might be just as much effort as using it for everything.

jagoosw avatar Sep 26 '23 09:09 jagoosw

> Yeah we could just do it for Metal. I was just thinking it might be just as much effort as using it for all.

I think it would be more effort if we take into account the need to rebenchmark a lot of cases. I also think it adds some risk...

glwagner avatar Sep 26 '23 10:09 glwagner

> I think it would be more effort if we take into account the need to rebenchmark a lot of cases. I also think it adds some risk...

Yeah true, they do claim it is more performant, so maybe something to consider in the future.

jagoosw avatar Sep 26 '23 10:09 jagoosw

I've now had a go at wrapping VkFFT with https://github.com/JuliaInterop/Clang.jl/tree/master but it is proving difficult given my inexperience with C.

Does anyone working on Oceananigans have experience doing that sort of thing?

jagoosw avatar Sep 26 '23 10:09 jagoosw

> I've now had a play trying to wrap VkFFT with https://github.com/JuliaInterop/Clang.jl/tree/master but it is proving difficult given my inexperience with C.
>
> Does anyone working on Oceananigans have experience doing that sort of thing?

Could be worth asking on the Julia Slack! You'll have to ship an independent wrapper package (e.g. VkFFT.jl) and figure out how to precompile the binaries, right (so we can install everything from the REPL)?

Could be good for this PR to focus on getting the explicit free surface to work, then build up the rest of the features after that. Doing this for real will also require figuring out testing, I think.

glwagner avatar Sep 26 '23 11:09 glwagner

Weird that all the tests fail. When I run e.g. RectilinearGrid(CPU(), FT, size=(16, 16, 16), extent=(1, 1, 1)) locally, it works fine.

jagoosw avatar Sep 29 '23 10:09 jagoosw

Curious what the status is of this effort to add Metal support to Oceananigans.

truedichotomy avatar May 03 '24 19:05 truedichotomy

> Curious what's the status of this effort to add Metal support to Oceananigans.

It's crazy how easy it is to add this support, but the major limitation is that Metal only supports Float32. There hasn't been much effort to validate anything for Float32, though this is a worthwhile goal...

Also if we do refactor this PR, I think we should probably put the Metal functionality in an extension, much as #3468 does.

glwagner avatar May 03 '24 20:05 glwagner