Isolate CUDA
This PR isolates CUDA into `src/arch_cuda.jl`. This removes all direct CUDA calls from the rest of the Oceananigans code base. That file can serve either as a template for a new GPU architecture or as the basis for a future CUDA extension. @vchuravy
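A minimal plain-Julia sketch of the separation this PR aims for: core code calls only generic functions on an abstract architecture type, while a separate file or extension module adds the backend-specific methods. All names here (`CoreSketch`, `FakeGPUExtSketch`, `AbstractArch`, `CPUArch`, `FakeGPUArch`, `storage_type`, `to_arch`) are hypothetical and are not the Oceananigans API. `FakeGPUArch` stands in for a CUDA-backed architecture so the snippet runs without a GPU.

```julia
# Core package code (hypothetical): knows only the abstract interface.
module CoreSketch

abstract type AbstractArch end
struct CPUArch <: AbstractArch end

# Generic function that backend-specific code is expected to extend.
function storage_type end
storage_type(::CPUArch) = Array

# Architecture-agnostic helper: it never names a specific GPU package.
to_arch(arch::AbstractArch, a::AbstractArray) = convert(storage_type(arch), a)

end # module CoreSketch

# Backend-specific code (hypothetical): in a real package this would live in its
# own file, or in an extension module that is loaded only after `using CUDA`.
module FakeGPUExtSketch

import ..CoreSketch: AbstractArch, storage_type

# Stand-in for a GPU architecture; a real backend would return a device array type.
struct FakeGPUArch <: AbstractArch end
storage_type(::FakeGPUArch) = Vector{Float32}

end # module FakeGPUExtSketch

x = [1.0, 2.0, 3.0]
y_cpu = CoreSketch.to_arch(CoreSketch.CPUArch(), x)            # stays a Vector{Float64}
y_gpu = CoreSketch.to_arch(FakeGPUExtSketch.FakeGPUArch(), x)  # becomes a Vector{Float32}
```

The core module compiles and runs without ever referring to the "GPU" module; adding a backend means adding methods, not editing core code.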
Possibly, we should simply implement a CUDA extension in this PR with appropriate organization of the code and get on with the breaking change!
tl;dr: after this is merged, anybody computing on NVIDIA GPUs has to write

```julia
using Oceananigans
using CUDA
```
@simone-silvestri curious to hear your thoughts
I think it's a good idea. It provides templates for adding new architectures and makes the code completely architecture-agnostic. The extra `using CUDA` is a small price to pay.
@michel2323 let us know when this is ready for prime time
@glwagner There are 4 failing tests in total:
- oceananigans-distributed: I think it can't find the commit because it's run from a fork.
- cpu-turbulence-closure-tests: Bus error. No idea, man.
- gpu-multi-region-tests: That's the hard one I have to sit on. I dug into it and it's definitely caused by my changes. However, I don't understand what is different at runtime that triggers this.
- Documentation: something.
I think the main problem is that I haven't figured out what `getdevice` actually means across all the objects where it is implemented. In particular, there are a bunch of `getdevice(somearray)` calls. @vchuravy What would that look like with KA? Do the arrays know which device they are on?
> Do the arrays know which device they are on?
To my knowledge that's an ill-formed query.
@simone-silvestri @glwagner How do you want to proceed with these? Can this be rewritten to use only functionality from `Architectures`?
https://github.com/michel2323/Oceananigans.jl/blob/2e5f75498e8fa7896a91241351c0e2bac9904adc/src/Utils/multi_region_transformation.jl#L54-L70
> @michel2323 let us know when this is ready for prime time
Finally ready. The documentation build breaks due to something unrelated, I think. `buildkite/oceananigans-distributed` isn't run because the code comes from a fork.
Seems like this is getting close which is exciting!
@siddharthabishnu can you take a look at the cubed sphere / multi-region stuff here? It will affect you.
@michel2323 @navidcy if you have time to look at the failing tests, you might be able to push this PR forward! I think we are close. I will have time next week.
I fixed the docs.
The last error seems to come from MultiRegion. The cubed sphere simulation fails at `run!(simulation)` when it tries to write output. The error seems to come from an iterator? I wasn't able to figure it out.
Here's an MWE:

```julia
using Oceananigans

grid = ConformalCubedSphereGrid(CPU(); panel_size = (18, 18, 9), z = (0, 1), radius = 1, horizontal_direction_halo = 6)

model = HydrostaticFreeSurfaceModel(; grid,
                                      momentum_advection = WENOVectorInvariant(order=5),
                                      tracer_advection = WENO(order=5),
                                      free_surface = SplitExplicitFreeSurface(grid; substeps=12),
                                      coriolis = HydrostaticSphericalCoriolis(eltype(grid)),
                                      tracers = :b,
                                      buoyancy = BuoyancyTracer())

simulation = Simulation(model, Δt=60, stop_time=600)

simulation.output_writers[:fields] = JLD2Writer(model, fields(model);
                                                schedule = IterationInterval(2),
                                                filename = "cubed_sphere_output",
                                                verbose = false,
                                                overwrite_existing = true)

run!(simulation)
```
```julia
julia> using Oceananigans
[ Info: Oceananigans will use 12 threads

julia> grid = ConformalCubedSphereGrid(CPU(); panel_size = (18, 18, 9), z = (0, 1), radius = 1, horizontal_direction_halo = 6)
ConformalCubedSphereGrid{Float64, Oceananigans.Grids.FullyConnected, Oceananigans.Grids.FullyConnected, Bounded} partitioned on CPU():
├── grids: 18×18×9 OrthogonalSphericalShellGrid{Float64, Oceananigans.Grids.FullyConnected, Oceananigans.Grids.FullyConnected, Bounded} on CPU with 6×6×6 halo and with precomputed metrics
├── partitioning: CubedSpherePartition with (1 region in each panel)
├── connectivity: CubedSphereConnectivity
└── devices: (CPU(), CPU(), CPU(), CPU(), CPU(), CPU())

julia> model = HydrostaticFreeSurfaceModel(; grid,
                                             momentum_advection = WENOVectorInvariant(order=5),
                                             tracer_advection = WENO(order=5),
                                             free_surface = SplitExplicitFreeSurface(grid; substeps=12),
                                             coriolis = HydrostaticSphericalCoriolis(eltype(grid)),
                                             tracers = :b,
                                             buoyancy = BuoyancyTracer())
HydrostaticFreeSurfaceModel{CPU, MultiRegionGrid}(time = 0 seconds, iteration = 0)
├── grid: 18×18×9 ConformalCubedSphereGrid{Float64, Oceananigans.Grids.FullyConnected, Oceananigans.Grids.FullyConnected, Bounded} on CPU with 6×6×6 halo
├── timestepper: QuasiAdamsBashforth2TimeStepper
├── tracers: b
├── closure: Nothing
├── buoyancy: BuoyancyTracer with ĝ = NegativeZDirection()
├── free surface: SplitExplicitFreeSurface with gravitational acceleration 9.80665 m s⁻²
│   └── substepping: FixedSubstepNumber(8)
├── advection scheme:
│   ├── momentum: MultiRegionObject{NTuple{6, WENOVectorInvariant{3, 3, Float64, Oceananigans.Advection.OnlySelfUpwinding{Centered{2, Float64, Centered{1, Float64, Nothing}}, Oceananigans.Advection.FunctionStencil{typeof(Oceananigans.Advection.divergence_smoothness)}, Oceananigans.Advection.FunctionStencil{typeof(Oceananigans.Advection.divergence_smoothness)}, Oceananigans.Advection.FunctionStencil{typeof(Oceananigans.Advection.u_smoothness)}, Oceananigans.Advection.FunctionStencil{typeof(Oceananigans.Advection.v_smoothness)}}, WENO{3, Float64, Float32, Nothing, WENO{2, Float64, Float32, Nothing, UpwindBiased{1, Float64, Nothing, Centered{1, Float64, Nothing}}, Centered{1, Float64, Nothing}}, Centered{2, Float64, Centered{1, Float64, Nothing}}}, Oceananigans.Advection.VelocityStencil, WENO{3, Float64, Float32, Nothing, WENO{2, Float64, Float32, Nothing, UpwindBiased{1, Float64, Nothing, Centered{1, Float64, Nothing}}, Centered{1, Float64, Nothing}}, Centered{2, Float64, Centered{1, Float64, Nothing}}}, WENO{3, Float64, Float32, Nothing, WENO{2, Float64, Float32, Nothing, UpwindBiased{1, Float64, Nothing, Centered{1, Float64, Nothing}}, Centered{1, Float64, Nothing}}, Centered{2, Float64, Centered{1, Float64, Nothing}}}, WENO{3, Float64, Float32, Nothing, WENO{2, Float64, Float32, Nothing, UpwindBiased{1, Float64, Nothing, Centered{1, Float64, Nothing}}, Centered{1, Float64, Nothing}}, Centered{2, Float64, Centered{1, Float64, Nothing}}}, Oceananigans.Advection.OnlySelfUpwinding{Centered{2, Float64, Centered{1, Float64, Nothing}}, Oceananigans.Advection.FunctionStencil{typeof(Oceananigans.Advection.divergence_smoothness)}, Oceananigans.Advection.FunctionStencil{typeof(Oceananigans.Advection.divergence_smoothness)}, Oceananigans.Advection.FunctionStencil{typeof(Oceananigans.Advection.u_smoothness)}, Oceananigans.Advection.FunctionStencil{typeof(Oceananigans.Advection.v_smoothness)}}}}, NTuple{6, CPU}, KernelAbstractions.CPU}
│   └── b: WENO{3, Float64, Float32}(order=5)
└── coriolis: HydrostaticSphericalCoriolis{Oceananigans.Advection.EnstrophyConserving{Float64}, Float64}

julia> simulation = Simulation(model, Δt=60, stop_time=600)
Simulation of HydrostaticFreeSurfaceModel{CPU, MultiRegionGrid}(time = 0 seconds, iteration = 0)
├── Next time step: 1 minute
├── Elapsed wall time: 0 seconds
├── Wall time per iteration: NaN days
├── Stop time: 10 minutes
├── Stop iteration: Inf
├── Wall time limit: Inf
├── Minimum relative step: 0.0
├── Callbacks: OrderedDict with 4 entries:
│   ├── stop_time_exceeded => 4
│   ├── stop_iteration_exceeded => -
│   ├── wall_time_limit_exceeded => e
│   └── nan_checker => }
├── Output writers: OrderedDict with no entries
└── Diagnostics: OrderedDict with no entries

julia> simulation.output_writers[:fields] = JLD2Writer(model, fields(model);
                                                       schedule = IterationInterval(2),
                                                       filename = "cubed_sphere_output",
                                                       verbose = false,
                                                       overwrite_existing = true)
JLD2Writer scheduled on IterationInterval(2):
├── filepath: cubed_sphere_output.jld2
├── 7 outputs: (u, v, w, b, η, U, V)
├── array type: Array{Float32}
├── including: [:grid, :coriolis, :buoyancy, :closure]
├── file_splitting: NoFileSplitting
└── file size: 1.9 MiB

julia> run!(simulation)
[ Info: Initializing simulation...
ERROR: MethodError: no method matching MultiRegionObject(::NTuple{6, Array{Float32, 3}})

Closest candidates are:
  MultiRegionObject(::KernelAbstractions.Backend, ::Tuple, ::Tuple)
   @ Oceananigans ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/Utils/multi_region_transformation.jl:25
  MultiRegionObject(::KernelAbstractions.Backend, Any...; devices)
   @ Oceananigans ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/Utils/multi_region_transformation.jl:18
  MultiRegionObject(::Oceananigans.Architectures.AbstractArchitecture, ::Tuple, ::Tuple)
   @ Oceananigans ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/Utils/multi_region_transformation.jl:34
  ...

Stacktrace:
  [1] convert_output(mo::MultiRegionObject{…}, writer::JLD2Writer{…})
    @ Oceananigans.MultiRegion ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/MultiRegion/multi_region_output_writers.jl:54
  [2] fetch_and_convert_output(output::Field{…}, model::HydrostaticFreeSurfaceModel{…}, writer::JLD2Writer{…})
    @ Oceananigans.OutputWriters ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/OutputWriters/fetch_output.jl:40
  [3] (::Oceananigans.OutputWriters.var"#36#37"{JLD2Writer{…}, HydrostaticFreeSurfaceModel{…}})(::Tuple{Symbol, Field{…}})
    @ Oceananigans.OutputWriters ./none:0
  [4] iterate
    @ ./generator.jl:47 [inlined]
  [5] merge(a::@NamedTuple{}, itr::Base.Generator{Base.Iterators.Zip{Tuple{…}}, Oceananigans.OutputWriters.var"#36#37"{JLD2Writer{…}, HydrostaticFreeSurfaceModel{…}}})
    @ Base ./namedtuple.jl:360
  [6] NamedTuple
    @ ./namedtuple.jl:151 [inlined]
  [7] macro expansion
    @ ./timing.jl:395 [inlined]
  [8] write_output!(writer::JLD2Writer{…}, model::HydrostaticFreeSurfaceModel{…})
    @ Oceananigans.OutputWriters ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/OutputWriters/jld2_writer.jl:253
  [9] write_output!(writer::JLD2Writer{…}, sim::Simulation{…})
    @ Oceananigans.Simulations ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/Simulations/simulation.jl:252
 [10] initialize!(sim::Simulation{…})
    @ Oceananigans.Simulations ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/Simulations/run.jl:243
 [11] time_step!(sim::Simulation{…})
    @ Oceananigans.Simulations ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/Simulations/run.jl:136
 [12] run!(sim::Simulation{…}; pickup::Bool)
    @ Oceananigans.Simulations ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/Simulations/run.jl:105
 [13] run!(sim::Simulation{…})
    @ Oceananigans.Simulations ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/Simulations/run.jl:92
 [14] top-level scope
    @ REPL[6]:1
Some type information was truncated. Use `show(err)` to see complete types.
```
I added the backend as the first argument of the `MultiRegionObject` constructor; see
https://github.com/michel2323/Oceananigans.jl/blob/05b1d9927275d7ec74fce2f7a7afab2769bc21d5/src/MultiRegion/multi_region_output_writers.jl#L56
Is this the right thing to do?
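The `MethodError` in the trace above says that no `MultiRegionObject` method accepts only the tuple of regional objects: every listed candidate expects a backend or architecture first. A minimal plain-Julia sketch of that failure mode and of the fix, using hypothetical `MockArch`, `MockCPU`, and `MockMultiRegionObject` types rather than the real Oceananigans ones:

```julia
# Hypothetical stand-ins; not the real Oceananigans types.
abstract type MockArch end
struct MockCPU <: MockArch end

# Mimics a constructor whose first argument is now a required architecture/backend.
struct MockMultiRegionObject{A<:MockArch, R<:Tuple}
    arch::A
    regional_objects::R
end

regions = (zeros(Float32, 2, 2, 2), ones(Float32, 2, 2, 2))

# An old-style call with only the tuple of regional objects matches no method.
caught = try
    MockMultiRegionObject(regions)
    nothing
catch err
    err
end

# Passing the architecture first, as in the fix discussed here, works.
mo = MockMultiRegionObject(MockCPU(), regions)
```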
Now there is a `FieldTimeSeries`-related error further down in `test_multi_region_cubed_sphere.jl`...
Providing `CPU()` to the `MultiRegionObject` constructor seems to do the job:

```julia
convert_output(mo::MultiRegionObject, writer) =
    MultiRegionObject(CPU(), Tuple(convert(writer.array_type, obj) for obj in mo.regional_objects))
```

But is this the right thing to do? Not sure.
All tests on tartarus pass! I'd like to see the distributed CI pass as well tho...
Distributed CI passes!
I think the more general form is:

```julia
convert_output(mo::MultiRegionObject, writer) =
    MultiRegionObject(
        architecture_from_type(writer.array_type),
        Tuple(convert(writer.array_type, obj) for obj in mo.regional_objects)
    )
```
I had to add:

```julia
architecture_from_type(type::Type{<:AbstractArray}) = architecture(type())
```
@glwagner Opinions?
@navidcy Thank you so much for the review and fixes!
hi @michel2323,

I doubt that the `architecture_from_type` method did the job; the tests still fail.
That's because, I believe, the `Array` type cannot be instantiated as `Array()`...
What do we want here? We want it to return the architecture that corresponds to the outer type of `writer.array_type`, correct? E.g., if this is `Array{Float32}` we want `CPU()`, and if it's `CuArray{Float64}` we want `GPU()`, etc.?
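One way to map the outer array type to an architecture without instantiating it is to dispatch on the type itself. This is a minimal sketch under stated assumptions, not necessarily how the actual fix does it: `MockArch`, `MockCPU`, and `MockGPU` are hypothetical stand-ins for `CPU()`/`GPU()`, and `FakeDeviceArray` is a hypothetical type standing in for a GPU array such as `CuArray`.

```julia
# Hypothetical stand-ins for CPU() / GPU(); not the real Oceananigans types.
abstract type MockArch end
struct MockCPU <: MockArch end
struct MockGPU <: MockArch end

# Hypothetical device array type standing in for a GPU array such as CuArray.
# It is only used at the type level here, so it needs no fields or methods.
struct FakeDeviceArray{T, N} <: AbstractArray{T, N} end

# Dispatch on the array type itself, so no constructor like `Array()` is ever called.
# `Type{<:Array}` matches `Array{Float32}` (a UnionAll over the dimension)
# as well as fully concrete types such as `Array{Float32, 3}`.
architecture_from_type(::Type{<:Array}) = MockCPU()
architecture_from_type(::Type{<:FakeDeviceArray}) = MockGPU()

cpu_arch = architecture_from_type(Array{Float32})
gpu_arch = architecture_from_type(FakeDeviceArray{Float64})
```

With this shape, the GPU method could live alongside the CUDA-specific code, so the core `convert_output` stays free of any direct CUDA reference.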
ff957c8 attempts to resolve this
I'm merging this.
go for it!