Oceananigans.jl icon indicating copy to clipboard operation
Oceananigans.jl copied to clipboard

Isolate CUDA

Open michel2323 opened this issue 7 months ago • 9 comments

This PR isolates CUDA into src/arch_cuda.jl. This removes any direct CUDA calls in the remaining Oceananigans code base. That feel can either serve as a template for a new GPU architecture or for a future CUDA extension. @vchuravy

michel2323 avatar May 12 '25 14:05 michel2323

Possibly, we should simply implement a CUDA extension in this PR with appropriate organization of the code and get on with the breaking change!

tl;dr then after this is merged, anybody doing computations on nvidia GPU has to write

using Oceananigans
using CUDA

glwagner avatar May 12 '25 15:05 glwagner

@simone-silvestri curious to hear your thoughts

glwagner avatar May 12 '25 15:05 glwagner

I think it's a good idea. It provides templates to add new architectures and makes the code completely architecture agnostic. the extra using CUDA is a small price to pay.

simone-silvestri avatar May 12 '25 17:05 simone-silvestri

@michel2323 let us know when this is ready for prime time

glwagner avatar May 24 '25 17:05 glwagner

@glwagner For the failing tests, we have 4 in total

  • oceanangians-distributed: I think it can't find the commit because it's run from a fork.
  • cpu-turbulence-closure-tests: Bus error. No idea man.
  • gpu-multi-region-tests: That's the hard one I have to sit on. I dived into it and it's definitely my changes. However, I don't understand what is different at runtime that triggers this.
  • Documentation something.

michel2323 avatar Jun 11 '25 20:06 michel2323

I think the main problem is that I haven't figured out what getdevice actually means across all objects where it is implemented. In particular, there's a bunch of getdevice(somearray). @vchuravy How would that look like with KA? Do the arrays know on which device they are?

michel2323 avatar Jun 12 '25 15:06 michel2323

Do the arrays know on which device they are?

To my knowledge that's an ill-formed query.

vchuravy avatar Jun 12 '25 17:06 vchuravy

Do the arrays know on which device they are?

To my knowledge that's an ill-formed query.

@simone-silvestri @glwagner How do you want to proceed with these? Can this be rewritten to only use stuff from Architectures?

https://github.com/michel2323/Oceananigans.jl/blob/2e5f75498e8fa7896a91241351c0e2bac9904adc/src/Utils/multi_region_transformation.jl#L54-L70

michel2323 avatar Jun 12 '25 17:06 michel2323

@michel2323 let us know when this is ready for prime time

Finally ready. The documentation breaks due to something unrelated I think. buildkite/oceananigans-distributed isn't run because the code comes from a fork.

michel2323 avatar Jun 19 '25 18:06 michel2323

Seems like this is getting close which is exciting!

glwagner avatar Jun 26 '25 16:06 glwagner

@siddharthabishnu can you take a look at the cubed sphere / multi region stuff here, it will affect you

glwagner avatar Jun 27 '25 14:06 glwagner

@michel2323 @navidcy if you have time to look at the failling tests you might be able to push this PR forward! I think we are close. I will have time next week.

glwagner avatar Jul 03 '25 14:07 glwagner

I fixed the docs.

The last error seems to be coming from MultiRegion. The cubed sphere simulation fails at the run!(simulation) when it tries to write output. Seems that the error is coming from an iterator? I wasn't able to figure it out.

Here's an MWE

using Oceananigans
grid = ConformalCubedSphereGrid(CPU(); panel_size = (18, 18, 9), z = (0, 1), radius = 1, horizontal_direction_halo = 6)
model = HydrostaticFreeSurfaceModel(; grid,
                                    momentum_advection = WENOVectorInvariant(order=5),
                                    tracer_advection = WENO(order=5),
                                    free_surface = SplitExplicitFreeSurface(grid; substeps=12),
                                    coriolis = HydrostaticSphericalCoriolis(eltype(grid)),
                                    tracers = :b,
                                    buoyancy = BuoyancyTracer())
simulation = Simulation(model, Δt=60, stop_time=600)

simulation.output_writers[:fields] = JLD2Writer(model, fields(model);
                                                schedule = IterationInterval(2),
                                                filename = "cubed_sphere_output",
                                                verbose = false,
                                                overwrite_existing = true)

run!(simulation)

navidcy avatar Jul 06 '25 21:07 navidcy

julia> using Oceananigans
[ Info: Oceananigans will use 12 threads

julia> grid = ConformalCubedSphereGrid(CPU(); panel_size = (18, 18, 9), z = (0, 1), radius = 1, horizontal_direction_halo = 6)
ConformalCubedSphereGrid{Float64, Oceananigans.Grids.FullyConnected, Oceananigans.Grids.FullyConnected, Bounded} partitioned on CPU():
ā”œā”€ā”€ grids: 18Ɨ18Ɨ9 OrthogonalSphericalShellGrid{Float64, Oceananigans.Grids.FullyConnected, Oceananigans.Grids.FullyConnected, Bounded} on CPU with 6Ɨ6Ɨ6 halo and with precomputed metrics
ā”œā”€ā”€ partitioning: CubedSpherePartition with (1 region in each panel)
ā”œā”€ā”€ connectivity: CubedSphereConnectivity
└── devices: (CPU(), CPU(), CPU(), CPU(), CPU(), CPU())

julia> model = HydrostaticFreeSurfaceModel(; grid,
                                           momentum_advection = WENOVectorInvariant(order=5),
                                           tracer_advection = WENO(order=5),
                                           free_surface = SplitExplicitFreeSurface(grid; substeps=12),
                                           coriolis = HydrostaticSphericalCoriolis(eltype(grid)),
                                           tracers = :b,
                                           buoyancy = BuoyancyTracer())
HydrostaticFreeSurfaceModel{CPU, MultiRegionGrid}(time = 0 seconds, iteration = 0)
ā”œā”€ā”€ grid: 18Ɨ18Ɨ9 ConformalCubedSphereGrid{Float64, Oceananigans.Grids.FullyConnected, Oceananigans.Grids.FullyConnected, Bounded} on CPU with 6Ɨ6Ɨ6 halo
ā”œā”€ā”€ timestepper: QuasiAdamsBashforth2TimeStepper
ā”œā”€ā”€ tracers: b
ā”œā”€ā”€ closure: Nothing
ā”œā”€ā”€ buoyancy: BuoyancyTracer with ĝ = NegativeZDirection()
ā”œā”€ā”€ free surface: SplitExplicitFreeSurface with gravitational acceleration 9.80665 m s⁻²
│   └── substepping: FixedSubstepNumber(8)
ā”œā”€ā”€ advection scheme:
│   ā”œā”€ā”€ momentum: MultiRegionObject{NTuple{6, WENOVectorInvariant{3, 3, Float64, Oceananigans.Advection.OnlySelfUpwinding{Centered{2, Float64, Centered{1, Float64, Nothing}}, Oceananigans.Advection.FunctionStencil{typeof(Oceananigans.Advection.divergence_smoothness)}, Oceananigans.Advection.FunctionStencil{typeof(Oceananigans.Advection.divergence_smoothness)}, Oceananigans.Advection.FunctionStencil{typeof(Oceananigans.Advection.u_smoothness)}, Oceananigans.Advection.FunctionStencil{typeof(Oceananigans.Advection.v_smoothness)}}, WENO{3, Float64, Float32, Nothing, WENO{2, Float64, Float32, Nothing, UpwindBiased{1, Float64, Nothing, Centered{1, Float64, Nothing}}, Centered{1, Float64, Nothing}}, Centered{2, Float64, Centered{1, Float64, Nothing}}}, Oceananigans.Advection.VelocityStencil, WENO{3, Float64, Float32, Nothing, WENO{2, Float64, Float32, Nothing, UpwindBiased{1, Float64, Nothing, Centered{1, Float64, Nothing}}, Centered{1, Float64, Nothing}}, Centered{2, Float64, Centered{1, Float64, Nothing}}}, WENO{3, Float64, Float32, Nothing, WENO{2, Float64, Float32, Nothing, UpwindBiased{1, Float64, Nothing, Centered{1, Float64, Nothing}}, Centered{1, Float64, Nothing}}, Centered{2, Float64, Centered{1, Float64, Nothing}}}, WENO{3, Float64, Float32, Nothing, WENO{2, Float64, Float32, Nothing, UpwindBiased{1, Float64, Nothing, Centered{1, Float64, Nothing}}, Centered{1, Float64, Nothing}}, Centered{2, Float64, Centered{1, Float64, Nothing}}}, Oceananigans.Advection.OnlySelfUpwinding{Centered{2, Float64, Centered{1, Float64, Nothing}}, Oceananigans.Advection.FunctionStencil{typeof(Oceananigans.Advection.divergence_smoothness)}, Oceananigans.Advection.FunctionStencil{typeof(Oceananigans.Advection.divergence_smoothness)}, Oceananigans.Advection.FunctionStencil{typeof(Oceananigans.Advection.u_smoothness)}, Oceananigans.Advection.FunctionStencil{typeof(Oceananigans.Advection.v_smoothness)}}}}, NTuple{6, CPU}, KernelAbstractions.CPU}
│   └── b: WENO{3, Float64, Float32}(order=5)
└── coriolis: HydrostaticSphericalCoriolis{Oceananigans.Advection.EnstrophyConserving{Float64}, Float64}

julia> simulation = Simulation(model, Δt=60, stop_time=600)
Simulation of HydrostaticFreeSurfaceModel{CPU, MultiRegionGrid}(time = 0 seconds, iteration = 0)
ā”œā”€ā”€ Next time step: 1 minute
ā”œā”€ā”€ Elapsed wall time: 0 seconds
ā”œā”€ā”€ Wall time per iteration: NaN days
ā”œā”€ā”€ Stop time: 10 minutes
ā”œā”€ā”€ Stop iteration: Inf
ā”œā”€ā”€ Wall time limit: Inf
ā”œā”€ā”€ Minimum relative step: 0.0
ā”œā”€ā”€ Callbacks: OrderedDict with 4 entries:
│   ā”œā”€ā”€ stop_time_exceeded => 4
│   ā”œā”€ā”€ stop_iteration_exceeded => -
│   ā”œā”€ā”€ wall_time_limit_exceeded => e
│   └── nan_checker => }
ā”œā”€ā”€ Output writers: OrderedDict with no entries
└── Diagnostics: OrderedDict with no entries

julia> simulation.output_writers[:fields] = JLD2Writer(model, fields(model);
                                                       schedule = IterationInterval(2),
                                                       filename = "cubed_sphere_output",
                                                       verbose = false,
                                                       overwrite_existing = true)
JLD2Writer scheduled on IterationInterval(2):
ā”œā”€ā”€ filepath: cubed_sphere_output.jld2
ā”œā”€ā”€ 7 outputs: (u, v, w, b, Ī·, U, V)
ā”œā”€ā”€ array type: Array{Float32}
ā”œā”€ā”€ including: [:grid, :coriolis, :buoyancy, :closure]
ā”œā”€ā”€ file_splitting: NoFileSplitting
└── file size: 1.9 MiB

julia> run!(simulation)
[ Info: Initializing simulation...
ERROR: MethodError: no method matching MultiRegionObject(::NTuple{6, Array{Float32, 3}})

Closest candidates are:
  MultiRegionObject(::KernelAbstractions.Backend, ::Tuple, ::Tuple)
   @ Oceananigans ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/Utils/multi_region_transformation.jl:25
  MultiRegionObject(::KernelAbstractions.Backend, Any...; devices)
   @ Oceananigans ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/Utils/multi_region_transformation.jl:18
  MultiRegionObject(::Oceananigans.Architectures.AbstractArchitecture, ::Tuple, ::Tuple)
   @ Oceananigans ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/Utils/multi_region_transformation.jl:34
  ...

Stacktrace:
  [1] convert_output(mo::MultiRegionObject{…}, writer::JLD2Writer{…})
    @ Oceananigans.MultiRegion ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/MultiRegion/multi_region_output_writers.jl:54
  [2] fetch_and_convert_output(output::Field{…}, model::HydrostaticFreeSurfaceModel{…}, writer::JLD2Writer{…})
    @ Oceananigans.OutputWriters ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/OutputWriters/fetch_output.jl:40
  [3] (::Oceananigans.OutputWriters.var"#36#37"{JLD2Writer{…}, HydrostaticFreeSurfaceModel{…}})(::Tuple{Symbol, Field{…}})
    @ Oceananigans.OutputWriters ./none:0
  [4] iterate
    @ ./generator.jl:47 [inlined]
  [5] merge(a::@NamedTuple{}, itr::Base.Generator{Base.Iterators.Zip{Tuple{…}}, Oceananigans.OutputWriters.var"#36#37"{JLD2Writer{…}, HydrostaticFreeSurfaceModel{…}}})
    @ Base ./namedtuple.jl:360
  [6] NamedTuple
    @ ./namedtuple.jl:151 [inlined]
  [7] macro expansion
    @ ./timing.jl:395 [inlined]
  [8] write_output!(writer::JLD2Writer{…}, model::HydrostaticFreeSurfaceModel{…})
    @ Oceananigans.OutputWriters ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/OutputWriters/jld2_writer.jl:253
  [9] write_output!(writer::JLD2Writer{…}, sim::Simulation{…})
    @ Oceananigans.Simulations ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/Simulations/simulation.jl:252
 [10] initialize!(sim::Simulation{…})
    @ Oceananigans.Simulations ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/Simulations/run.jl:243
 [11] time_step!(sim::Simulation{…})
    @ Oceananigans.Simulations ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/Simulations/run.jl:136
 [12] run!(sim::Simulation{…}; pickup::Bool)
    @ Oceananigans.Simulations ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/Simulations/run.jl:105
 [13] run!(sim::Simulation{…})
    @ Oceananigans.Simulations ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/Simulations/run.jl:92
 [14] top-level scope
    @ REPL[6]:1
Some type information was truncated. Use `show(err)` to see complete types.
``

navidcy avatar Jul 06 '25 21:07 navidcy

I added the backend as first arg in the MultiRegionObject constructor; see

https://github.com/michel2323/Oceananigans.jl/blob/05b1d9927275d7ec74fce2f7a7afab2769bc21d5/src/MultiRegion/multi_region_output_writers.jl#L56

Is this the right thing to do?

Now there is a FieldTimeSeries-related error is further down in the test_multi_region_cubed_sphere.jl...

navidcy avatar Jul 06 '25 23:07 navidcy

Providing CPU() to MultiRegionObject constructor seems to do the job...

convert_output(mo::MultiRegionObject, writer) =
    MultiRegionObject(CPU(), Tuple(convert(writer.array_type, obj) for obj in mo.regional_objects))

But is this the right thing to do? Not sure.

navidcy avatar Jul 07 '25 00:07 navidcy

All tests on tartarus pass! I'd like to see the distributed CI pass as well tho...

navidcy avatar Jul 07 '25 01:07 navidcy

Distributed CI passes šŸŽ‰

navidcy avatar Jul 07 '25 03:07 navidcy

Providing CPU() to MultiRegionObject constructor seems to do the job...

convert_output(mo::MultiRegionObject, writer) =
    MultiRegionObject(CPU(), Tuple(convert(writer.array_type, obj) for obj in mo.regional_objects))

But is this the right thing to do? Not sure.

I think the more general form is

 convert_output(mo::MultiRegionObject, writer) =
    MultiRegionObject(
        architecture_from_type(writer.array_type),
        Tuple(convert(writer.array_type, obj) for obj in mo.regional_objects)
    )

I had to add:

architecture_from_type(type::Type{<:AbstractArray}) = architecture(type())

@glwagner Opinions?

michel2323 avatar Jul 07 '25 14:07 michel2323

@navidcy Thank you so much for the review and fixes!

michel2323 avatar Jul 07 '25 14:07 michel2323

hi @michel2323, I doubt that the architecture_from_type method did the job -- the tests still fail.

That's because Array type cannot be instantiated like Array() I believe...

What do we want here? We want it to return the architecture that corresponds to the outer type of the writer.array_type, correct? E.g., if this is Array{Float32} we want CPU() and if it's CuArray{Float64} we want GPU(), etc?

navidcy avatar Jul 07 '25 21:07 navidcy

hi @michel2323, I doubt that the architecture_from_type method did the job -- the tests still fail.

That's because Array type cannot be instantiated like Array() I believe...

What do we want here? We want it to return the architecture that corresponds to the outer type of the writer.array_type, correct? E.g., if this is Array{Float32} we want CPU() and if it's CuArray{Float64} we want GPU(), etc?

ff957c8 attempts to resolve this

navidcy avatar Jul 07 '25 22:07 navidcy

I'm merging this.

navidcy avatar Jul 14 '25 10:07 navidcy

go for it!

simone-silvestri avatar Jul 14 '25 10:07 simone-silvestri