Oceananigans.jl
CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS) when using Lagrangian particles with a large CFL number
In the example below, the model crashes with a GPU illegal memory access error. The CFL number is intentionally set to a large value so that the model becomes numerically unstable. I expected the model to abort on its own once NaNs appear, rather than crash with an illegal memory access error. This only happens when I use Lagrangian particles; without them, the model terminates by itself as expected. I have also verified that the model does not crash when the CFL number is small.
using Oceananigans
const Lx = 1.0
const Nx = 50
const Δx = Lx / Nx
const max_velocity = 1.0
const cfl = 10.0
const Δt = cfl * Δx / max_velocity
function initial_u(x::R, y::R, z::R) where {R<:Real}
    return (max_velocity / Lx) * y
end
grid = RectilinearGrid(
    GPU(),
    size = (Nx, Nx, Nx),
    x = (0.0, Lx),
    y = (0.0, Lx),
    z = (0.0, Lx),
    topology = (Periodic, Bounded, Bounded)
)
arch_array = Oceananigans.Architectures.array_type(GPU()){Float64}
n_particles = 1000
xs = convert(arch_array, zeros((n_particles, )))
ys = convert(arch_array, LinRange(0.0, Lx, n_particles))
zs = convert(arch_array, zeros((n_particles, )))
particles = LagrangianParticles(x = xs, y = ys, z = zs)
model = NonhydrostaticModel(;
    grid,
    particles = particles,
)
set!(model, u = initial_u)
simulation = Simulation(model; Δt = Δt, stop_iteration = 200)
run!(simulation)
The output.log is uploaded as a file.
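For reference, the arithmetic implied by the script's parameters can be checked with a standalone sketch that does not need Oceananigans or a GPU. The names `displacement` and `cells_per_step` are illustrative and do not appear in the script above; the sketch only restates what `cfl = 10.0` means for a parcel moving at `max_velocity`, and it makes no claim about what causes the crash.

```julia
# Same parameters as in the reproduction script
Lx = 1.0
Nx = 50
Δx = Lx / Nx                      # grid spacing
max_velocity = 1.0
cfl = 10.0
Δt = cfl * Δx / max_velocity      # time step implied by the chosen CFL number

# Distance covered in one time step by a parcel moving at max_velocity,
# in physical units and in number of grid cells (illustrative names)
displacement = max_velocity * Δt
cells_per_step = displacement / Δx

println("Δx = $Δx, Δt = $Δt, cells crossed per step = $cells_per_step")
```

With these values, a parcel at the maximum initial velocity crosses about ten grid cells per time step, which is why I expect the run to become unstable.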
Test environment:
- Julia version: v1.9.3
- Oceananigans: v0.89.0
- Tested on Ubuntu 20.04.6 LTS with CUDA 12.0 and MIT Satori with CUDA 11.4
This example tries to reproduce some of my convection simulations. In these simulations, I used strong heating, so I expected some of them to crash. However, I did not expect them to trigger GPU illegal memory access errors.
This issue is probably related to #3267.