ClimaAtmos.jl
Encapsulate surface conditions setting functionality so it can be reused
This small PR encapsulates some of the functionality of `atmos_surface_conditions` so that it can be used elsewhere. It also introduces a bit of notation that will hopefully make the code a little clearer.
With @LenkaNovak and @akshaysridhar
I have previously received unanimous feedback discouraging the use of "hat" characters like `n̂`, and `𝒢` is too hard to read even for me. Could you please use the more conventional variable names found in ClimaAtmos, like `surface_normal` and `surface_local_geometry`? You can drop the `surface_` if you want to be less wordy, but let's try to stay consistent with the rest of ClimaAtmos, ClimaCore, etc.
I guess it is not unanimous anymore!
I can get rid of the hats, but I would like to use math notation for math expressions for sure. Isn't `n̂` fairly standard for a surface normal vector? How about just `n` without the hat?
I got rid of the hats and use `G` instead of the script `𝒢`. Is it easier to read now? The hats make the math more obvious to me, but I don't need them.
More important than the notation are the names of the functions. `scalar_flux` is not quite descriptive enough; perhaps `vertical_scalar_flux`. We also need a better name for the vector version (plus "vector flux" is a little confusing, I think, but I haven't thought of a better name yet). Suggestions on those names would be useful.
Totally agree that `n̂` is standard! I was quite eager to use it and similar symbols early on, but unfortunately they are not rendered correctly on many devices (including mobile devices and Anna Jaruga's text editor). Here's a screenshot from my phone so you can see what I mean:
For objects of type `LocalGeometry`, we use the symbols `local_geometry` or `lg` in ClimaCore and ClimaAtmos. This distinguishes them from the metric tensor, which we typically denote by `gⁱʲ` when expressed in the contravariant basis, `gᵢⱼ` when expressed in the covariant basis, or just `g` for basis-agnostic functions. Introducing a symbol `G` to denote a `LocalGeometry` is somewhat confusing.
For the fluxes, `vertical_scalar_flux` and `vertical_uₕ_flux` should be clear enough. However, could you move them to `src/utils/utilities.jl`, where we define similar utility functions? There are also some helpful functions for extracting and manipulating the metric tensor in that file; you may want to use or extend them here.
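For what it's worth, here is a minimal, self-contained sketch of the kind of helper being discussed, using toy types rather than the real `ClimaCore.Geometry` API (the struct, field, and diagonal-metric scaling below are illustrative assumptions, not the PR's actual implementation):

```julia
# Toy stand-in for ClimaCore's LocalGeometry; only a vertical metric entry is kept.
struct ToyLocalGeometry{FT}
    g³³::FT  # hypothetical contravariant vertical metric component
end

# Covariant vertical component of a scalar flux with physical magnitude F,
# assuming a diagonal metric (so u₃ = F / sqrt(g³³)).
@inline vertical_scalar_flux(F, lg::ToyLocalGeometry) = F / sqrt(lg.g³³)

# Same idea for the flux of a horizontal (covariant) vector quantity: each
# horizontal component gets the same vertical scaling.
@inline vertical_vector_flux(F₁, F₂, lg::ToyLocalGeometry) =
    (F₁ / sqrt(lg.g³³), F₂ / sqrt(lg.g³³))

lg = ToyLocalGeometry(1.0)
vertical_scalar_flux(0.1, lg)        # 0.1
vertical_vector_flux(0.1, 0.2, lg)   # (0.1, 0.2)
```

In the real code the return values would presumably be axis tensors (`C3` and `C3 ⊗ C12` objects) built from the surface `LocalGeometry`, not bare numbers.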
Okay, that's fine.
This is not the first time I have heard complaints about Unicode. Nevertheless, it has obvious benefits, so, like many choices, there are trade-offs to be considered.

Note that Oceananigans makes use of plenty of Unicode, so purity requirements on the whole code base (e.g., the coupled modeling system) will not be possible to enforce. Generally speaking, I think it's useful to stay away from hard-to-read or hard-to-type characters in "generic" code that is meant to be read or manipulated by a lot of people, like user scripts. For the dark depths of source code, the trade-offs are different...
> For the fluxes, `vertical_scalar_flux` and `vertical_uₕ_flux` should be clear enough.

This function isn't specific to velocities, though; here, for example, it is actually used for momentum. Also, I think one should use either language or math notation, but not mix the two.
@dennisYatunin can you take a look at the latest version? I changed the names to hopefully be more "correct", though I think `tensor_from_components` is still not quite right (it's not quite that general).

As for moving these functions to another file, I think that's okay. However, because these functions are only used in one place, I feel it's not well motivated to move them, since doing so will make `atmos_surface_conditions` harder to understand. If we can think of a second place to use them, then I think it will make sense...
Working on this with @LenkaNovak and @juliasloan25
Names seem ok to me, as long as you're only using them in this one file. If you're planning to use them outside of this file (e.g., in ClimaEarth), it might be good to have something less generic. It's hard to do this without being too wordy, but maybe something like `c3_flux_from_components` and `c3xc12_flux_from_components` could work, or maybe `scalar_flux_from_components` and `covariant_vector_flux_from_components`? Also, you probably want to use `n₃` consistently, instead of `z` and `n₁`.
Also, why do you need the `@inline` annotations? I would expect the compiler to generate efficient code here without them.
> Also, why do you need the `@inline` annotations? I would expect the compiler to generate efficient code here without them.
It's our experience that `@inline` is generally needed for all atomic functions appearing in kernels and can make a substantial difference. The reason isn't quite clear, since supposedly we force-inline the body of every kernel, as ClimaCore probably does too. Obviously it doesn't help that much if we only put them here. But they should be everywhere... so we probably want them here too. We also often need to add `@inline` to encourage type inference (and forgetting an `@inline` is a usual suspect for failed GPU compilation).
I was wrestling with the fact that these functions are not flux-specific. Ok, we'll think about it a bit more and let you know when it's ready.
> But they should be everywhere... so we probably want them here too.
I would recommend staying away from design principles like this. In general, Julia's compiler knows what it's doing, including when it comes to inlining of LLVM code. The generation and loading of code take time, and overuse of `@inline` can easily lead to a blow-up in both, not necessarily with a corresponding improvement in performance. We've even observed memory leaks / out-of-memory errors during compilation of particularly complicated kernels involving ClimaCore operations, which is probably at least in part due to overly-aggressive inlining within kernel code. Of course, there are places where `@inline` is needed, but these are usually situations where we are hitting compiler heuristics that require case-by-case approaches. (For example, see UnrolledUtilities.jl, where `@inline` is combined with disabling of recursion limits and generative loop unrolling to guarantee constant propagation whenever it can be achieved.) However, use of such compiler annotations should be accompanied by concrete test-based justifications when possible.
I agree with your theoretical arguments and principles. It's just an immense amount of work to test the impact of every `@inline` in a huge code base, probably infeasible, especially considering that the code is a moving target, since the physics is changing. I'm surprised to hear that it would cause catastrophic issues like memory leaks, though...
@charleskawczynski didn't you find some performance benefit when using `@inline` in Thermodynamics?
We don't need to have this discussion here...
@simone-silvestri can you chime in about `@inline`? I can't count the number of times we have had to add a "forgotten" `@inline` to fix an issue; it numbers in the hundreds. I don't really understand why this kind of problem wouldn't crop up in ClimaCore or ClimaAtmos.
Re the compiler knowing what it's doing: the broad point is that kernel code is not the typical Julia use case, so the default threshold for inlining is different from the threshold we need for kernel code. Can't we be confident that almost all (or all) of our kernel code should be inlined? In that sense the annotation is conservative. We do want to reduce compile time as much as possible, but we're also willing to pay some extra compile cost, compared to tasks like making plots, to maximize kernel efficiency.

I might be wrong about what KernelAbstractions is doing, though, and perhaps that's why we need annotations so extensively in Oceananigans. If ClimaCore is specifically designed so that users can omit `@inline`, then perhaps the annotation is just redundant.

I'm confused about the downsides cited for kernel code (memory leaks, etc.). A memory leak may be a bug not directly related to `@inline`. Also, if the benefit of inlining code like what's written here doesn't justify the added compilation cost, I find that surprising; it may hint at other underlying issues.
Regarding using `@inline`: it won't make any difference on the GPU, since we use the `always_inline = true` kwarg in CUDA's `@cuda` launch macro (now in all ClimaCore kernels).
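For reference, this is roughly what that flag looks like at a launch site (a generic CUDA.jl illustration, not ClimaCore's actual launch code; requires a CUDA-capable GPU):

```julia
using CUDA

function add_one!(y, x)
    i = threadIdx().x
    @inbounds y[i] = x[i] + 1
    return nothing
end

x = CUDA.ones(32)
y = similar(x)

# `always_inline=true` asks GPUCompiler to inline everything reachable from the
# kernel, regardless of per-function heuristics or `@inline` annotations.
@cuda threads=32 always_inline=true add_one!(y, x)
```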
For the CPU, it could help. There is a complicated entanglement between `@inline` and inference, and I've seen it cut both ways: adding `@inline` can fix inference or make inference fail (it depends on the situation). I will say that I've seen far more cases where `@inline` fixes inference than ones where it breaks it.
> We've even observed memory leaks / out-of-memory errors during compilation of particularly complicated kernels involving ClimaCore operations, which is probably at least in part due to overly-aggressive inlining within kernel code.

I remember running into OOM errors due to aggressive inlining, but not memory leaks. @dennisYatunin, do you remember where this happened? We should open an issue to track it if it has happened.
> @charleskawczynski didn't you find some performance benefit when using `@inline` in Thermodynamics?

Yes, though this was one of the places where inlining actually broke inference; that was likely because we weren't broadcasting types well (https://github.com/CliMA/ClimaCore.jl/pull/1636).
One reason inlining can help is that it lets the compiler perform common subexpression elimination (CSE); this is particularly helpful if the eliminated subexpressions include expensive operations or memory reads.
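A toy illustration of that point (generic Julia, nothing ClimaCore-specific): once `first_neighbor` is inlined at both call sites, the compiler can see the two identical array reads and keep only one.

```julia
# Small helper of the sort that typically appears inside a kernel body.
@inline first_neighbor(a, i) = @inbounds a[i + 1]

function combine(a, i, w)
    x = first_neighbor(a, i)      # identical subexpression...
    y = w * first_neighbor(a, i)  # ...can be shared (CSE) once inlined
    return x + y
end
```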
These functions are small, and function call overhead on the CPU is typically cheaper than on the GPU, so I don't think it will have a big impact either way. That said, I don't have a strong preference, and I'd rather not push back on the choice/preference to use `@inline` here.
Also, I do agree with @dennisYatunin regarding the variable names; thanks for updating those, @glwagner.
Here's the discussion where Simon claimed that we were seeing memory leaks from GPUCompiler: https://github.com/CliMA/ClimaCore.jl/pull/1462. I've looked into the relevant unit tests with SnoopCompile, and the vast majority of compilation time is spent on codegen (which shows up as empty space in SnoopCompile flame graphs), so I assume that it's at fault here.
> Here's the discussion where Simon claimed that we were seeing memory leaks from GPUCompiler...
Ok, yeah, I'm not sure if that was conclusive.
Ok, I'm not fussed about removing the `@inline`s for this PR. I'm glad we had this discussion, though.
Oceananigans also uses `always_inline=true`:
https://github.com/CliMA/Oceananigans.jl/blob/d89f51c960ce27351d5efa143eab1aba492ee41d/src/Architectures.jl#L47
But despite that, the impression has remained that this may not always be strong enough. It's an evolving issue, though, I now realize. Perhaps it's solved now and the `@inline`s are superfluous (which would be great news, in fact). I'm also not sure the same flag is supported by other GPU backends, but that's perhaps not a concern for ClimaAtmos, since it would likely be an immense amount of work to use other GPUs...
There's a final point about the CPU: I think it's probable that we also want to always inline on the CPU, and I'm not sure if there's a way to achieve that without using `@inline`. So, even if the `always_inline` feature achieves its purpose on CUDA GPUs, it's possible that without `@inline` on all kernel functions we'll lose performance there (balanced against possibly shorter compilation times).

That said, I'm less concerned about CPU performance, provided that it remains usable for prototyping (i.e., a 2x slowdown is probably okay, but 100x is not).
Looking at the build_history, there is not a huge change in performance, so do you guys think this is ready to be merged as is? @glwagner, to fix the regression errors, I think you can just increase the number in `ref_counter.jl` by 1.
ok! Where is `ref_counter.jl`?
here :)
@charleskawczynski @dennisYatunin does increasing the ref counter make sense here, or is the behavioral change worrying? 🤔
The order of operations in the new computation of `ρ_flux_uₕ` might be a little different from what it was before, so it's not unreasonable to see an MSE table change due to roundoff error. If you want, you can sanity-check that reverting to the old computation of `ρ_flux_uₕ` gives exactly the same results as before, but what you have looks fine to me.
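(As a generic illustration of why reordering alone can shift the MSE tables, unrelated to the actual `ρ_flux_uₕ` code: floating-point addition is not associative, so mathematically equivalent expressions can differ at the level of roundoff.)

```julia
julia> (0.1 + 0.2) + 0.3
0.6000000000000001

julia> 0.1 + (0.2 + 0.3)
0.6
```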