ClimaAtmos.jl
Encapsulate surface conditions setting functionality so it can be reused
This small PR encapsulates some of the functionality of `atmos_surface_conditions` so that it can be used elsewhere. It also introduces a bit of notation that will hopefully make the code a little clearer.
With @LenkaNovak and @akshaysridhar
I have previously received unanimous feedback discouraging the use of "hat" characters like `n̂`, and `𝒢` is too hard to read even for me. Could you please use the more conventional variable names found in ClimaAtmos, like `surface_normal` and `surface_local_geometry`? You can drop the `surface_` if you want to be less wordy, but let's try to stay consistent with the rest of ClimaAtmos, ClimaCore, etc.
I guess it is not unanimous anymore!
I can get rid of the hats, but I would like to use math notation for math expressions for sure. Isn't `n̂` fairly standard for a surface normal vector? How about just `n` without the hat?
I got rid of the hats and use `G` instead of the script `𝒢`. Is it easier to read now? The hats make the math more obvious to me, but I don't need them.
More important than the notation are the names of the functions. `scalar_flux` is not quite descriptive enough; perhaps `vertical_scalar_flux`. We also need a better name for the vector version (plus "vector flux" is a little confusing, I think, but I haven't thought of a better name yet). Suggestions on those names would be useful.
Totally agree that `n̂` is standard! I was quite eager to use it and similar symbols early on, but unfortunately they are not rendered correctly on many devices (including mobile devices and Anna Jaruga's text editor). Here's a screenshot from my phone so you can see what I mean:
For objects of type `LocalGeometry`, we use the symbols `local_geometry` or `lg` in ClimaCore and ClimaAtmos. This distinguishes them from the metric tensor, which we typically denote by `gⁱʲ` when expressed in the contravariant basis, `gᵢⱼ` when expressed in the covariant basis, or just `g` for basis-agnostic functions. Introducing a symbol `G` to denote a `LocalGeometry` is somewhat confusing.
For the fluxes, `vertical_scalar_flux` and `vertical_uₕ_flux` should be clear enough. However, could you move them to `src/utils/utilities.jl`, where we define similar utility functions? There are also some helpful functions for extracting and manipulating the metric tensor in that file; you may want to use or extend them here.
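For what it's worth, here is a minimal, self-contained sketch of the kind of helper being discussed, using toy types rather than the real `ClimaCore.Geometry` API (the struct, field, and diagonal-metric scaling below are illustrative assumptions, not the PR's actual implementation):

```julia
# Toy stand-in for ClimaCore's LocalGeometry; only a vertical metric entry is kept.
struct ToyLocalGeometry{FT}
    g³³::FT  # hypothetical contravariant vertical metric component
end

# Covariant vertical component of a scalar flux with physical magnitude F,
# assuming a diagonal metric (so u₃ = F / sqrt(g³³)).
@inline vertical_scalar_flux(F, lg::ToyLocalGeometry) = F / sqrt(lg.g³³)

# Same idea for the flux of a horizontal (covariant) vector quantity: each
# horizontal component gets the same vertical scaling.
@inline vertical_vector_flux(F₁, F₂, lg::ToyLocalGeometry) =
    (F₁ / sqrt(lg.g³³), F₂ / sqrt(lg.g³³))

lg = ToyLocalGeometry(1.0)
vertical_scalar_flux(0.1, lg)        # 0.1
vertical_vector_flux(0.1, 0.2, lg)   # (0.1, 0.2)
```

In the real code the return values would presumably be axis tensors (`C3` and `C3 ⊗ C12` objects) built from the surface `LocalGeometry`, not bare numbers.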
Okay, that's fine.
This is not the first time I have heard complaints about Unicode. Nevertheless, it has obvious benefits, so, like many choices, there are trade-offs to be considered.

Note that Oceananigans makes use of plenty of Unicode, so purity requirements on the whole code base (e.g., the coupled modeling system) will not be possible to enforce. Generally speaking, I think it's useful to stay away from hard-to-read or hard-to-type characters in "generic" code that is meant to be read or manipulated by a lot of people, like user scripts. For the dark depths of source code, the trade-offs are different...
> For the fluxes, `vertical_scalar_flux` and `vertical_uₕ_flux` should be clear enough.

This function isn't specific to velocities, though; here, for example, it is actually used for momentum. Also, I think one should use either language or math notation, but not mix the two.
@dennisYatunin can you take a look at the latest version? I changed the names to hopefully be more "correct", though I think `tensor_from_components` is still not quite right (it's not quite that general).

As for moving these functions to another file, I think that's okay. However, because these functions are only used in one place, I feel it's not well motivated to move them, since doing so will make `atmos_surface_conditions` harder to understand. If we can think of a second place to use them, then I think it will make sense...
Working on this with @LenkaNovak and @juliasloan25
Names seem ok to me, as long as you're only using them in this one file. If you're planning to use them outside of this file (e.g., in ClimaEarth), it might be good to have something less generic. It's hard to do this without being too wordy, but maybe something like `c3_flux_from_components` and `c3xc12_flux_from_components` could work, or maybe `scalar_flux_from_components` and `covariant_vector_flux_from_components`? Also, you probably want to use `n₃` consistently, instead of `z` and `n₁`.
Also, why do you need the `@inline` annotations? I would expect the compiler to generate efficient code here without them.
> Also, why do you need the `@inline` annotations? I would expect the compiler to generate efficient code here without them.
It's our experience that `@inline` is generally needed for all atomic functions appearing in kernels and can make a substantial difference. The reason isn't quite clear, since supposedly we force-inline the body of every kernel, as ClimaCore probably does too. Obviously it doesn't help that much if we only put them here. But they should be everywhere... so we probably want them here too. We also often need to add `@inline` to encourage type inference (and forgetting an `@inline` is a usual suspect for failed GPU compilation).
I was wrestling with the fact that these functions are not flux-specific. Ok, we'll think about it a bit more and let you know when it's ready.
> But they should be everywhere... so we probably want them here too.
I would recommend staying away from design principles like this. In general, Julia's compiler knows what it's doing, including when it comes to inlining of LLVM code. The generation and loading of code take time, and overuse of `@inline` can easily lead to a blow-up in both, not necessarily with a corresponding improvement in performance. We've even observed memory leaks / out-of-memory errors during compilation of particularly complicated kernels involving ClimaCore operations, which is probably at least in part due to overly-aggressive inlining within kernel code. Of course, there are places where `@inline` is needed, but these are usually situations where we are hitting compiler heuristics that require case-by-case approaches. (For example, see UnrolledUtilities.jl, where `@inline` is combined with disabling of recursion limits and generative loop unrolling to guarantee constant propagation whenever it can be achieved.) However, use of such compiler annotations should be accompanied by concrete test-based justifications when possible.
I agree with your theoretical arguments and principles. It's just an immense amount of work to test the impact of every `@inline` in a huge code base, probably infeasible, especially considering that the code is a moving target, since the physics is changing. I'm surprised to hear that it would cause catastrophic issues like memory leaks, though...
@charleskawczynski didn't you find some performance benefit when using `@inline` in Thermodynamics?
We don't need to have this discussion here...
@simone-silvestri can you chime in about `@inline`? I can't count the number of times we have had to add a "forgotten" `@inline` to fix an issue; it numbers in the hundreds. I don't really understand why this kind of problem wouldn't crop up in ClimaCore or ClimaAtmos.
Re the compiler knowing what it's doing: the broad point is that kernel code is not the typical Julia use case, so the default threshold for inlining is different from the threshold we need for kernel code. Can't we be confident that almost all (or all) of our kernel code should be inlined? In that sense the annotation is conservative. We do want to reduce compile time as much as possible, but we're also willing to pay some extra compile cost, compared to tasks like making plots, to maximize kernel efficiency.

I might be wrong about what KernelAbstractions is doing, though, and perhaps that's why we need annotations so extensively in Oceananigans. If ClimaCore is specifically designed so that users can omit `@inline`, then perhaps the annotation is just redundant.

I'm confused about the downsides cited for kernel code (memory leaks, etc.). A memory leak may be a bug not directly related to `@inline`. Also, if the benefit of inlining code like what's written here doesn't justify the added compilation cost, I find that surprising; it may hint at other underlying issues.
Regarding using `@inline`: it won't make any difference on the GPU, since we use the `always_inline = true` kwarg in CUDA's `@cuda` launch macro (now in all ClimaCore kernels).
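For reference, this is roughly what that flag looks like at a launch site (a generic CUDA.jl illustration, not ClimaCore's actual launch code; requires a CUDA-capable GPU):

```julia
using CUDA

function add_one!(y, x)
    i = threadIdx().x
    @inbounds y[i] = x[i] + 1
    return nothing
end

x = CUDA.ones(32)
y = similar(x)

# `always_inline=true` asks GPUCompiler to inline everything reachable from the
# kernel, regardless of per-function heuristics or `@inline` annotations.
@cuda threads=32 always_inline=true add_one!(y, x)
```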
For the CPU, it could help. There is a complicated entanglement between `@inline` and inference, and I've seen it cut both ways: adding `@inline` can fix inference or make inference fail (it depends on the situation). I will say that I've seen far more cases where `@inline` fixes inference than ones where it breaks it.
> We've even observed memory leaks / out-of-memory errors during compilation of particularly complicated kernels involving ClimaCore operations, which is probably at least in part due to overly-aggressive inlining within kernel code.

I remember running into OOM errors due to aggressive inlining, but not memory leaks. @dennisYatunin, do you remember where this happened? We should open an issue to track it if it has happened.
> @charleskawczynski didn't you find some performance benefit when using `@inline` in Thermodynamics?

Yes, though this was one of the places where inlining actually broke inference; that was likely because we weren't broadcasting types well (https://github.com/CliMA/ClimaCore.jl/pull/1636).
One reason inlining can help is that it lets the compiler perform common subexpression elimination (CSE); this is particularly helpful if the eliminated subexpressions include expensive operations or memory reads.
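A toy illustration of that point (generic Julia, nothing ClimaCore-specific): once `first_neighbor` is inlined at both call sites, the compiler can see the two identical array reads and keep only one.

```julia
# Small helper of the sort that typically appears inside a kernel body.
@inline first_neighbor(a, i) = @inbounds a[i + 1]

function combine(a, i, w)
    x = first_neighbor(a, i)      # identical subexpression...
    y = w * first_neighbor(a, i)  # ...can be shared (CSE) once inlined
    return x + y
end
```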
These functions are small, and function call overhead on the CPU is typically cheaper than on the GPU, so I don't think it will have a big impact either way. That said, I don't have a strong preference, and I'd rather not push back on the choice/preference to use `@inline` here.
Also, I do agree with @dennisYatunin regarding the variable names; thanks for updating those, @glwagner.
Here's the discussion where Simon claimed that we were seeing memory leaks from GPUCompiler: https://github.com/CliMA/ClimaCore.jl/pull/1462. I've looked into the relevant unit tests with SnoopCompile, and the vast majority of compilation time is spent on codegen (which shows up as empty space in SnoopCompile flame graphs), so I assume that it's at fault here.
> Here's the discussion where Simon claimed that we were seeing memory leaks from GPUCompiler...
Ok, yeah, I'm not sure if that was conclusive.
Ok, I'm not fussed about removing the `@inline`s for this PR. I'm glad we had this discussion, though.
Oceananigans also uses `always_inline=true`:
https://github.com/CliMA/Oceananigans.jl/blob/d89f51c960ce27351d5efa143eab1aba492ee41d/src/Architectures.jl#L47
But despite that, the impression has remained that this may not always be strong enough. It's an evolving issue, though, I now realize. Perhaps it's solved now and the `@inline`s are superfluous (which would be great news, in fact). I'm also not sure the same flag is supported by other GPU backends, but that's perhaps not a concern for ClimaAtmos, since it would likely be an immense amount of work to use other GPUs...
There's a final point about the CPU: I think it's probable that we also want to always inline on the CPU, and I'm not sure if there's a way to achieve that without using `@inline`. So, even if the `always_inline` feature achieves its purpose on CUDA GPUs, it's possible that without `@inline` on all kernel functions we'll lose performance there (balanced against possibly shorter compilation times).

That said, I'm less concerned about CPU performance, provided that it remains usable for prototyping (i.e., a 2x slowdown is probably okay, but 100x is not).
Looking at the build_history, there is not a huge change in performance, so do you guys think this is ready to be merged as is? @glwagner, to fix the regression errors, I think you can just increase the number in `ref_counter.jl` by 1.
ok! Where is `ref_counter.jl`?
here :)
@charleskawczynski @dennisYatunin does increasing the ref counter make sense here, or is the behavioral change worrying? 🤔
The order of operations in the new computation of `ρ_flux_uₕ` might be a little different from what it was before, so it's not unreasonable to see an MSE table change due to roundoff error. If you want, you can sanity-check that reverting to the old computation of `ρ_flux_uₕ` gives exactly the same results as before, but what you have looks fine to me.
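(As a generic illustration of why reordering alone can shift the MSE tables, unrelated to the actual `ρ_flux_uₕ` code: floating-point addition is not associative, so mathematically equivalent expressions can differ at the level of roundoff.)

```julia
julia> (0.1 + 0.2) + 0.3
0.6000000000000001

julia> 0.1 + (0.2 + 0.3)
0.6
```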