Oceananigans.jl
Active particles with tracer interactions
Hi all,
I am currently working on a biogeochemistry modelling environment with @johnryantaylor using Oceananigans, and as part of this have come up with a scheme for "active" particles that interact with tracers.
How I currently have this set up: when the particle dynamics are run, the particles can increase/decrease the concentration of tracers in the cells surrounding them, but this only allows for explicit Euler integration. A better way to do this seems to be to have the particles' uptake/exudation of tracers contribute to their tendencies during the time stepping, as I have implemented here.
I'm unsure if this is implemented in the best way throughout but I'm fairly satisfied that I have it working as desired.
Hope everyone is happy with this?
To summarise what I have changed:
- Particle setup: particles can get a parameter called `active_properties`, which is a tuple of named tuples of particle properties and tracers (e.g. `((property=:t, tracer=:x),)`). The idea is that the particle dynamics function (as already implemented) changes the particle property to set a rate of uptake/exudation of a tracer, and the function below integrates this change into the specified tracer (see the sketch after this list).
- `calculate_particle_tendency_contributions!` added after each `calculate_tendencies!` call: the function goes through each particle, finds its 8 nearest cells, and adds the relevant fraction of each particle property (divided by the cell volume) to the tracer tendencies, so it can be integrated by the time stepper (like tracer forcing).
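To make the proposed interface concrete, here is a rough sketch of how a user might set up such particles, based on the description above. `active_properties` is the keyword this PR proposes; `UptakingParticle` and `uptake_dynamics!` are illustrative names, and the exact constructor signature is an assumption.

```julia
using Oceananigans
using StructArrays

struct UptakingParticle
    x :: Float64
    y :: Float64
    z :: Float64
    t :: Float64   # rate of uptake/exudation of tracer :x, set by the dynamics
end

n = 10
particles = StructArray{UptakingParticle}((zeros(n), zeros(n), zeros(n), zeros(n)))

# The dynamics function sets each particle's `t` property to the rate at which
# it should change tracer `x`; `calculate_particle_tendency_contributions!`
# then adds t / cell volume to the tendency of `x` in the surrounding cells.
function uptake_dynamics!(particles, model, Δt)
    particles.properties.t .= -0.1   # a constant uptake rate, for illustration
    return nothing
end

lagrangian_particles = LagrangianParticles(particles;
                                           dynamics = uptake_dynamics!,
                                           active_properties = ((property = :t, tracer = :x),))
```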
The test that failed says on Buildkite that the client was lost, and there are no errors from Julia. I'm guessing that if I can get it rerun it might be fine, since the other GPU tasks ran fine?
I can try rerunning it!
Thank you!
It failed again, so I ran it on our cluster, which errored in the same way, just saying `killed`. It's failing on my new test, and I think I've probably not implemented `calculate_particle_tendency_contributions!` in a GPU-friendly way, so I will try to resolve this tomorrow.
Edit: as I closed the terminal, an error about the GPU node being out of memory flashed up, which may be relevant.
I've looked into it more now, and the memory usage blows up on line 13 or 14 of `calculate_particle_tendency_contributions.jl`.
To try and more closely copy other similar things, I've tried changing lines 10 and 11 to be

```julia
property = particles.properties[getindex(field.property, particles.properties)]
tendency = model.timestepper.Gⁿ[getindex(field.tracer, model.timestepper.Gⁿ)]
```

with

```julia
@inline getindex(value::Symbol, fields) = findfirst(keys(fields) .== value)
```

but the same thing happens.
I also checked, and the memory usage isn't high on CPU. I don't have much experience with GPUs, so I'm not really sure how to debug this and would really appreciate any advice.
> Particle setup: particles can get a parameter called `active_properties`, which is a tuple of named tuples of particle properties and tracers (e.g. `((property=:t, tracer=:x),)`). The idea being that the particle dynamics function (as already implemented) changes the particle property to set a rate of uptake/exudation of a tracer, and the below function would integrate this change to the specified tracer.
Can you please restate this? I think a code example would be helpful with an explanation on what's being accomplished. Also can you explain why this can't be implemented with a Callback (or forcing function)? I.e., why do we desire this as a source code feature? Just to be clear, I'm not saying it should be one way or another, but it's good to have explicit justification for source code features (which are expensive to maintain and require resources to test)!
No problem at all, here's an example (not sure where best to put this/if you'd want it in the examples folder). This is about the simplest version I could think of, where we have a particle randomly walking around converting tracer `a` into tracer `b`. The specific use case we've been using this for is modelling kelp fronds as particles which grow (a variety of particle properties change) depending on how much nutrient the particles take up from a biogeochemical model, and also release tracer back into the model. I also envisage using this to exert drag on the flow at some point.
Although I've had this working as custom dynamics of the particles, the effect of the particles' tendency can only be integrated with explicit Euler, i.e. at each substep it just does `tracer[i, j, k] += value*Δt`, whereas this solution allows it to be properly integrated along with the other tendencies. I couldn't think of a way to implement this as a forcing function before, but perhaps I could use a callback to update an auxiliary field with the tendencies of the particles, and then add this as a forcing function, although I imagine that could use a lot more memory if there were a large grid and a small number of particles.
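For concreteness, a rough sketch of what that explicit-Euler workaround looks like inside a custom particle dynamics function; `nearest_cell` and `uptake_rate` are hypothetical helpers, not Oceananigans functions.

```julia
function euler_dynamics!(particles, model, Δt)
    a = model.tracers.a
    @inbounds for p in eachindex(particles.properties.x)
        i, j, k = nearest_cell(particles, p, model.grid)   # hypothetical helper
        # the tracer is updated directly: a forward Euler step at every
        # substep, regardless of the model's time stepper
        a[i, j, k] += uptake_rate(particles, p) * Δt       # hypothetical helper
    end
    return nothing
end
```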
(I'll fix the failing test once we've discussed the other changes, since it only failed because I forgot to change `getmask` to `get_mask` in the test script.)
Nice! Can you paste a more minimal example directly in this PR too so it's preserved here for the future?
> (not sure where best to put this/if you'd want it in the examples folder)
The right place would be `validation/`, as we only use `examples/` for features that are widely used and well tested (examples are expensive, because they have to run during CI).
Is there a way for me to cancel tests so they don't have to run every commit?
Didn't mean to commit those Project.toml and Manifest changes, will revert now
> Is there a way for me to cancel tests so they don't have to run every commit?
You can include `[skip ci]` in the commit message: https://docs.github.com/en/actions/managing-workflow-runs/skipping-workflow-runs
I've spent some time trying to diagnose the GPU issue, and it's quite strange: when I copy the test line by line into a REPL, it doesn't cause the memory issue and runs fine. After doing that (so everything has been compiled etc.), I then ran the test, and the memory usage is fine until the last test, which then crashes again. I will investigate more...
So I think that the tests generally are very close to the memory limit (which I think is around 1 GB; I'm not sure how that's being set). It looks like the tracer sinking isn't actually using much more memory, but it's enough to push it over. If I remove the output writing and speed field tracking (which don't get tested in this run anyway) from the final test, it seems to keep the memory usage lower.
I can't test it myself on a GPU right now, so I'm going to push and hopefully it'll work!
Interesting. I don't believe we make any attempt to manage GPU memory. However, the tests are all quite lightweight (the largest are probably the regression tests, at 16^3?) The GPU we use for CI has 24 GB total memory. Up to 16 jobs can run simultaneously. I'm not sure this is consistent with a test being just over the limit, because this would mean the tests would intermittently pass, right? It's slightly stochastic how many CI jobs are running simultaneously.
I'm not really sure where this limit is coming from, because I get the same behaviour on our GPU, which is on a node with 40 GB of memory, and it consistently fails when the memory usage of the process gets fractionally over 1 GB. I think the spike in memory usage is occurring on the line `Gp_kernel! = calculate_particle_tendency_kernel!(device(arch), workgroup, worksize)`, which I wouldn't have thought would allocate much memory?
Why does the test allocate so much memory? Is it possible to design an inexpensive test? It's important that CI is absolutely as cheap as possible.
> I couldn't think of a way to implement this as a forcing function before, but perhaps I could use a callback to update an auxiliary field with the tendencies of the particles, and then add this as a forcing function, although I imagine that could use a lot more memory if there were a large grid and small amount of particles.
I'm still wondering about this solution, which may both be more general (since users can effect arbitrary changes to tracer tendencies, not just source/sink terms) and is less heavy in the source code. Right now, this PR makes some significant source code changes for what I think is a rather narrow application, which I think is disproportionate compared to most of the other features we have.
> although I imagine that could use a lot more memory if there were a large grid and small amount of particles.
It seems the memory requirement of the forcing function approach is proportional to the number of tracers, not the number of particles --- right? We would need an "auxiliary tendency" for every tracer; the memory requirement doesn't change based on the number of particles.
If for some reason we cannot use a forcing function approach, we'll have to think carefully about how to design a flexible user API for "particle-induced tracer forcing". I think if we are going to add this then it should be more general than just a source/sink calculation --- i.e., we should attempt to have a small source code addition that makes a wide range of new behavior possible, if we can.
I've reconsidered this now and think we may not even need any source code change to properly implement this behaviour and integrate particle-induced tracer forcing.
If we have a model with some tracers `(:A, :B)` which we want to force with some particles, we can define an auxiliary field `G_p = TracerFields((:A, :B), grid)`. We can then have a particle dynamics function that modifies `model.auxiliary_fields.G_p`, for example setting the points surrounding the particles to `-A[i, j, k]` for `A` and `+A[i, j, k]` for `B`, like in my example above. You can then define a discrete forcing function for each of the tracers, like `a_forcing(i, j, k, grid, clock, model_fields) = model_fields.G_p.A[i, j, k]`, etc. (this would only work with my other PR #2733 added, so that the `auxiliary_fields` are available).
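For reference, a minimal (untested) sketch of this setup, assuming the PR #2733 branch so that the auxiliary fields show up in `model_fields`:

```julia
using Oceananigans
using Oceananigans.Fields: TracerFields

grid = RectilinearGrid(size = (16, 16, 16), extent = (1, 1, 1))

# One auxiliary tendency field per forced tracer
G_p = TracerFields((:A, :B), grid)

# Discrete forcings that simply read the particle-set tendencies
A_forcing(i, j, k, grid, clock, model_fields) = @inbounds model_fields.G_p.A[i, j, k]
B_forcing(i, j, k, grid, clock, model_fields) = @inbounds model_fields.G_p.B[i, j, k]

model = NonhydrostaticModel(; grid,
                            tracers = (:A, :B),
                            forcing = (A = Forcing(A_forcing, discrete_form = true),
                                       B = Forcing(B_forcing, discrete_form = true)),
                            auxiliary_fields = (; G_p))

# The particle dynamics function (not shown) would zero G_p each step and
# then deposit, e.g., -A[i, j, k] into G_p.A and +A[i, j, k] into G_p.B in
# the cells surrounding each particle.
```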
> It seems the memory requirement of the forcing function approach is proportional to the number of tracers, not the number of particles --- right? We would need an "auxiliary tendency" for every tracer; the memory requirement doesn't change based on the number of particles.
My reservation with this was that if there's, e.g., 1 particle affecting loads of tracers, you have to store lots more fields, but I see now that this is much less of an issue.
Sorry about making this much more complicated than it should have been!
Edit: updated the gist example with how this can be implemented (requires PR #2733 branch)
I think we do need the change I made to the tracked fields so that tracers and auxiliary fields can be tracked properly, although I don't like the `(field_name=nothing,)` setup as I currently have it. Would it be okay for me to change it to a tuple of either model field names or computed fields, e.g. `(:A, :u, speed)` where `speed = Field(√(u*u + v*v + w*w))`? And should I make a new PR with this change in it?
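For clarity, the proposed usage would look something like this (the plain-tuple `tracked_fields` form is the suggestion above, not the current API, and `particle_struct_array` is a placeholder):

```julia
u, v, w = model.velocities
speed = Field(√(u*u + v*v + w*w))   # a computed Field built from an AbstractOperation

particles = LagrangianParticles(particle_struct_array;
                                tracked_fields = (:A, :u, speed))
```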
No worries @jagoosw ! This is a neat application!
It seems maybe that the function `get_mask` (or `nearest_cell_center`? If user-facing we should come up with a good name) may nevertheless be a crucial part of any implementation, right?
I suspect we still may want some source code features to support this application, but we may not need to add a new routine to modify the tendencies?
Thanks!
I suppose it might be useful to provide some functions like that to make it easier to implement. Possibly also the kernel function I put in the example gist, and a wrapper like `force_nearest(particle_properties, tendency_field, particles, grid)`, so there's an easy interface for people to add it to their particle dynamics function?
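Something along these lines, perhaps (a rough sketch only: it deposits into the single nearest cell rather than spreading over the 8 surrounding cells, runs on the CPU, and assumes a `nearest_cell_center(x, y, z, grid)` helper like the one discussed above that returns the `(i, j, k)` of the cell containing a point):

```julia
using Oceananigans.Operators: Vᶜᶜᶜ   # cell volume at cell centers

function force_nearest!(particle_property, tendency_field, particles, grid)
    @inbounds for p in eachindex(particle_property)
        x = particles.properties.x[p]
        y = particles.properties.y[p]
        z = particles.properties.z[p]
        i, j, k = nearest_cell_center(x, y, z, grid)   # assumed helper
        # deposit the particle's rate, divided by the cell volume, into the tendency
        tendency_field[i, j, k] += particle_property[p] / Vᶜᶜᶜ(i, j, k, grid)
    end
    return nothing
end
```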
Can we start by refactoring this PR to implement this functionality in an example script? It could perhaps go in `validation/biogeochemistry`.
Once we have that cleaned up, we might be able to see what parts of it belong in the source code versus left to users?
Sure, I'll sort that out later
Sorry this took so long. I've put the example together and moved some functions to a different file in the source code. Do you think these are the appropriate functions to keep, since they can fairly generally be used to make an infinitesimal Lagrangian particle change the tracers?
There is a small issue with how I've done it: you still need to externally define the forcing functions and auxiliary tendency fields, and reset the tendency field to zero in the particle dynamics function. This will be solved by #2772, though.