Better ways of storing particle locations for connectivity-related studies?
In many studies related to connectivity (e.g. ecological, marine debris source attribution, etc.) we are less interested in the specific trajectory of the particle and more interested in discrete events, e.g. the [lat, lon, particle-age] when a particle is near the coast. This is usually achieved by outputting all particle locations at some time interval dt and identifying particles that fulfil the event criteria (e.g. being within distance d of the coast).
However, this is problematic because we are missing many of these events (e.g. if we have an output frequency of 1 day, our particle may be moving order 10-100km between outputs so we may be missing the majority of events of interest). It is also wasteful in terms of storage.
One alternative facilitated by oceanparcels is the ability to only save particle variables upon deletion. This solves the above problem if we are actively removing particles (since we can just record the [lat, lon, particle-age] upon removal) but it does not solve the problem if we are interested in multiple events per particle (e.g. tracking which reef sites a particle crosses but not removing them). There are hacky ways that I could imagine possibly getting around this (e.g. having string variables for location and encountered sites, appending to those strings if a site is encountered, and then deleting all particles at the end and saving upon deletion), but this doesn't seem like a good solution!
A better solution might be a modification to the default output method in oceanparcels (i.e. saving particle variables at a regular time interval dt) but rather than saving variables for all particles, variables would only be saved for particles satisfying some kernel condition. In other words, the output netcdf would be a sparse array and since it is sparse, a very high output frequency could be chosen (e.g. approaching the RK4 time-step) and thus, few events would be missed.
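A minimal sketch of the idea in plain Python (not Parcels code): the event test is evaluated at every integration step, but a row is written only when it passes. The names `near_coast` and `run`, the straight "coast" at a fixed longitude, and the constant velocity are all hypothetical and only there to make the example self-contained.

```python
def near_coast(lat, lon, coast_lon=0.0, d=0.055):
    """Hypothetical event test: within d degrees of a straight coast at coast_lon."""
    return abs(lon - coast_lon) <= d


def run(particles, n_steps, dt, u=-0.01):
    """Advance particles with a toy constant zonal velocity u and
    record a row only for particles that satisfy the event test."""
    events = []  # sparse output: (particle_id, age, lat, lon)
    for _ in range(n_steps):
        for p in particles:
            p["lon"] += u * dt
            p["age"] += dt
            if near_coast(p["lat"], p["lon"]):
                events.append((p["id"], p["age"], p["lat"], p["lon"]))
    return events


particles = [
    {"id": 0, "lat": 10.0, "lon": 0.10, "age": 0.0},  # drifts onto the coast
    {"id": 1, "lat": 11.0, "lon": 5.00, "age": 0.0},  # stays offshore
]
events = run(particles, n_steps=10, dt=1.0)
```

Here the condition is checked on all 20 particle-steps, but only the steps where particle 0 is within `d` of the coast end up in `events`, so the output grows with the number of events rather than with the number of particles times the number of steps.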
My questions are as follows:
- Has this been implemented? (I can't find such a feature but I wanted to check!)
- Does the netcdf file format permit sparse arrays? (Obviously this is only practical if the arrays are genuinely sparse, not just storing zeros/nans for particles we are not interested in.)
- Is this something that sounds like it could be straightforward to implement in parcels? I'd be happy to do some digging in the code to see if I could try to get it to work, but it'd be good to know how difficult you think this would be.
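On the sparse-array question: as far as I am aware, netcdf has no native sparse-array type, but a variable number of events per particle can be stored densely as a "ragged" layout, i.e. one flat array per variable plus a per-particle count (the CF conventions describe this as a contiguous ragged array representation). Below is a rough sketch of that packing in plain Python. The function names `to_ragged` and `events_for` and the tuple layout are assumptions made for illustration, and nothing here writes an actual netcdf file.

```python
def to_ragged(events, n_particles):
    """Pack (particle_id, age, lat, lon) rows into flat columns
    plus a per-particle event count."""
    events = sorted(events, key=lambda e: (e[0], e[1]))  # by particle, then age
    row_size = [0] * n_particles
    for pid, *_ in events:
        row_size[pid] += 1
    columns = {
        "age": [e[1] for e in events],
        "lat": [e[2] for e in events],
        "lon": [e[3] for e in events],
    }
    return row_size, columns


def events_for(pid, row_size, columns):
    """Slice one particle's events back out of the flat columns."""
    start = sum(row_size[:pid])
    stop = start + row_size[pid]
    return {name: col[start:stop] for name, col in columns.items()}


example = [(2, 3.0, 10.0, 40.1), (0, 1.5, -5.0, 39.9), (2, 7.0, 10.4, 40.3)]
row_size, columns = to_ragged(example, n_particles=3)
```

Particles with no events cost one integer in `row_size` and nothing in the data columns, so no zeros or nans need to be stored as padding.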
Thanks for your elaborate and carefully worded question, @nvogtvincent! It's a good point you raise, and indeed not something we support right now in Parcels. I do agree that there are clear use-cases for more fine-grained control when particles are written.
There are a few approaches by which you can write only specific particle locations to file, which I list below. Each has advantages and disadvantages.
- You could delete the particle whenever you want to write the position, and immediately spawn a new particle. Note that this is not trivial (yet), because we still lack a `particle.spawn` method in the Kernel. But you could create new particles outside of the `pset.execute()` call, which you start at the location and time of the deleted particles, so they continue the path. If you add a `parent` Variable, then you could keep track of the provenance of each particle.
- Set `outputdt` equal to `dt`, and have a script running in parallel with your parcels script that checks and modifies the temporary numpy files in the `out-XXXXXX` directory, so that it parses and only keeps those positions you're interested in. Note that this is a bit of a 'hacky' solution (and that creating the temporary numpy files could come at a huge I/O cost), but it could be very effective.
- Dig into the code and tweak when particles are written. The relevant lines are https://github.com/OceanParcels/parcels/blob/master/parcels/collection/collectionsoa.py#L845-L846, but I guess that it will be tricky to adapt these. Ideally we'd one day have a `particle.write` option inside the Kernel (similar to `particle.delete`), but that will be significant work. Would you be up for that?
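The bookkeeping for the first approach (delete, write, respawn with a `parent`) can be sketched in plain Python. This is a toy model under stated assumptions, not Parcels code: particles are dictionaries, and `spawn`, `delete_and_respawn` and `original_ancestor` are hypothetical helpers, with the `written` list standing in for the records saved upon deletion.

```python
from itertools import count

_next_id = count()  # hands out unique particle ids: 0, 1, 2, ...


def spawn(lat, lon, time, parent=-1):
    """Create a particle; parent=-1 marks an original (non-respawned) release."""
    return {"id": next(_next_id), "lat": lat, "lon": lon, "time": time, "parent": parent}


def delete_and_respawn(particle, written):
    """Record the particle as if it were written on deletion, then continue
    its path with a new particle that remembers where it came from."""
    written.append(dict(particle))
    return spawn(particle["lat"], particle["lon"], particle["time"], parent=particle["id"])


def original_ancestor(particle, written):
    """Follow the parent chain back to the originally released particle."""
    parents = {rec["id"]: rec["parent"] for rec in written}
    pid, parent = particle["id"], particle["parent"]
    while parent != -1:
        pid, parent = parent, parents[parent]
    return pid


written = []
p = spawn(10.0, 40.0, 0.0)          # original release, id 0
p = delete_and_respawn(p, written)  # first event: id 1, parent 0
p["time"] = 5.0                     # pretend it was advected for a while
p = delete_and_respawn(p, written)  # second event: id 2, parent 1
root = original_ancestor(p, written)
```

Each entry in `written` is one event, and following `parent` links groups the events of all respawned particles under the particle that was originally released.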
Thanks @erikvansebille - these are all good ideas (and I agree that particle.spawn and particle.delete methods would be a really useful addition). Regarding the third point, I will have a look - I don't think my competency in Python is high enough to do this properly, so for now I may have to resort to a hacky solution, but if I have a brainwave I will keep you updated :)
This is now fixed in v3.0.0, following #1147