Refactor particleset attributes and particle data for vectorized kernels
Following from #2122, it would be good to rethink the API and refactor the code for particleset data. See below the original comment by @veckothegecko
Hmm, I assume that this change was needed since we are passing the particleset all the way to the kernels now? I think we should at some point pass the particle data itself down - as I touched on f2f, there is a distinction between the particle data and the ParticleSet (e.g., it doesn't make sense to call `pset.execute()` in a kernel). Passing the particle data down to the kernel makes this distinction clear. It also means that we don't silently limit users' variable names (currently they can't name a variable `execute` or anything else that is already a method/attr on ParticleSet).
Happy for that to be a future issue.
Perhaps:

- Make ParticleData its own class (dict of arrays under the hood, but allowing `.attr` notation to get and set values)
  - Only have private variables/methods/names on this dataset (i.e., `_data`). All public attribute access will be assumed to correspond with particle data -> this results in no shadowing
  - Disallow names starting with `_` in `parcels/particle.py` `Variable` as this could collide with ParticleData
- Pass ParticleData down to the kernels
- Update `pset._data` to be `pset.data` so that it's on our public API.
  - @erikvansebille can we then remove the `__getattr__` and `__setattr__` methods on the `pset`? I don't think it's too prohibitive to do `pset.data.lon` if users want to access their lon values, or would you prefer to still have the `pset.lon`? (a getattr? or also a setattr?)
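To make the proposal concrete, here is a minimal sketch of what such a class could look like (the class name `ParticleData` follows the proposal above; the exact API is hypothetical, not the Parcels implementation):

```python
import numpy as np


class ParticleData:
    """Hypothetical dict-of-arrays container with .attr access (sketch of the proposal above)."""

    def __init__(self, **arrays):
        # single private attribute; every public name is assumed to be particle data
        object.__setattr__(self, "_data", {k: np.asarray(v) for k, v in arrays.items()})

    def __getattr__(self, name):
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        if name.startswith("_"):
            # reserved for internals, per the "disallow names starting with _" point above
            raise AttributeError(f"names starting with '_' are reserved: {name!r}")
        self._data[name] = np.asarray(value)


pd = ParticleData(lon=[0.0, 1.0], lat=[2.0, 3.0])
pd.lon = pd.lon + 10.0  # attribute get and set both route through the dict of arrays
```

Because the only non-data attribute is private, public attribute access never shadows a variable name.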
Also, thoughts on explicitly disallowing `object.__setattr__(self, name, value)` in that last else clause? I know that it's there to maintain the default behaviour for Python objects, but I don't see the benefit in supporting it here. Since it works outside of our model, that data would fall through the cracks if we write particle data, and it also means that if people typo in their kernel it will be a silent bug.
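The silent-bug concern can be illustrated with a small sketch (class and variable names hypothetical): replacing the `object.__setattr__` fallback with a raise turns a typo'd variable name in a kernel into an immediate error instead of a plain Python attribute that never reaches the particle data.

```python
import numpy as np


class StrictParticleView:
    """Hypothetical view that rejects unknown attribute writes instead of silently accepting them."""

    def __init__(self, data, index):
        object.__setattr__(self, "_data", data)
        object.__setattr__(self, "_index", index)

    def __getattr__(self, name):
        return self._data[name][self._index]

    def __setattr__(self, name, value):
        if name in self._data:
            self._data[name][self._index] = value
        else:
            # no object.__setattr__ fallback: unknown names are an error, not a silent no-op
            raise AttributeError(f"unknown particle variable: {name!r}")


data = {"lon": np.zeros(3)}
view = StrictParticleView(data, np.array([True, False, True]))
view.lon = 1.0  # fine: 'lon' is a known variable, write goes through the mask
try:
    view.lonn = 1.0  # typo -> raises instead of silently creating a Python attribute
    raised = False
except AttributeError:
    raised = True
```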
Originally posted by @VeckoTheGecko in https://github.com/OceanParcels/Parcels/pull/2122#discussion_r2263531738
Note that with the change in https://github.com/OceanParcels/Parcels/pull/2122/commits/bee54642db18a0f4cbb08905bf470638527ed754, we now pass a list of KernelParticles to the Kernels, instead of the entire ParticleSet: https://github.com/OceanParcels/Parcels/blob/081753dbce1ea327d1c2ed6fcc147d81183caa10/parcels/kernel.py#L255-L258
So this might solve (some) of the problems above?
we now pass a list of KernelParticles to the Kernels, instead of the entire ParticleSet
*not a list of KernelParticle instances, but a single KernelParticle instance where the index is a boolean mask (instead of individual indices - 1, 2, 3 etc. - as it was before)
Yes, KernelParticle in #2122 is helping with the problems above. But naming is now a bit confusing as it isn't about a single particle, and would be good to formalise this properly in a particle data class with proper testing. For a future PR - I can work on that once I'm back (also fielding some interesting ideas from people here, which has been helpful)
Another aspect to consider is the implementation of __getitem__:
https://github.com/OceanParcels/Parcels/blob/d4fdc01e9913db51dd18e4eb1afcf2e6c8250b02/parcels/particle.py#L141-L143
See also https://github.com/OceanParcels/Parcels/pull/2122#discussion_r2285507966
Note that https://github.com/OceanParcels/Parcels/pull/2199#issuecomment-3285145938 now has a minimal breaking example
@erikvansebille is this the issue most relevant to the KernelParticle discussion? Also, I feel like there was another issue with test cases (repeatedly assigning to variables and that causing broken state) but I can't seem to find it
I feel like there was another issue with test cases (repeatedly assigning to variables and that causing broken state)
Found it https://github.com/Parcels-code/Parcels/pull/2199#issuecomment-3285278876
I think this problem runs deeper than a simple refactoring issue. I have reached out to others to ask what they think - hence I have written this comment in a way that assumes zero prior knowledge about Parcels internals (or about what's discussed above in this issue).
Parcels is a simulation framework that integrates particle trajectories in flow fields by repeatedly updating their positions in a loop based on "Kernels" (i.e., functions that define this custom behaviour - they can be provided by the package or written by users).
When running a simulation, we only want to act on particles that are currently within the simulation. Consider, for example, a timed release where particles are gradually released into the simulation: here we only want to update a portion of the particle data.
Here is a diagram showing the layout that we have in mind.
Trying to implement this via a custom KernelParticleData class, however, is proving difficult due to array masking in numpy. I'm not really sure how to proceed, or what would be a good pattern to use. I think on a fundamental level we might also be running into the difference between views and copies.
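That views-versus-copies distinction can be seen in plain NumPy, independent of any Parcels code: a single-step masked assignment writes through to the original array, but boolean indexing on its own returns a copy, so chained masking loses the connection to the original data.

```python
import numpy as np

arr = np.arange(5.0)   # [0., 1., 2., 3., 4.]
mask = arr > 2.0       # [False, False, False, True, True]

# single-step masked assignment calls arr.__setitem__ and writes through
arr[mask] = 0.0        # arr is now [0., 1., 2., 0., 0.]

arr2 = np.arange(5.0)
sub = arr2[arr2 > 2.0]  # boolean indexing returns a *copy*, not a view
sub[:] = 0.0            # modifies only the copy; arr2 is unchanged
```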
Here's a minimal example of the code showing the problem. Keen to have ideas on how we can better implement KernelParticleData (if that is possible).
```python
import numpy as np

CURRENT_TIME = np.datetime64("2024-01-05")  # clock time of simulation - constant just for this example
N = 11
N_IN_SIMULATION = 5


def get_particle_data():
    data = {
        "lon": np.linspace(900.0, 1000.0, N),
        "lat": np.linspace(900.0, 1000.0, N),
        "depth": np.linspace(900.0, 1000.0, N),
        "time": np.concatenate(
            [
                np.repeat(CURRENT_TIME, N_IN_SIMULATION).astype("datetime64[ns]"),
                np.array([np.datetime64(f"2024-01-{d:02}") for d in range(6, 12)]).astype("datetime64[ns]"),
            ]
        ),
    }
    assert all(len(v) == N for v in data.values())
    return data


class KernelParticleData:
    """Simple class to be used in a kernel that links a particle (on the kernel level) to a particle dataset."""

    def __init__(self, data, index):
        self._data = data
        self._index = index

    def __getattr__(self, name):
        return self._data[name][self._index]

    def __setattr__(self, name, value):
        if name in ["_data", "_index"]:
            object.__setattr__(self, name, value)
        else:
            self._data[name][self._index] = value

    def __getitem__(self, index):
        self._index = index
        return self

    def __len__(self):
        return len(self._index)


# =======
# Kernels
# =======
# Behaviour defined by the user to be run on particles that are in the simulation
# `fieldset` provides the flow data - not used in these examples


def kernel1(particles, fieldset):
    assert particles.lon.size == N_IN_SIMULATION
    particles.lon = 0


def kernel2(particles, fieldset):
    assert particles.lon.size == N_IN_SIMULATION
    particles.lon = 0
    # is there a way that I can define subsets of particles, and then update only those subsets?
    # the following code would be great, but I don't know how to support it on the back end :/
    psubset = particles[particles.lon > 950.0]
    assert psubset.lon.size == 0


def main():
    # pset = ParticleSet(fieldset, lon=[0.5], lat=[0.5])
    particle_data = get_particle_data()
    mask_to_evaluate = particle_data["time"] == CURRENT_TIME  # evaluate particles whose clock is the current time

    kernel_particle_data = KernelParticleData(particle_data, mask_to_evaluate)
    kernel1(kernel_particle_data, fieldset=None)
    assert np.all(particle_data["lon"][:N_IN_SIMULATION] == 0)
    assert np.all(particle_data["lon"][N_IN_SIMULATION:] >= 900.0)
    print("kernel1 works!")

    # Reset
    particle_data = get_particle_data()
    kernel_particle_data = KernelParticleData(particle_data, mask_to_evaluate)
    try:
        kernel2(kernel_particle_data, fieldset=None)
    except Exception as e:
        print("kernel2 doesn't work!")
        raise e


main()
```
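The failure in `kernel2` comes down to a shape mismatch: `particles.lon > 950.0` is a boolean mask over the 5-element subset, but `__getitem__` then stores it as `_index` and uses it to index the full 11-element arrays, which NumPy rejects. Stripped of the class machinery, the same error reproduces in a few lines (variable names are illustrative):

```python
import numpy as np

full = np.arange(11.0)
outer_mask = full < 5.0            # 11-element mask: 5 particles "in the simulation"
sub_mask = full[outer_mask] > 2.0  # 5-element mask defined on the subset

try:
    full[sub_mask]                 # 5-element boolean mask on an 11-element array
    failed = False
except IndexError:
    failed = True
```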
OK, so I dived deep into this KernelParticle Issue, and I think I fixed it. The minimal example above is actually quite easy to fix:
```diff
 def __getitem__(self, index):
-    self._index = index
-    return self
+    new_index = np.zeros_like(self._index, dtype=bool)
+    new_index[self._index] = index
+    return KernelParticleData(self._data, new_index)
```
This will return an object whose index is True only where the original `_index` was True and the new index is True, thereby subsetting the index as desired!
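The composition step can be checked in isolation: scattering the subset-level mask back through the outer mask yields a new full-length mask that is True only where both are True, so the resulting object still indexes the full-length arrays correctly (the snippet below is a standalone check, not Parcels code).

```python
import numpy as np

full_mask = np.array([True, True, False, True, False])  # particles in the simulation
lon = np.array([900.0, 960.0, 901.0, 970.0, 902.0])

sub_mask = lon[full_mask] > 950.0  # mask defined on the 3-element subset

# compose into a new full-length mask, as in the __getitem__ fix
new_index = np.zeros_like(full_mask, dtype=bool)
new_index[full_mask] = sub_mask    # True only where both masks are True

lon[new_index] = 0.0               # full-length mask, so this writes through correctly
```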
However, also supporting operations like `particles[mask] += val` turned out to be much more complex. It required building an entirely new data class. See #2443 for the implementation.
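For context on why `particles[mask] += val` is harder (this is general Python semantics, not a description of the #2443 implementation): Python desugars an augmented assignment on a subscript into a read, an in-place add, and a write-back, so the class must implement `__setitem__` as well, and the object returned by `__getitem__` must support addition. A small recorder class makes the desugaring visible:

```python
import numpy as np


class Recorder:
    """Logs the calls Python makes for `obj[mask] += val` (illustration only)."""

    def __init__(self):
        self.calls = []
        self.data = np.zeros(3)

    def __getitem__(self, key):
        self.calls.append("__getitem__")
        return self.data[key]

    def __setitem__(self, key, value):
        self.calls.append("__setitem__")
        self.data[key] = value


r = Recorder()
mask = np.array([True, False, True])
r[mask] += 1.0  # desugars to: r[mask] = r[mask].__iadd__(1.0)
```

Note that the intermediate `r[mask]` is a copy (boolean indexing), so the write-back via `__setitem__` is what actually updates `r.data`; without `__setitem__`, the `+=` would either fail or silently modify only the copy.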