Agents.jl Integration with CellListMap.jl

I just became aware of this package:

https://github.com/m3g/CellListMap.jl

Its goal is to accelerate computations of forces, or any kind of interactions, that occur between pairs of "agents" in 2D or 3D continuous space. Simply put, it does both what interacting_pairs currently does, and also computes the forces, which a user would have to do by hand after getting the interacting pairs. Performance benchmarks also look like its better than competing alternatives (common story of Julia packages it would seem).

I think the best place to start integrating this package with Agents.jl is to write an integration example that showcases using it to accelerate computations occurring in an ABM. From there we can see what can be made even more general and integrated more formally into the source of Agents.jl.

Jul 24 '22 14:07 Datseris

Hi. I have a simple code to mimic run & tumble dynamics of bacteria in porous media (implemented for an unpublished researh project). CellListMap is used to find collisions between agents in the ABM model and an array of random fixed obstacles. Bacteria are modeled as sphero-cylinders, that adds a bit complexity to the code to find real distances and calculate torques. Would it be a good integration example (checkout the video)?

I can also simplify the code, to use disks instead of cylinders. Then it would be much simpler, similar to the SIR example with addition of random obstacles (another research project of mine).

Maybe @lmiq has a better suggestion for the integration example.

https://user-images.githubusercontent.com/26183665/180777800-2e1452a1-7240-4542-8917-1b760ab6c859.mp4

Jul 25 '22 12:07 ehsanirani

I was looking at the Agents examples, and found that probably the Flocking example of birds is the closer to the typical application of CellListMap.

Yet, as I understand, the function agend_step applies to one bird. To effectively use a neighbor list (or function to compute the interactions directly), the way I'm used to see simulations is to have a "double"-loop over the elements (the birds), to compute the interactions between neighboring birds. That double loop can be replaced by a single call to the map_pairwise! function from CellListMap.

That loop, is however, sort of implicit in the way Agents is implemented, as the first loop over agents is hidden (at least in that interface), and the function that is called for each agents computes the neighbors.

Thus, I think that I need to interact to something more internal, or access an API which is not apparent at first sight. Any help is appreciated.

edit: it seems that model_step! is what I'm searching for here.

Jul 25 '22 18:07 lmiq

A typical simulation written on top of CellListMap is here, if anyone wants to take a look: https://github.com/m3g/2021_FortranCon/blob/main/celllistmap_vs_namd/simulate.jl

(these are molecular simulations - it would be nice to see how Agents could be used for such applications as well).

Jul 25 '22 18:07 lmiq

@ehsanirani If you can share your running code (even if privately), I would like to see how do you integrate it with Agents. Mostly what I don't see is how to call a function once per step, and use the output of that in the update of the properties of every agent.

Jul 25 '22 23:07 lmiq

@ehsanirani If you can share your running code (even if privately), I would like to see how do you integrate it with Agents. Mostly what I don't see is how to call a function once per step, and use the output of that in the update of the properties of every agent.

I created a minimal repository on codeberg, please check it out:

https://codeberg.org/ehsan/test-Agents-CellListMap

edit: it seems that model_step! is what I'm searching for here.

You are right, you need simething like model_step! function for using CellListMap.

Jul 26 '22 08:07 ehsanirani

That is a really nice application. I think as a an example it is better to make it simpler. But that is a perfect example. I will try to simplify it as much as possible, and put it as an integration example on the CellListMap docs as well. After trying to simplify it, I will post it here. Thank you very much, that is a very nice contribution to all, I think.

Jul 26 '22 16:07 lmiq

I think as a an example it is better to make it simpler.

I totally agree. I think most of the complexity comes from the fact that my agents (bacteria) are modeled as spherocylinders. Maybe assuming agents with simpler shape (disks?) helps with simplifying the code.

Jul 26 '22 19:07 ehsanirani

Hello both, thanks so much for all the input and sorry for joining late!

Yes, I agree with the suggestion of simplifying the rods to spheres. The added complexity of rods does not really affect the integration of the two libraries (if I have understood things correctly, the "rods" are in fact a bunch of spheres that are "linked" to each other, so in the end one still calculates distances between spheres).

Although I must admit that the video with the roods looks so much cooler ;) . One thing we can do, provided that @ehsanirani is okay with this, is show the full integration with spheres only. Then, at the end, provide a comment like "one can create rods or other shapes from spheres, and the above would work in a similar manner. We do this in a different file, which produces this output:" and then show the video. I'm suggesting this because the code that makes rods out of spheres already exists and already works and is quite intelligent, so why not show it anyways?

@lmiq let me know if there are questions about using Agents.jl. Yes, model_step! is the way to go. That's also the way we do it here, which is a code that can completely be replaced by using CellListMap to make things fast: https://juliadynamics.github.io/AgentsExampleZoo.jl/dev/examples/social_distancing/

Jul 27 '22 10:07 Datseris

Depend on how hard or easy the integration example turns out to be, I'll consider whether it is useful to create some extra fields in the ContinuousSpace struct in Agents.jl, so that it out-of-the box has the things expected by CellListMap already set up (there will be a boolean flag whether to enable this so the user can choose).

Jul 27 '22 10:07 Datseris

Oh wow, looking at the example here https://codeberg.org/ehsan/test-Agents-CellListMap/src/branch/main/src/bacteria.jl it seems SUPER straightforward to couple the two packages. As expected of Julia of course.

Jul 27 '22 11:07 Datseris

(if I have understood things correctly, the "rods" are in fact a bunch of spheres that are "linked" to each other, so in the end one still calculates distances between spheres).

@Datseris Spherocylinders are a bit more complicated than that. What you described is the model used in the bacteria growth example in Agents.jl docs.

Anyway, I totally agree that a simpler example fits better with the docs/tutorials. Just let me know how I can help.

PS. In a month or two, we are submitting a paper using the more advanced version of the code that I shared here, with chemotaxis and other cool biophysical features. Afterwards, I will make it open source and maybe it can be added to the docs, as an example of real research project done by Agents.jl and CellListMap.jl.

Jul 27 '22 16:07 ehsanirani

Oh wow, looking at the example here https://codeberg.org/ehsan/test-Agents-CellListMap/src/branch/main/src/bacteria.jl it seems SUPER straightforward to couple the two packages. As expected of Julia of course.

I have started working with Julia since two months ago, and that is one of the reasons that I fell in love with it. Another reason is solving the two language problem. As a computational physicist, I really appreciate and admire it. I am already converting codes for 5 different projects to use Agents.jl, Molly.jl and CellListMap.jl.

Jul 27 '22 16:07 ehsanirani

What is the size (number of bacteria and obstacles) of the actual simulations? CellListMap is expected to provide reasonable speedups for >10k interactions, I guess. That is probably a relevant information to provide in the example.

I'm trying to simplify the example as much as possible (and learning how to use Agents.jl in the meantime). I will try to use that as an integration example on the CellListMap side as well.

Jul 27 '22 18:07 lmiq

What is the size (number of bacteria and obstacles) of the actual simulations?

There are 500-1000 bacteria and 2500-5000 non-overlapping obstacles (different box sizes: volume fraction between 0.2 to 0.7) in each simulation.

Jul 27 '22 19:07 ehsanirani

I am working in a minimalist version of the integration here:

https://github.com/lmiq/agents_with_celllistmap/

The simulations ARE working (the video is produced), but:

Seems that my agent_step! function is somewhat redundant to model_step!, as this last one is only computing the forces. I could just update the positions and velocities there.
Agents.jl has move_agents! and update_velocities! functions, which seem the reasonable choice to use here. However, from the docs I would imagine that this could integrate with DifferentialEquations for more sophisticated updating schemes. Don't they?
The fact that coordinates and velocities cannot be represented within an Agent as a StaticVector is somewhat odd, these are tuples but with all the arithmetic defined, so are more convenient than NTuples to represent coordinates.
To use CellListMap at some point one need the coordinates as an vector of vectors. This sort of duplicates the coordinate information, which is contained in the agent positions. The continous layout in memory of the coordinates if fundamental for performance. No big deal, as these are scalar copies. but I wander if the Agents could be structured as an structure of array layout, which would be more efficient (except for distributed simulations, I think).

As I mentioned, the code is not working yet, because I am still learning the Agents.jl interface. Probably I'm not the best person now to do this part of the integration, but it should be easy. If you want to comment there or even pull, that would be great.

Jul 27 '22 21:07 lmiq

The example above is now running correctly, and produces this video: https://github.com/lmiq/agents_with_celllistmap/blob/main/test.mp4

The best way to take advantage of Agents.jl is to be discussed.

Also it would be interesting to know how much CellListMap accelerates this simulation, relative to the "standard" way implemented in Agents.jl, as a function of the number of particles.

Jul 28 '22 01:07 lmiq

Seems that my agent_step! function is somewhat redundant to model_step!, as this last one is only computing the forces. I could just update the positions and velocities there.

You do not have to always use agent_step!. Use dummystep in place of agent_step! if you don't feel like you need to write rules that auto-loop over agents.

2. Agents.jl has move_agents! and update_velocities! functions, which seem the reasonable choice to use here. However, from the docs I would imagine that this could integrate with DifferentialEquations for more sophisticated updating schemes. Don't they?

No, not by default. They could though. In the docs of Agents.jl there is an integration example with DifferentialEquations.jl. However this version of update_velocities! I think will stay with the Euler scheme. It is an overkill to use DiffEq, and if a user wants to do it, they can integrate it formally anyways.

3. The fact that coordinates and velocities cannot be represented within an Agent as a StaticVector is somewhat odd, these are tuples but with all the arithmetic defined, so are more convenient than NTuples to represent coordinates.

Yes, this is a design decision I am not happy with and may likely change in the future to SVector. The problem is SVector itself is dubious as well. In my opinion, static vectors should be part of the standard library.

4. To use CellListMap at some point one need the coordinates as an vector of vectors. This sort of duplicates the coordinate information, which is contained in the agent positions. The continous layout in memory of the coordinates if fundamental for performance. No big deal, as these are scalar copies. but I wander if the Agents could be structured as an structure of array layout, which would be more efficient (except for distributed simulations, I think).

Unfortunately this duplication is necessary. It is by far the most intuitive and convenient way to represent agents as instances of structs with several fields, only one of which is a position. If each property of the agents was instead removed from the agents and into vectors, this would make for a less intuitive experience for the user, and much longer code. The bright side is that making this vector of positions is negligible cost both in memory and computationally when compared to computing anything using this vector.

Jul 28 '22 11:07 Datseris

@lmiq in your example version, why is the struct CellListMapData necessary? It isn't, right? One could make its fields directly be properties of the AgentBasedModel, saving the complexity of creating one more struct.

Jul 28 '22 11:07 Datseris

@lmiq when you are okay with the example, I'd suggest to open a PR here adding it, and there I can also push on the code and/or comment on the code!

Jul 28 '22 11:07 Datseris

Also it would be interesting to know how much CellListMap accelerates this simulation, relative to the "standard" way implemented in Agents.jl, as a function of the number of particles.

Yes, I think it is important of us to do this. Although it's 99.99% certain that CellListMap will accelerate performance, it is anyways good to know by how much.

What is the size (number of bacteria and obstacles) of the actual simulations? CellListMap is expected to provide reasonable speedups for >10k interactions, I guess. That is probably a relevant information to provide in the example.

That's something I don't immediately understand. The speedup here is versus what other version of the code? Just looping over all "agents"? I am supersprised if this is the case, as I would expect CellListMap to provide performance acceleration pretty much immediatelly even for a few agents. In Agents.jl we also have a structure that compartiments the continuous space and makes rectangular compartments for accelarating nearest neighbor searches. it is practically immediatelly worth it to use this accelerating structure according to our testing.

Jul 28 '22 11:07 Datseris

Yes, this is a design decision I am not happy with and may likely change in the future to SVector.

Couldn't the type of array be made generic?

@lmiq in your example version, why is the struct CellListMapData necessary? It isn't, right?

It isn't. But if one wants to use parallel in-place updates of the cell lists, some more fields are necessary and it may be clearer what if for what there. It is just a stylistic choice, yes.

That's something I don't immediately understand. The speedup here is versus what other version of the code? Just looping over all "agents"? I am supersprised if this is the case, as I would expect CellListMap to provide performance acceleration pretty much immediatelly even for a few agents.

There is a price that is paid on constructing the cell lists. If the number of particles is too small, it it not worth it (or building neighbor lists, etc). But I tested now and with 100 particles it is already faster than a naive loop, so your intuition was correct.

@lmiq when you are okay with the example, I'd suggest to open a PR here adding it, and there I can also push on the code and/or comment on the code!

The reason I'm not OK with the example is that I don't see what exactly I'm gaining in using Agents. I posted now an implementation of the same simulation without agents, here. The code is actually shorter, and the code is faster, as shown below.

The only thing I would get from using Agents, here, is the function to create the video, since the double loop on particles is implicit on CellListMap, and the integration scheme I had to write manually in both cases.

Thus, my feeling is that I don't know what I can really gain by using Agents and its integration with other tools (from Agents itself or from the ecosystem in general), and the example feels artificial.

julia> includet("./Particles.jl")

julia> using .Particles

julia> @time WithAgents.simulate();
run! progress: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:03
  6.561433 seconds (8.04 M allocations: 435.431 MiB, 2.06% gc time, 54.78% compilation time)

julia> @time WithAgents.simulate();
run! progress: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:03
  3.211807 seconds (64.54 k allocations: 4.913 MiB)

julia> @time NoAgents.simulate();
  2.377066 seconds (387.15 k allocations: 21.186 MiB, 12.53% compilation time)

julia> @time NoAgents.simulate();
  2.081114 seconds (2.12 k allocations: 622.031 KiB)

Jul 28 '22 15:07 lmiq

great, thanks!

Couldn't the type of array be made generic?

Hm, I am not sure what you mean by that. The position is a field of a struct, the agent struct. It needs to be concretely typed. We chose NTuple because it is simpler and more intuitive for a general user, but also because it is equivalent with SVector internally. (an SVector is nothing more than a wrapper of NTuple)

The reason I'm not OK with the example is that I don't see what exactly I'm gaining in using Agents

Well this of course depends on the application. However, as far as I can tell there are many reasons to be using Agents, even in your case where the simulation is simple enough that keeping track of only positions between particles is necessary. In a typical application the "particles" would have other properties as well, like size, charge, chemical whatever, name, or whatever else. Agents binds all of this into a dedicated struct that has all the properties corresponding to a particle, and then using the id::Int field allows finding this particle. Agents.jl also has several functions for finding things nearby a position. nearby_agents(pos, model, r) will iterate over agents given a position and radius. How would I code this from CellListMap.jl? Then, the plotting by itself is not just "a feature", but an absolutely massive piece of work that took many many months, and allows extreme convenience: simply providing the model stepping functions turns the "simple plotting" into a fully interactive GUI that allows you to step the model on demand as well as change model parameters live during model evolution. Another thing is data collection: it is trivial to collect data by using Agents.jl to later analyzing them with DataFrames.jl, see the Schelling example for how easy this becomes. For example, if I wanted to collect the chemical potential property of all agents, average it over agents, but only include agents that have mass > 0.5, it would still take me only a single line of code to do so. In fact, I'd say just the data collection by itself would be a reason to use Agents.jl. There are many more things you can get from Agents.jl that are well beyond computing distance-based interactions between particles. The API listing is too large to start listing things here one by one: https://juliadynamics.github.io/Agents.jl/stable/api/ . So, I think the better way to think about it is not "what am I gaining by using Agents as a user of CellListMap". The better way is "what am I gaining as a user of Agents by using CellListMap". The latter makes more sense, because Agents.jl is a generic simulation framework, and thus it often doesn't provide specialized computation acceleration tools out of the box.

Therefore, the best person to do this PR into Agents.jl that adds an integration example is a user that has already benefitted by combining both software, i.e., @ehsanirani .

and the example feels artificial

Here I have to disagree. The point of the example is to show how one can integrate the two software, not how to make novel science nor how to use the most of either. Just how to use both at the same time. Users of Agents.jl will then know how to utilize this to speed up calculations in continuous space scenarios. Users of CellListMap.jl will see that there are frameworks that allow doing really complicated simulations seamlessly, especially when it comes to the data collection part. Also providing incredible plotting infrastructure out of the box.

the code is faster, as shown below

That's true, but that's also expected, as the agents in Agents.jl are stored in a dictionary that is slower to iterate over. This cost comes from the convenience of being able to identify agents by their ID, which remains unique and unchangeable during the simulation. However, I am surprised that the code is twice as fast. Later I will take a look at the code in more detail because this difference is surprising and much larger than what it should be from the change of vector to dictionary.

Jul 28 '22 15:07 Datseris

With two very simple changes I immediatelly noticed, the difference in performance becomes much smaller:

With agents
  2.228872 seconds (3.91 k allocations: 833.219 KiB)
No agents
  1.751680 seconds (2.14 k allocations: 625.312 KiB)

code at: https://github.com/Datseris/agents_with_celllistmap

Notice that the run! function performs both the model evolution, as well as data collection. But the bulk of the performance difference came from using ntuple instead of Tuple.

Jul 28 '22 16:07 Datseris

Notice also that besides the fact that in one case we iterate over a dictionary while in the other over a vector, one more reason the performance of using both frameworks will never reach the perforamnce of using only CellListMap.jl is due to the duplicacy of information of the agent positions, as well as updating them twice in the integration case.

Jul 28 '22 16:07 Datseris

Separating the model creation from the model stepping gives:

With agents, only stepping
  2.260443 seconds
No agents, only stepping
  1.725879 seconds

which highlights that (1) the difference in memory usage is only during model creation and (2) all performance difference is still in the stepping, not initialization part.

I still need to make a version that uses only Agents.jl and interacting_pairs.

Jul 28 '22 16:07 Datseris

Hm, I am not sure what you mean by that. The position is a field of a struct, the agent struct.

The agent type could be parametrized for the type of coordinates (i. e. struct Agent{V} pos::V end).

Concerning all the integration importance and the features of Agents: To be clear, I do not doubt that a lot can be gained by using the Agents infrastructure. The thing is that, not being myself (currently) a user of Agents, I simply do not know if what I've written as an Integration example uses the Agents infrastructure enough so that all that is really possible.

I still need to make a version that uses only Agents.jl and interacting_pairs.

Yes, that is the interesting one (I would expect more than 2x :-), particularly for large systems, where the computation of pairwise interactions dominate - note that using a vector instead a dictionary allows for simd, memory alignment and locality, etc. There is a lot to be gained there).

Jul 28 '22 16:07 lmiq

@lmiq could you help me in the following, just to be sure I don't mess up:

The cutoff parameter that is calculated at the start is so that we only calculate forces between agents that are within distance cutoff of each other, right? Because later there is also

    d = sqrt(d2)
    if d ≤ (ri + rj)

and only then forces are calculated. Could you please clarify why there are two "cutoffs" and what does each one succeed in doing? EDIT: And also please tell me what is d2. EDIT2: It's the squared distance between particles which I assume is automatically computed in CellListMap.jl but in the Agents-only iteration it isn't. Do I still have to compute it, or is the first cutoff enough?

Jul 28 '22 17:07 Datseris

I had a quick glance at this thread, and it looks really cool! A couple things I'd like to add:

@Datseris I'll consider whether it is useful to create some extra fields in the ContinuousSpace struct in Agents.jl, so that it out-of-the box has the things expected by CellListMap already set up (there will be a boolean flag whether to enable this so the user can choose).

While I'm sure you've already thought of this, I think we should only modify the struct if it's absolutely necessary for a clean API and is general enough to be applicable to a large variety of models, for the same reason that we keep pathfinding separate. If CellListMap has a breaking change, to support it we might need a breaking change as well.

@lmiq note that using a vector instead a dictionary allows for simd, memory alignment and locality, etc. There is a lot to be gained there).

The downside here is that deleting and adding agents during a simulation becomes much more complicated. Either we do the O(n) operation of deleting an element from an array, or leave a hole there and add a check whenever that data is accessed. With the second approach, we also need to start tracking how many holes there are, periodically clean it up and/or add new agents to those holes. The exact semantics of doing so in a bug free manner are non-trivial at best. It would also require measuring at what point the performance difference is noticeable, and by how much.

That said, it would be cool if we could have our cake and eat it too, as the saying goes :)

Jul 28 '22 18:07 AayushSabharwal

Yeah, I could have simplfied that. The idea there is that each particle has a radius, and the interaction potential is zero if the distance between the particles is greater than the sum of the radii of the two particles involved. The cutoff that is used to build the cell lists must be the sum of the two greater radii of the system, so no interaction is lost.

Yet, in the example I am starting all particles with the same radius, because of this default:

Particle(id, pos, vel) = Particle(id, Tuple(pos), Tuple(vel), 10.0, 1.0, 1.0)

(all radii are 10.0), thus the cutoff can be just 20.

Clarifying a little bit further:

The cell list computation will run over all pairs of particles closer than a cutoff. This is the cutoff of the code. That is independent on how the forces are calculated for one pair of particles, except that the forces won't be calculated if the distance between the particles is greater than that cutoff. In this example, the particles have radii, and the interaction is only different from zero if the distance between the particles is smaller than the sum of their radii. That can be simply the same cutoff if all particles have the same radii and cutoff = 2 * radius, but it can be more general than that. We can simplify the model in this sense, but in this case it would not make sense to have a radius field for the agents, since all agents would be identical.

Yes, d2 is the squared distance, that is computed by CellListMap and passed to the inner function that is mapped, generally it is used, or it would have to be recalculated.

Jul 28 '22 18:07 lmiq

The downside here is that deleting and adding agents during a simulation becomes much more complicated.

Yes, certaintly there is some gained flexibility with the current Agents approach, although for the applications I'm used to the downsides I think are greater than the gains. Anyway, in the lines of your comment, I think having a clear path on to how use CellListMap in the context of Agents is good enough, CellListMap is a very specialized package for computing pairwise interactions within a cutoff (I think it may be much faster than the interaction_pairs function, actually, and it supports general periodic boundary conditions, is parallel, etc). How important would be to have Agents much faster in the cases where CellListMap applies depends on how frequent are these cases and the sizes of the systems people use to simulate.

Jul 28 '22 18:07 lmiq

Agents.jl Agents.jl copied to clipboard

Integration with CellListMap.jl

Agents.jl
Agents.jl copied to clipboard