replace weakref dict with normal dict
Just noticed that with the cell space testing, agent refs became the main bottleneck. This PR just runs the benchmark code to see how much difference it makes. Maybe we can discuss the advantages and disadvantages of weakrefs some more. (That is to say, I want to start a discussion, not necessarily get this merged.)
Performance benchmarks:
| Model | Size | Init time [95% CI] | Run time [95% CI] |
|---|---|---|---|
| Schelling | small | 🟢 -22.3% [-22.6%, -22.0%] | 🟢 -25.9% [-26.1%, -25.8%] |
| Schelling | large | 🟢 -23.7% [-24.3%, -23.1%] | 🟢 -26.4% [-27.2%, -25.6%] |
| WolfSheep | small | 🟢 -25.9% [-26.4%, -25.5%] | 🟢 -22.3% [-22.4%, -22.2%] |
| WolfSheep | large | 🟢 -48.0% [-57.1%, -37.6%] | 🟢 -24.7% [-25.5%, -24.0%] |
| BoidFlockers | small | 🟢 -6.7% [-7.2%, -6.1%] | 🔵 -1.9% [-2.5%, -1.3%] |
| BoidFlockers | large | 🟢 -7.0% [-7.6%, -6.3%] | 🔵 +0.1% [-0.5%, +0.7%] |
Those are some serious numbers: a quite consistent ~25% reduction in runtime.
I wonder how we can keep the functionality of weakrefs without (the majority of) the performance impact.
At least it's good to know that this is the effect.
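The gap is easy to reproduce outside Mesa. The sketch below (the `Agent` class and the sizes are made up for illustration) times a bare iteration over a plain dict versus a `WeakKeyDictionary` holding the same keys; on CPython the weak variant is slower because each step dereferences a `weakref.ref` in pure Python:

```python
import time
import weakref

class Agent:
    """Hypothetical stand-in for a Mesa agent; only used for timing."""

agents = [Agent() for _ in range(10_000)]

plain = {a: None for a in agents}
weak = weakref.WeakKeyDictionary({a: None for a in agents})

def time_iteration(mapping, rounds=100):
    """Time how long it takes to walk all keys of `mapping`."""
    start = time.perf_counter()
    for _ in range(rounds):
        for _agent in mapping:
            pass
    return time.perf_counter() - start

t_plain = time_iteration(plain)
t_weak = time_iteration(weak)
print(f"dict: {t_plain:.4f}s, WeakKeyDictionary: {t_weak:.4f}s")
```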
Cython has `__weakref__` functionality that could be tested for performance. I will allocate time this week to migrate the build system from Hatch to Meson, so that experimenting with Cython code becomes much easier.
Didn't we just migrate to Hatch? Anyway, that would be great!
Thanks for checking this; it was also on my list of things to check. I am surprised by the overhead of weakrefs. I knew the implementation is pure Python and would thus carry some overhead, but I never imagined it was this big.
The reason for using weakrefs in AgentSet was to avoid memory leaks. It is also convenient for the user: you don't have to remove agents from the space, the scheduler, and whatever else there might be. However, if the performance overhead is this large, I am wondering whether the cure is worse than the disease.
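The convenience is that entries vanish on their own when the last hard reference to an agent goes away. A minimal sketch of that difference (the `Agent` class here is a made-up stand-in):

```python
import gc
import weakref

class Agent:
    """Hypothetical agent; any weakref-able object works as a key."""

a, b = Agent(), Agent()

registry = weakref.WeakKeyDictionary()
registry[a] = "alive"
registry[b] = "alive"

# Dropping the last hard reference removes the entry automatically, so a
# space or scheduler holding only weak references cannot leak dead agents.
del b
gc.collect()  # make the cleanup deterministic across interpreters
print(len(registry))  # 1 — with a plain dict this would still be 2
```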
So, I ran into the slowness again. While playing with #1994, I moved Cell.agents to Cell._agents and turned Cell.agents into a property that returned an AgentSet. On WolfSheep, this doubles the runtime, which is just plain silly.
So the problem seems to be `WeakKeyDictionary`, which is a pure-Python implementation. It might be possible to substantially improve our speed without losing the use of weakrefs.
A quick update: I looked into the current implementation of `WeakKeyDictionary`. It is basically a clever wrapper around a normal dict with `weakref.ref()` keys. Some minor speedups may be possible because the code uses slightly dated constructions and includes protection for multithreading (which we don't need). I am going to test this over the weekend. Since it is plain Python code, I also see no reason why Cython could not be used; again, I'll try to test that over the weekend.
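For reference, the core idea can be sketched in a few lines. This is a simplified toy, not the stdlib code; it deliberately skips the iteration guards and thread-safety mentioned above:

```python
import gc
import weakref

class MiniWeakKeyDict:
    """Toy version of WeakKeyDictionary: a plain dict keyed by weakref.ref
    objects, plus a callback that evicts entries once a key is collected."""

    def __init__(self):
        self.data = {}

        def remove(wr, selfref=weakref.ref(self)):
            # Called by the interpreter when a key object dies.
            obj = selfref()
            if obj is not None:
                del obj.data[wr]

        self._remove = remove

    def __setitem__(self, key, value):
        self.data[weakref.ref(key, self._remove)] = value

    def __getitem__(self, key):
        # A fresh ref to a live object hashes/compares equal to the stored one.
        return self.data[weakref.ref(key)]

    def __len__(self):
        return len(self.data)

class Agent:
    """Hypothetical key type; any weakref-able object works."""

a, b = Agent(), Agent()
d = MiniWeakKeyDict()
d[a], d[b] = 1, 2
del b
gc.collect()  # make collection deterministic
print(len(d))  # 1 — the entry for b evicted itself
```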
Another experiment: I changed the implementation of AgentSet.shuffle in two ways. I shuffle the weakrefs themselves, and it defaults to in-place updating (a quick and dirty check; this can also be changed within the schedulers). The baseline below is the current code on main. I'll do a comparison against @Corvince's version with a dict later this morning.
| Model | Size | Init time [95% CI] | Run time [95% CI] |
|---|---|---|---|
| Schelling | small | 🔵 -0.1% [-0.5%, +0.3%] | 🟢 -8.0% [-8.1%, -7.9%] |
| Schelling | large | 🔵 -1.2% [-2.6%, +0.3%] | 🟢 -8.2% [-9.2%, -7.2%] |
| WolfSheep | small | 🔵 -3.6% [-4.4%, -2.8%] | 🟢 -19.9% [-20.2%, -19.5%] |
| WolfSheep | large | 🟢 -56.3% [-63.0%, -43.8%] | 🟢 -19.0% [-21.8%, -15.5%] |
| BoidFlockers | small | 🔵 +0.9% [-0.6%, +2.6%] | 🔵 -3.4% [-4.4%, -2.3%] |
| BoidFlockers | large | 🔵 +0.1% [-1.3%, +1.5%] | 🔵 -1.9% [-3.1%, -1.0%] |
The new code is below. The key slowdown compared to a normal dict is that when iterating over a `WeakKeyDictionary`, you turn every weakref into a hard ref, and (in AgentSet) back again into a weakref. This is rather inefficient, as the numbers above show.
```python
def shuffle(self, inplace: bool = True) -> AgentSet:
    """Randomly shuffle the order of agents in the AgentSet.

    Args:
        inplace (bool, optional): If True, shuffles the agents in the
            current AgentSet; otherwise, returns a new shuffled AgentSet.
            Defaults to True.

    Returns:
        AgentSet: A shuffled AgentSet. Returns the current AgentSet if
            inplace is True.
    """
    weakrefs = list(self._agents.keyrefs())
    self.random.shuffle(weakrefs)
    if inplace:
        self._agents.data = {entry: None for entry in weakrefs}
        return self
    else:
        return AgentSet(
            (agent for ref in weakrefs if (agent := ref()) is not None),
            self.model,
        )
```
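To make the round-trip cost concrete, here is a small standalone demo. The `Agent` class is a placeholder, and assigning `wkd.data` relies on the CPython implementation detail that `WeakKeyDictionary` stores its backing dict in `.data` (just as the shuffle above does):

```python
import random
import weakref

class Agent:
    """Hypothetical placeholder for a weakref-able agent."""

agents = [Agent() for _ in range(5)]
wkd = weakref.WeakKeyDictionary({a: None for a in agents})

# Old approach: iterating the WeakKeyDictionary dereferences every key
# into a hard ref, and rebuilding wraps each one in a fresh weakref.
hard_refs = list(wkd)
random.shuffle(hard_refs)
rebuilt = weakref.WeakKeyDictionary({a: None for a in hard_refs})

# New approach: shuffle the weakrefs themselves via keyrefs() and swap the
# backing dict in place, skipping the deref/re-wrap round trip entirely.
refs = list(wkd.keyrefs())
random.shuffle(refs)
wkd.data = {ref: None for ref in refs}

print(len(wkd), len(rebuilt))  # both still hold all 5 agents
```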
Great work! Curious if we can get towards that 25%, or even more with Cython.
For completeness: dict (@Corvince's version) against the updated shuffle. So there is still a performance loss for using `WeakKeyDictionary`, but updating the shuffle helps a lot. I am not sure why large Schelling still lags. I am also curious why the init times are so much better for a dict.
| Model | Size | Init time [95% CI] | Run time [95% CI] |
|---|---|---|---|
| Schelling | small | 🟢 -17.3% [-17.7%, -16.7%] | 🟢 -16.7% [-17.0%, -16.4%] |
| Schelling | large | 🟢 -19.8% [-20.4%, -19.1%] | 🟢 -24.1% [-24.9%, -23.3%] |
| WolfSheep | small | 🟢 -18.5% [-19.3%, -17.8%] | 🟢 -5.2% [-5.7%, -4.7%] |
| WolfSheep | large | 🟢 -17.6% [-18.8%, -16.1%] | 🟢 -13.2% [-15.4%, -11.3%] |
| BoidFlockers | small | 🟢 -7.9% [-9.2%, -6.8%] | 🔵 -0.9% [-1.5%, -0.4%] |
| BoidFlockers | large | 🟢 -7.1% [-8.5%, -5.6%] | 🔵 -0.6% [-1.3%, +0.2%] |
With #2007 merged and my suggested change to shuffle, we are getting close. Here I compare the dict version against current main with the updated shuffle implementation. As can be seen, we now lose about 15% with weakrefs; we have effectively halved the performance difference on the large models.
| Model | Size | Init time [95% CI] | Run time [95% CI] |
|---|---|---|---|
| Schelling | small | 🟢 -19.5% [-20.2%, -18.7%] | 🔵 +1.1% [+0.5%, +1.8%] |
| Schelling | large | 🟢 -21.5% [-22.6%, -20.6%] | 🟢 -12.0% [-16.3%, -8.0%] |
| WolfSheep | small | 🟢 -18.4% [-19.0%, -17.7%] | 🟢 -5.4% [-5.9%, -4.8%] |
| WolfSheep | large | 🟢 -17.0% [-18.5%, -15.3%] | 🟢 -12.5% [-14.2%, -10.9%] |
| BoidFlockers | small | 🟢 -8.6% [-10.2%, -7.0%] | 🔵 -1.0% [-1.8%, -0.3%] |
| BoidFlockers | large | 🟢 -7.0% [-8.5%, -5.5%] | 🔵 -1.8% [-3.0%, -0.4%] |
Performance benchmarks:
| Model | Size | Init time [95% CI] | Run time [95% CI] |
|---|---|---|---|
| Schelling | small | 🟢 -22.7% [-23.0%, -22.2%] | 🔵 -2.6% [-2.9%, -2.4%] |
| Schelling | large | 🟢 -29.8% [-39.6%, -24.6%] | 🔵 -1.6% [-2.7%, -0.4%] |
| WolfSheep | small | 🟢 -26.4% [-26.7%, -26.1%] | 🟢 -5.6% [-5.8%, -5.5%] |
| WolfSheep | large | 🟢 -28.8% [-35.9%, -24.9%] | 🟢 -5.8% [-7.9%, -3.5%] |
| BoidFlockers | small | 🟢 -6.6% [-7.1%, -6.1%] | 🔵 -0.8% [-1.6%, +0.0%] |
| BoidFlockers | large | 🟢 -8.4% [-9.0%, -7.8%] | 🔵 -1.1% [-1.7%, -0.4%] |
After being stupid for a bit, I got the benchmarks running again. So the performance difference between dict and `WeakKeyDictionary` is still there, but it is a lot smaller than it was before.
Yes, I think the difference is negligible at this point. I think we can close this PR then, do you agree?