mesa
mesa copied to clipboard
add `group_by` method to `AgentSet` for attribute-based grouping
Discussion(https://github.com/projectmesa/mesa/discussions/2074)
In this pr I have added a group_by method in the AgentSet class, allowing agents to grouped based on some specific attributes.
Performance benchmarks:
Model | Size | Init time [95% CI] | Run time [95% CI] |
---|---|---|---|
Schelling | small | 🔵 -0.0% [-0.3%, +0.3%] | 🔵 -1.0% [-1.3%, -0.8%] |
Schelling | large | 🔵 +0.2% [-0.7%, +1.1%] | 🔵 -3.9% [-6.6%, -1.5%] |
WolfSheep | small | 🔵 -0.4% [-1.3%, +0.5%] | 🔵 -0.5% [-0.7%, -0.3%] |
WolfSheep | large | 🔵 +0.6% [+0.2%, +0.9%] | 🔵 +0.9% [-2.0%, +3.6%] |
BoidFlockers | small | 🔵 +1.8% [+1.1%, +2.5%] | 🔵 +1.0% [+0.3%, +1.6%] |
BoidFlockers | large | 🔵 -3.2% [-6.6%, -0.0%] | 🔵 -3.8% [-5.8%, -1.6%] |
Thanks for this PR! It looks like it could fill a nice use case, especially for attributes with discrete values.
On thing I'm thinking about is if we can do something smart to select ranges or thresholds for attributes with continuous values. For example divide agents in to bins based on some attribute.
Curious what the performance of this is compared to (multiple) select()
cases.
I also had a quick look. I like the direction. Some quick points:
- how to handle non-discrete attributes? The current code is not explicit on this. The obvious direction is to have some kind of binning. Alternatively, there could be some check of the type of the value of the attribute. If this is, for example, a float, an exception or warning might be raised. Either way, this issue need to be handled explicitly.
- I would like to see unit tests for this code to ensure we maintain proper coverage.
Thanks for this PR! It looks like it could fill a nice use case, especially for attributes with discrete values.
On thing I'm thinking about is if we can do something smart to select ranges or thresholds for attributes with continuous values. For example divide agents in to bins based on some attribute.
Curious what the performance of this is compared to (multiple)
select()
cases.
Hi, Thank you for the feedback!
Integrating range or threshold-based grouping for continuous attributes is a great idea. We could potentially extend group_by
to accept a binning function or range specifications to categorize these attributes. Regarding performance, initial tests suggest group_by
is more efficient for discrete attributes compared to multiple select()
calls, especially as the number of attributes increases. I'll explore the binning functionality and conduct further performance comparisons.
I also had a quick look. I like the direction. Some quick points:
- how to handle non-discrete attributes? The current code is not explicit on this. The obvious direction is to have some kind of binning. Alternatively, there could be some check of the type of the value of the attribute. If this is, for example, a float, an exception or warning might be raised. Either way, this issue need to be handled explicitly.
- I would like to see unit tests for this code to ensure we maintain proper coverage.
Addressing non-discrete attributes through binning or type checks is a valid concern. I propose adding an optional binning_function
parameter to the method to handle continuous attributes. This function can categorize attributes into discrete bins. If not provided for continuous attributes, we'll issue a warning. I'll also add the necessary unit tests.
I tried pandas' groupby
on float values:
In [4]: data = {"a": np.random.random(4), "b": np.random.random(4)}
In [5]: df = pd.DataFrame(data=data)
In [6]: df
Out[6]:
a b
0 0.437550 0.389805
1 0.180405 0.032264
2 0.509254 0.147376
3 0.181636 0.102611
In [9]: for k, v in df.groupby("b"):
...: print(df.groupby("b").get_group(k))
...:
a b
1 0.180405 0.032264
a b
3 0.181636 0.102611
a b
2 0.509254 0.147376
a b
0 0.43755 0.389805
i.e. they seem to just let you do it even if the group has 1 element each.
I tried pandas' groupby on float values:
I like to get other people's perspective on this as well. Personally, I am in favor of mimicking pandas as much as possible, so just group on the float as normal. This keeps the code nice and simple. If you want to do binning, you can already easily code this up through AgentSet.do
and a callable. So
my_agents.do(some_binning_function).group_by("my_newly assigned attribute")
Thanks for this! Just one thought on the binning conversation and a question regarding the appropriate return type. The code itself, and the tests look fine.
Thanks for this! Just one thought on the binning conversation and a question regarding the appropriate return type. The code itself, and the tests look fine.
Hi sorry for the late reply.
I think we don't need to change the It group_by
method to handle the binning specifically, it will handle continuous attributes directly without enforcing binning.
Here is example usage for binning and grouping :
def binning_function(agent):
agent.binned_attribute = agent.continuous_attribute // 10 * 10
my_agents.do(binning_function)
grouped_agents = my_agents.group_by("binned_attribute")
@ai-naymul is this PR ready for (final) review? If so, can you mark it as such (by pressing the "Ready to review") button?
Please also update the PR message with a bit of context (including which problem this solves and how it's implemented on a high level).
@ai-naymul is this PR ready for (final) review? If so, can you mark it as such (by pressing the "Ready to review") button?
Please also update the PR message with a bit of context (including which problem this solves and how it's implemented on a high level).
Hi sorry for the inconvenience, I was having some health issues and wasn't able to work on this properly as well.
I think need some improvement like need the feedback regarding the return type of the group_by method from @quaquel .
I think it's final for review although. I will add some description about this pr as well.
@ai-naymul are you still interested on working on this? If so I would like to have an explanation of the feature and an example in the PR message. A bit like: #1890 or #1916 (but it can be shorter since it's a smaller feature).
This has been superseded by #2220. @ai-naymul thanks for the hard work on this. I started from your code in #2220.