mesa icon indicating copy to clipboard operation
mesa copied to clipboard

add `group_by` method to `AgentSet` for attribute-based grouping

Open ai-naymul opened this issue 10 months ago • 11 comments

Discussion(https://github.com/projectmesa/mesa/discussions/2074)

In this pr I have added a group_by method in the AgentSet class, allowing agents to grouped based on some specific attributes.

ai-naymul avatar Mar 28 '24 09:03 ai-naymul

Performance benchmarks:

Model Size Init time [95% CI] Run time [95% CI]
Schelling small 🔵 -0.0% [-0.3%, +0.3%] 🔵 -1.0% [-1.3%, -0.8%]
Schelling large 🔵 +0.2% [-0.7%, +1.1%] 🔵 -3.9% [-6.6%, -1.5%]
WolfSheep small 🔵 -0.4% [-1.3%, +0.5%] 🔵 -0.5% [-0.7%, -0.3%]
WolfSheep large 🔵 +0.6% [+0.2%, +0.9%] 🔵 +0.9% [-2.0%, +3.6%]
BoidFlockers small 🔵 +1.8% [+1.1%, +2.5%] 🔵 +1.0% [+0.3%, +1.6%]
BoidFlockers large 🔵 -3.2% [-6.6%, -0.0%] 🔵 -3.8% [-5.8%, -1.6%]

github-actions[bot] avatar Mar 28 '24 09:03 github-actions[bot]

Thanks for this PR! It looks like it could fill a nice use case, especially for attributes with discrete values.

On thing I'm thinking about is if we can do something smart to select ranges or thresholds for attributes with continuous values. For example divide agents in to bins based on some attribute.

Curious what the performance of this is compared to (multiple) select() cases.

EwoutH avatar Mar 28 '24 10:03 EwoutH

I also had a quick look. I like the direction. Some quick points:

  1. how to handle non-discrete attributes? The current code is not explicit on this. The obvious direction is to have some kind of binning. Alternatively, there could be some check of the type of the value of the attribute. If this is, for example, a float, an exception or warning might be raised. Either way, this issue need to be handled explicitly.
  2. I would like to see unit tests for this code to ensure we maintain proper coverage.

quaquel avatar Mar 28 '24 10:03 quaquel

Thanks for this PR! It looks like it could fill a nice use case, especially for attributes with discrete values.

On thing I'm thinking about is if we can do something smart to select ranges or thresholds for attributes with continuous values. For example divide agents in to bins based on some attribute.

Curious what the performance of this is compared to (multiple) select() cases.

Hi, Thank you for the feedback!

Integrating range or threshold-based grouping for continuous attributes is a great idea. We could potentially extend group_by to accept a binning function or range specifications to categorize these attributes. Regarding performance, initial tests suggest group_by is more efficient for discrete attributes compared to multiple select() calls, especially as the number of attributes increases. I'll explore the binning functionality and conduct further performance comparisons.

ai-naymul avatar Mar 28 '24 11:03 ai-naymul

I also had a quick look. I like the direction. Some quick points:

  1. how to handle non-discrete attributes? The current code is not explicit on this. The obvious direction is to have some kind of binning. Alternatively, there could be some check of the type of the value of the attribute. If this is, for example, a float, an exception or warning might be raised. Either way, this issue need to be handled explicitly.
  2. I would like to see unit tests for this code to ensure we maintain proper coverage.

Addressing non-discrete attributes through binning or type checks is a valid concern. I propose adding an optional binning_function parameter to the method to handle continuous attributes. This function can categorize attributes into discrete bins. If not provided for continuous attributes, we'll issue a warning. I'll also add the necessary unit tests.

ai-naymul avatar Mar 28 '24 12:03 ai-naymul

I tried pandas' groupby on float values:

In [4]: data = {"a": np.random.random(4), "b": np.random.random(4)}

In [5]: df = pd.DataFrame(data=data)

In [6]: df
Out[6]:
          a         b
0  0.437550  0.389805
1  0.180405  0.032264
2  0.509254  0.147376
3  0.181636  0.102611
In [9]: for k, v in df.groupby("b"):
   ...:     print(df.groupby("b").get_group(k))
   ...:
          a         b
1  0.180405  0.032264
          a         b
3  0.181636  0.102611
          a         b
2  0.509254  0.147376
         a         b
0  0.43755  0.389805

i.e. they seem to just let you do it even if the group has 1 element each.

rht avatar Mar 29 '24 01:03 rht

I tried pandas' groupby on float values:

I like to get other people's perspective on this as well. Personally, I am in favor of mimicking pandas as much as possible, so just group on the float as normal. This keeps the code nice and simple. If you want to do binning, you can already easily code this up through AgentSet.do and a callable. So

my_agents.do(some_binning_function).group_by("my_newly assigned attribute")

quaquel avatar Apr 07 '24 08:04 quaquel

Thanks for this! Just one thought on the binning conversation and a question regarding the appropriate return type. The code itself, and the tests look fine.

quaquel avatar Apr 07 '24 08:04 quaquel

Thanks for this! Just one thought on the binning conversation and a question regarding the appropriate return type. The code itself, and the tests look fine.

Hi sorry for the late reply.

I think we don't need to change the It group_by method to handle the binning specifically, it will handle continuous attributes directly without enforcing binning.

Here is example usage for binning and grouping :

def binning_function(agent): agent.binned_attribute = agent.continuous_attribute // 10 * 10

my_agents.do(binning_function)

grouped_agents = my_agents.group_by("binned_attribute")

ai-naymul avatar Apr 09 '24 06:04 ai-naymul

@ai-naymul is this PR ready for (final) review? If so, can you mark it as such (by pressing the "Ready to review") button?

Please also update the PR message with a bit of context (including which problem this solves and how it's implemented on a high level).

EwoutH avatar Apr 17 '24 12:04 EwoutH

@ai-naymul is this PR ready for (final) review? If so, can you mark it as such (by pressing the "Ready to review") button?

Please also update the PR message with a bit of context (including which problem this solves and how it's implemented on a high level).

Hi sorry for the inconvenience, I was having some health issues and wasn't able to work on this properly as well.

I think need some improvement like need the feedback regarding the return type of the group_by method from @quaquel .

I think it's final for review although. I will add some description about this pr as well.

ai-naymul avatar Apr 19 '24 06:04 ai-naymul

@ai-naymul are you still interested on working on this? If so I would like to have an explanation of the feature and an example in the PR message. A bit like: #1890 or #1916 (but it can be shorter since it's a smaller feature).

EwoutH avatar Jul 03 '24 11:07 EwoutH

This has been superseded by #2220. @ai-naymul thanks for the hard work on this. I started from your code in #2220.

quaquel avatar Aug 22 '24 09:08 quaquel