AgentSet: Allow selecting a fraction of agents in the AgentSet
This PR adds a p parameter to the select method in the AgentSet class, allowing users to specify a fraction of agents to be selected from the set.
Motive
The existing select method only allowed selection based on a fixed number (n) of agents or a filter function. The addition of the p parameter enhances flexibility by enabling selection based on a percentage of the total agents, addressing scenarios where relative selection is more appropriate than absolute selection.
Implementation
The select method was updated to include an optional p parameter (defaulting to 1.0). If p is specified and less than 1.0, the method calculates the corresponding number of agents (n) to select as a fraction of the total. The code was modified to ensure compatibility with the existing functionality, including adjustments to the conditions that determine when the method should return early.
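A minimal sketch of the fraction-to-count step described above, using a plain list in place of an AgentSet. The helper name `select_fraction` is hypothetical and not part of Mesa; it only illustrates the arithmetic, not the actual implementation.

```python
def select_fraction(agents, p=1.0):
    """Return the first round(len(agents) * p) items (hypothetical helper)."""
    agents = list(agents)
    if p >= 1.0:
        # Nothing to cut off, so return everything (the early-return case).
        return agents
    n = round(len(agents) * p)
    return agents[:n]


agents = list(range(10))
selected = select_fraction(agents, p=0.2)
print(selected)
```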
Usage Examples
# Select 20% of agents from the AgentSet
selected_agents = agents.select(p=0.2)
Together with #2254, you can now set a value for a fraction of your AgentSet:
# Select 40% of the agents from the AgentSet
model.agents.select(p=0.4).set('has_license', True)
This feature is particularly beneficial in models where the agent set size may vary, and proportional selection is required.
Additional Notes
Basically this is a shortcut for:
n_agents = round(len(some_agentset) * p)
some_agentset.select(n=n_agents)
So it breaks chaining if you have to do this. Directly being able to select a fraction allows you to continue the chain.
what is the motivation for adding this to the agentset?
That seems useful, thanks!
The only worry I have is how this behaves if a user specifies both n and p. That probably should raise an error?
Or maybe there is a good name that could incorporate both p and n? So if it is between 0 and 1 use a fraction and if it is a whole number above 1 use that number?
what is the motivation for adding this to the agentset?
Sorry, was still working on other features (and my actual model), wrote it up.
That seems useful, thanks!
The only worry I have is how this behaves if a user specifies both n and p. That probably should raise an error?
Yeah I was thinking about that. Maybe just don't do that (and we mention it in the docstring)?
If you just want to select a fraction of n, you can do n=round(n*p), so having both doesn't make sense.
Or maybe there is a good name that could incorporate both p and n? So if it is between 0 and 1 use a fraction and if it is a whole number above 1 use that number?
Very interesting idea, but maybe in this case explicit is better than implicit. Except if you can come up with a killer name.
I like the clarity of p. So my suggestion would be to raise a value error if both n and p are passed
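To make the suggestion concrete, here is a small sketch of rejecting the ambiguous call. The function `resolve_count` is a hypothetical stand-in for the argument handling inside `select`, and the error message is an assumption.

```python
def resolve_count(total, n=None, p=None):
    """Turn n or p into a number of agents to select (hypothetical helper)."""
    if n is not None and p is not None:
        # Both given: the intent is ambiguous, so refuse instead of guessing.
        raise ValueError("Specify either n or p, not both.")
    if p is not None:
        return round(total * p)
    if n is not None:
        return min(n, total)
    return total


print(resolve_count(10, p=0.4))
print(resolve_count(10, n=3))
```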
see the few minor comments and once unit tests are added, this is good to go.
Okay, I:
- Changed p to fraction
- Used the ValueError
- Updated the other docstring, including notes
- Added tests
- Updated the examples
However, I noticed that there's an important difference between n and fraction. n is always fixed, it's just an upper limiter. fraction does matter when you apply it, before or after the rest of the selection.
Currently fraction is interpreted as a fraction of the input AgentSet. When writing the usage examples that felt really counter intuitive. It would be more logical if you could apply it afterwards, such that a fraction of the selected AgentSet is returned.
Why? Because if you take these two use cases:
- Select the agents with "wealth" less than 5 but at most 20% of total agents
- Select the agents with "wealth" less than 5, and then 20% of those agents
The latter is used way more than the former. And it will be way more logical if you select by type.
So I would suggest applying fraction afterwards, on the selected AgentSet after all other operations are done. Then you could still do both:
# Select the agents with "wealth" less than 5, and at most 20% of total agents
agents.select(fraction=0.2).select(lambda agent: agent.wealth < 5)
# Select the agents with "wealth" less than 5, and then 20% of those agents
agents.select(lambda agent: agent.wealth < 5, fraction=0.2)
# or, equivalently:
agents.select(lambda agent: agent.wealth < 5).select(fraction=0.2)
But now the one that's more used and more intuitive will go well by default.
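The difference between the two use cases can be shown with plain lists. This is only an illustration of the counts involved; `Person` and `is_poor` are made-up names, and truncation with `int()` is an assumption about rounding.

```python
from dataclasses import dataclass


@dataclass
class Person:
    wealth: int


people = [Person(wealth=w) for w in range(10)]  # wealth 0..9


def is_poor(agent):
    return agent.wealth < 5


matching = [a for a in people if is_poor(a)]  # 5 agents match

# Use case 1: the fraction caps the result relative to the input set.
cap = int(len(people) * 0.2)  # 20% of 10 agents
capped = matching[:cap]

# Use case 2: the fraction is taken from the filtered set.
share = int(len(matching) * 0.2)  # 20% of 5 agents
fraction_of_matching = matching[:share]

print(len(capped), len(fraction_of_matching))
```

With ten agents of which five match, the first reading returns two agents and the second returns one.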
Totally other options could be:
- Don't allow fraction and/or n with other functions, but enforce chaining
- Introduce a new method, like sample, that gives a sample of n or a sample of fraction.
what is the motivation for adding this to the agentset?
@EwoutH I'm also wondering about this. Not saying that this shouldn't be in the library, but a concrete example could give some illustration. Is this used in your project?
This was the thing I wanted to do:
# Randomly select 40% of the agents from the AgentSet and give them a license
model.agents.shuffle().select(fraction=0.4).set('has_license', True)
I needed to do this:
n_license = round(len(model.agents) * license_chance)
model.agents.shuffle().select(n=n_license).do(lambda agent: setattr(agent, 'has_license', True))
With #2254 it got simplified to:
n_license = round(len(model.agents) * license_chance)
model.agents.shuffle().select(n=n_license).set('has_license', True)
It's not a huge use case, but it's nice. Especially that you don't need to break the chain.
Combine it with a function and it gets really powerful though. Assume I want to distribute some cars around (I know a certain percentage of all people has a car), but only to agents with licenses.
agents.select(lambda a: a.has_license, fraction=car_chance).set('has_car', True)
Without the fraction, this would have been:
n_car = round(len(model.agents) * car_chance)
model.agents.select(lambda a: a.has_license, n=n_car).set('has_car', True)
So yeah, it's not a huge use case. Maybe it adds some complexity.
There's a unique application for fraction as an upper limit (cap), as currently implemented, and a unique application for doing it afterwards. I need to think about this a bit longer.
Right, n=0 has a special status. With a small fraction or small agentset, n can become 0, returning all agents.
Good catch!
I see two possibilities now. Either just change the special meaning from 0 to -1. I don't know if there was a good use case for 0, but it's rather strange for 0 to indicate all agents.
The more holistic approach would be to split select into a filter function and a sample function. This would also simplify the logic and solve the "before or after" question (which was present but unconsidered before fraction was introduced)
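The n=0 pitfall noted above is easy to reproduce in isolation. Both functions below are toy stand-ins written for this illustration, not Mesa code; the second one shows the suggested change of sentinel from 0 to -1.

```python
def select_first(agents, n=0):
    """Toy stand-in where n=0 means 'no limit' (the special case discussed above)."""
    agents = list(agents)
    return agents if n == 0 else agents[:n]


def select_first_sentinel(agents, n=-1):
    """Variant where -1 means 'no limit', so n=0 really returns nothing."""
    agents = list(agents)
    return agents if n == -1 else agents[:n]


agents = ["a", "b"]
n = round(len(agents) * 0.2)  # 0.4 rounds to 0

print(select_first(agents, n=n))           # every agent comes back
print(select_first_sentinel(agents, n=n))  # empty selection
```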
The brain is so interesting that after a night's sleep you look at it again and you think oh, and it all clicks together.
Now I just have to write it up, rewrite the code, tests and examples.
Can’t wait for 2026/2027 where with a voice message a bot does that automatically.
Long story short: There’s a special use case for when filtering, you want a certain number or fraction at most. Especially the fraction should happen right there in the function, because after the function is done, you don’t know how large the
For all other cases (before, after) a sample method would be perfect (and can be implemented pretty fast I think). sample could also draw a random sample, where select selects the first n/fraction.
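A rough sketch of what such a `sample` function might look like, working on any iterable. The signature, the `rng` parameter and the use of `round` are assumptions made for this illustration; nothing here is the eventual Mesa API.

```python
import random


def sample(agents, n=None, fraction=None, rng=None):
    """Draw a random sample of n items or of a fraction of the items (hypothetical)."""
    if (n is None) == (fraction is None):
        raise ValueError("Specify exactly one of n or fraction.")
    agents = list(agents)
    k = n if n is not None else round(len(agents) * fraction)
    k = min(k, len(agents))  # never ask for more than is available
    rng = rng or random.Random()
    return rng.sample(agents, k)


picked = sample(range(10), fraction=0.4, rng=random.Random(42))
print(len(picked))
```

Because the sampling happens on whatever iterable is passed in, it can be applied before or after a filter step without any special cases.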
Or maybe there is a good name that could incorporate both p and n? So if it is between 0 and 1 use a fraction and if it is a whole number above 1 use that number?
Obviously the way to go. I was thinking max, limit, ceiling or at_most.
Looking forward at what @EwoutH comes up with. But I like @Corvince suggestion of having filter and sample. From a pure performance standpoint, minimizing the looping required for the use cases described here would be really beneficial. Chaining multiple select or shuffle and select, for something like "give me a random sample of 40% of the agents that have a particular attribute" is not ideal. It requires multiple loops where 1 should be sufficient. In particular, in the case of large numbers of agents, this becomes very inefficient.
Of course, a user could do most of this stuff with a well-designed custom function passed to e.g. select, so performance is not the only design concern and there are already clean ways of handling those.
Agreed on the performance aspect. One way to solve this but keep the chainable approach would be to use generator functions to return iterators instead of the complete AgentSet. But maybe as you said this is all mainly catered towards nice semantics and there are other ways already available for performance critical operations.
Agreed on the performance aspect. One way to solve this but keep the chainable approach would be to use generator functions to return iterators instead of the complete AgentSet. But maybe as you said this is all mainly catered towards nice semantics and there are other ways already available for performance critical operations.
That's an interesting idea worth exploring at some point (but not this PR). Basically, what if we have a generator interface to an AgentSet? And can we make a chainable API work with generators?
It seems we keep coming back to this (https://github.com/projectmesa/mesa/pull/2220#issuecomment-2297117745), so it’s certainly worth exploring at some point.
Agreed on the performance aspect. One way to solve this but keep the chainable approach would be to use generator functions to return iterators instead of the complete AgentSet. But maybe as you said this is all mainly catered towards nice semantics and there are other ways already available for performance critical operations.
That's an interesting idea worth exploring at some point (but not this PR). Basically, what if we have a generator interface to an AgentSet? And can we make a chainable API work with generators?
I think having an __iter__ method is kind of enough, so
(agent for agent in agentset)
should already give you an iterator over the agentset. Definitely worth exploring that more, but certainly way out of scope for this PR
//Edit Ah, sorry, didn't think this through. Definitely needs more thought on the possibility to make this chainable. This of course only iterates over the agents themselves
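As a thought experiment only, a chainable API on top of generators could look roughly like this. `LazyAgents`, `select` and `take` are invented for this sketch and do not exist in Mesa; each call wraps the previous iterator instead of building a new collection.

```python
from itertools import islice


class LazyAgents:
    """Hypothetical lazy wrapper around any iterable of agents."""

    def __init__(self, iterable):
        self._iterable = iterable

    def select(self, predicate):
        # Wrap the source in a generator; nothing is evaluated yet.
        return LazyAgents(a for a in self._iterable if predicate(a))

    def take(self, n):
        # Stop pulling from the source after n items.
        return LazyAgents(islice(self._iterable, n))

    def __iter__(self):
        return iter(self._iterable)


numbers = LazyAgents(range(1_000_000))
first_three_even = list(numbers.select(lambda x: x % 2 == 0).take(3))
print(first_three_even)
```

The chain is consumed in a single pass and stops after the third match. A fraction is harder to express this way, since the size of the filtered set is not known until the generator is exhausted.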
I updated this PR to replace n with max.
max (int | float, optional): The maximum amount of agents to select. Defaults to infinity.
- If an integer of 1 or larger, the first n matching agents are selected.
- If a float between 0 and 1, at most that fraction of the original agents are selected.
Some details:
- max=1 will give one agent, max=1.0 gives all agents.
- A fallback for n was added, which does max = n and throws a warning.
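The int-versus-float rule described above could be resolved along these lines. `resolve_max` is a hypothetical helper, and the handling of infinity and of out-of-range values is an assumption; it uses `round` because that is what the PR did at this point in the discussion.

```python
import math


def resolve_max(max_value, total):
    """Translate an int count or a float fraction into a number of agents."""
    if isinstance(max_value, int):
        if max_value < 1:
            raise ValueError("An integer max must be 1 or larger.")
        return min(max_value, total)
    if isinstance(max_value, float):
        if max_value == math.inf:
            return total  # the documented default: no limit
        if not 0.0 <= max_value <= 1.0:
            raise ValueError("A float max must be between 0 and 1.")
        return round(total * max_value)
    raise TypeError("max must be an int or a float.")


print(resolve_max(1, 10))    # integer 1: one agent
print(resolve_max(1.0, 10))  # float 1.0: all agents
```

The type of the argument, not just its value, decides the meaning, which is why `1` and `1.0` behave differently.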
Tests are updated. Please double check the internal agent_generator function.
If we decide this is the way to go, I will update the PR description.
I plan on adding a separate sample() function that implements max in the same way, including with a shuffle=True option. Fun fact: sample(n, shuffle=True) will be equivalent to NetLogo's up-to-n-of. @quaquel I know you hate NetLogo with all your heart, but sometimes you can learn a lot from them ;).
But that would be separate PR.
I am unsure about using a single keyword for both the number and the percentage, but I won't object to it either. I would change the name, however. max shadows the name of a built-in.
It would be nice to see a quick overview of what the API is now becoming just for clarity.
sample(n, shuffle=True) will be equivalent to NetLogo's up-to-n-of. @quaquel I know you hate NetLogo with all your heart, but sometimes you can learn a lot from them ;).
I hate the language, but, yes, we can pick up useful ideas and give them a better name. sample is much better than that weird construct with hyphens in the name 😉.
I was thinking max, limit, ceiling or at_most.
Any suggestions (either these or another)?
I like at_most the best. It conveys that "n" can be arbitrarily large, but the number of returned agents need not match it. It also sort of implies that you first apply a filter and then take a sample. And it also makes the rounding clear for fractions. So 1/3 of 5 (1.67) will be 1 agent, otherwise it would be more than 1/3.
So 1/3 of 5 (1.67) will be 1 agent, otherwise it would be more than 1/3.
Currently it does round, do you think it shouldn't?
If it's an upper limit I think it should always round down/floor
Difficult one. Because if you describe it as "selecting a fraction" I would expect it to select the closest match.
I think in many practical scenarios the closest selection to the fraction you wanted is most logical.
If we go with at_most, it should round down in the case of fractions. Otherwise, the name and behavior don't match.
Valid argument for "selecting a fraction", but for selecting "at most" 33% I would not expect it to select 40%
If we go with at_most, it should round down in the case of fractions. Otherwise, the name and behavior don't match.
Exactly. That's why I think it's a good name (if we floor), because people will always have different expectations for "selecting a fraction" with respect to rounding.
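The 1/3-of-5 example from this discussion, worked through in plain Python to show why flooring matches the name at_most:

```python
import math

total = 5
fraction = 1 / 3

rounded = round(total * fraction)       # 1.67 rounds up to 2 agents
floored = math.floor(total * fraction)  # 1.67 floors down to 1 agent

print(rounded / total)  # share of agents actually selected when rounding
print(floored / total)  # share of agents actually selected when flooring
```

Rounding selects 2 of 5 agents, which is 40% and exceeds the requested third; flooring selects 1 of 5, which stays below it.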
I renamed max to at_most, made sure it rounded down, and updated the tests.
PR description is updated, including the usage examples
@projectmesa/maintainers ready to go? (would like to merge myself)