mesa
Aggregated agent metric in DataCollection, graph in ChartModule
Aggregated agent variable
Currently it's not possible to quickly get an aggregated metric of an agent variable. This PR adds a method to the `DataCollector` class called `get_agent_metric` that quickly returns a single value describing the agent variable, based on a statistic.
By default, it takes the mean of all agents' values for that variable. It always reports the variable at the current time step. The function supports all functions of the `statistics` module, as well as the built-in `min()`, `max()`, `sum()` and `len()` functions.
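As a rough illustration of the behaviour described above (not the PR's actual code), a metric name could be resolved to a callable along these lines. The helper name `aggregate_values` and the `_BUILTIN_METRICS` table are assumptions made for this sketch.

```python
import statistics

# Hypothetical helper, not part of Mesa: maps the built-in metric names
# mentioned in the description to their callables.
_BUILTIN_METRICS = {"min": min, "max": max, "sum": sum, "len": len}


def aggregate_values(values, metric="mean"):
    """Reduce a list of per-agent values to a single number."""
    if metric in _BUILTIN_METRICS:
        return _BUILTIN_METRICS[metric](values)
    # Otherwise look the name up in the statistics module (mean, median, ...).
    func = getattr(statistics, metric, None)
    if not callable(func):
        raise ValueError(f"Unknown metric: {metric!r}")
    return func(values)


neighbours = [0, 0, 1, 3]
print(aggregate_values(neighbours))            # default: mean
print(aggregate_values(neighbours, "median"))
print(aggregate_values(neighbours, "max"))
```

Defaulting to `"mean"` mirrors the default described in the PR text; everything else here is a sketch under the stated assumptions.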
To support this:
- `statistics` is imported
- Adds the `agent_attr_index` dictionary, which lists the position of each `reporter` in the `_agent_records` dictionary
- Adds `self.agent_name_index`, which can be used to look up the `reporter` for each input variable `name`
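A minimal sketch of what such an index could look like, assuming each agent record is a tuple of the form `(step, agent_id, value_0, value_1, ...)`. That layout, the sample data and the `column` helper are assumptions for illustration, not taken from the PR.

```python
# Assumed reporter definitions: reporter name -> agent attribute name.
agent_reporters = {"Neighbours": "neighbours", "Alive": "alive"}

# Map each reporter name to its column in a record. The first two columns
# are assumed to hold the step and the agent id, hence the offset of 2.
agent_attr_index = {name: i + 2 for i, name in enumerate(agent_reporters)}

# Made-up records for a single step.
records = [(0, 1, 2, True), (0, 2, 0, False), (0, 3, 3, True)]


def column(records, name):
    """Pull one reporter's values out of the record tuples."""
    idx = agent_attr_index[name]
    return [rec[idx] for rec in records]


print(agent_attr_index)
print(column(records, "Neighbours"))
```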
Example
A model called `model1` is created, with agents that have an agent reporter in the `datacollector` called "Neighbours":
self.datacollector = DataCollector(
    model_reporters={"Agents": lambda m: m.schedule.get_agent_count()},
    agent_reporters={"Neighbours": "neighbours"},
)
The new `get_agent_metric()` function can now be used to get an aggregate-level statistic of the number of neighbours of the agents:
model1.datacollector.get_agent_metric("Neighbours")
0.8984375
model1.datacollector.get_agent_metric("Neighbours", "min")
0
model1.datacollector.get_agent_metric("Neighbours", "median")
0.0
model1.datacollector.get_agent_metric("Neighbours", "max")
3
Plotting agent variables
The `ChartModule` is also updated to support displaying agent variables. If it can't find a variable in the model variables, it checks if it is present in the agent variables, and if so, adds it to the chart.
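The fallback lookup described above might be sketched roughly as follows. The function `chart_value` and the `agent_values` argument are hypothetical. `model_vars` is assumed to map each name to a list with one value per step, and the agent values are reduced with the mean.

```python
import statistics


def chart_value(model_vars, agent_values, label):
    """Return the latest value to plot for `label` (hypothetical helper).

    model_vars:   name -> list with one value per step (assumed layout)
    agent_values: name -> list of the current step's per-agent values
    """
    if label in model_vars:
        # Model variable: plot the most recent collected value.
        return model_vars[label][-1]
    if label in agent_values:
        # Agent variable: fall back to an aggregate (mean, as an assumption).
        return statistics.mean(agent_values[label])
    raise KeyError(label)


model_vars = {"Agents": [10, 12]}
agent_values = {"Neighbours": [0, 1, 2]}
print(chart_value(model_vars, agent_values, "Agents"))
print(chart_value(model_vars, agent_values, "Neighbours"))
```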
Example: in a Game of Life model I built, the DataCollector looks like this:
self.datacollector = DataCollector(
    model_reporters={"Agents": lambda m: m.schedule.get_agent_count()},
    agent_reporters={"Neighbours": "neighbours"},
)
The server contains two charts, one with `Agents`, which is a model variable, and one with `Neighbours`, an agent variable.
chart1 = ChartModule([{"Label": "Agents", "Color": "Black"}], data_collector_name="datacollector")
chart2 = ChartModule([{"Label": "Neighbours", "Color": "Black"}], data_collector_name="datacollector")
server = ModularServer(LifeModel, [grid, chart1, chart2], "Game of Life", {"p": 0.12, "width": 40, "height": 40})
On the main branch, only the first chart is displayed correctly. With this PR, both are.
@tpike3, @rht and others, I would love your feedback on this PR! Please consider performance, the naming of variables and functions and API stability. Also please let me know if (and where) tests and documentation should be added.
Codecov Report
Merging #1145 (51df4fe) into main (4a79705) will decrease coverage by 1.17%. The diff coverage is 25.00%.
@@ Coverage Diff @@
## main #1145 +/- ##
==========================================
- Coverage 89.30% 88.13% -1.18%
==========================================
Files 19 19
Lines 1253 1289 +36
Branches 256 259 +3
==========================================
+ Hits 1119 1136 +17
- Misses 98 116 +18
- Partials 36 37 +1
| Impacted Files | Coverage Δ | |
|---|---|---|
| mesa/visualization/modules/ChartVisualization.py | 70.96% <20.00%> (-20.70%) | :arrow_down: |
| mesa/datacollection.py | 88.11% <28.57%> (-9.59%) | :arrow_down: |
| mesa/space.py | 94.91% <0.00%> (-1.00%) | :arrow_down: |
| mesa/batchrunner.py | 92.28% <0.00%> (+0.72%) | :arrow_up: |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
@tpike3 Do you have time to review this PR?
@rht Any more comments?
- `self.agent_name_index` is redundant with `self.agent_reporters`.
- I find the extra `agent_attr_index` construct to be unnecessary. You can define aggregate measures of agent variables within the framework of the existing model-level data collection.
`self.agent_name_index` is redundant with `self.agent_reporters`.
Good catch, can't believe I missed that. I found it already weird that there wasn't such a dictionary, but there was. I fixed it in 0d1aeef.
- I find the extra `agent_attr_index` construct to be unnecessary. You can define aggregate measures of agent variables within the framework of the existing model-level data collection.
The dictionary is created to keep track of which metric is collected where in the list of `_agent_records`. Unfortunately, that information isn't easily visible to the user. Of course you can dig deep into the code to find out what each number in `[1, 2, 4.5, 6.4, 3.8]` stands for, but that should be easier or handled by the back end, as this approach does.
Anyway, I don't think it has a big performance impact, and it does make the code a bit more resilient if agent_reporters are defined in an unusual way.
But if you suggest another implementation, I'm open to incorporating it!
I wasn't referring to `_agent_records`. I was referring to storing the values in `model_vars`. The aggregated agent metric is a model-wide measure, a summary of individual agent properties.
@Ewout, in general I think this is a good idea, but it is hard to say how to implement it.
1st, an unhelpful philosophical rabbit hole: this is pretty profound. To put it in my own terms: what is the right setup to optimize user ease as the general population becomes more technically literate? This is a constant issue for me right now and speaks to @EwoutH's point: what setup allows users to easily and intuitively see key parts of the model?
2nd, some thoughts to hopefully be helpful:
@EwoutH, I didn't get a chance to play with it and really understand it. But, to @rht's point, can you describe the difference between `model_vars` and `agent_attr_index`? Couldn't you just put the metrics against `model_vars` or even the dataframe (which has some interesting ease-of-use and cost dynamics)?
To a specific question: the testing would go in the data_collector.
Hope this helps.
I looked a bit more into it. They are indeed identical, but currently `model_vars` is used for model reporter variables, and `agent_attr_index` for agent reporter variables. I think it's better to keep them separate, just for the case in which a model variable and an agent variable with the same name are both collected.
I renamed it to `agent_vars`, however, to make their similarity clearer.
So with my current skill set, I think this is the best implementation I can do. Then the question is: is this good enough in terms of performance and maintainability? If so, I can add tests and update the docs further.
If not, @rht, would you be open to re-implementing this functionality from a clean sheet?
Any aggregate metric, by definition, is a model-level variable. The examples you showed in https://github.com/projectmesa/mesa/pull/1145#issue-1111995305 can be put in `model_vars`.
Avoiding data collection key name collisions is a separate problem. If anything, calling it `agent_vars` is misleading, because once again, those are model-level vars.
You should stick to the existing API whenever possible. Adding more machinery will make the library more complex and harder to learn.
I would do something like this:
def get_neighbors_min(model):
    neighbors = model.datacollector.get_last_agent_report("Neighbors")
    return min(neighbors)

# Later on in the agent reporter initialization
model_reporters={"neighbors_min": get_neighbors_min, ...}
This way, the user is the one responsible for naming the model-level var, and there is no key collision at all.
Thanks for your comment, I now understand your issue.
The current architecture is as follows:
1. DataCollector collects a variable from all agents each timestep, keeping all values.
2. `get_agent_metric` aggregates the values from all agents into a single value.
3. ChartVisualization plots this single value each timestep.
What you're suggesting is merging steps 1 and 2, if I understand correctly. While this has the advantage that it can simplify the code and reduce the amount of information stored, it does throw away a lot of data that could be analysed afterwards.
No data are thrown away. See my example. I took the agent-level vars from an existing, separately-defined agent reporter.
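To make this point concrete, here is a toy sketch in which the raw per-agent values are kept for every step while a model-level reporter derives its aggregate from the most recent step. `TinyCollector` is a stand-in written for this illustration, not Mesa's `DataCollector`, and it assumes agent values are recorded before the model reporters run.

```python
class TinyCollector:
    """Toy stand-in for a data collector (not Mesa's DataCollector)."""

    def __init__(self, model_reporters):
        self.model_reporters = model_reporters
        self.agent_records = []  # one list of per-agent values per step
        self.model_vars = {name: [] for name in model_reporters}

    def collect(self, agent_values):
        # Store the raw agent values first, so reporters can read them.
        self.agent_records.append(list(agent_values))
        for name, func in self.model_reporters.items():
            self.model_vars[name].append(func(self))


def neighbors_min(collector):
    # Aggregate over the latest step's agent-level values.
    return min(collector.agent_records[-1])


dc = TinyCollector({"neighbors_min": neighbors_min})
dc.collect([2, 0, 3])
dc.collect([1, 4, 2])
print(dc.model_vars)     # aggregated, one value per step
print(dc.agent_records)  # raw values are still all there
```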
@rht @tpike3 @jackiekazil Maybe we could give this PR/idea another spin. I think the main questions are:
- Do we want aggregated agent metrics (for plotting) in Mesa?
- If so, what would a clean implementation look like that (preferably) doesn't break backwards compatibility?
Following our discussion in the dev meeting earlier today, you might need to consider the case where different types of agents may have different attributes. As discussed, the implemented interface could be used as a default where all types of agents are assumed to share a common attribute.
Another major concern that I had is how to differentiate `agent_reporters` from `model_reporters`. How do we tell the users when to use `agent_reporters` vs. when to use the other?
Without this PR what I'll do would probably look very much like what was mentioned in https://github.com/projectmesa/mesa/pull/1145#issuecomment-1031524167.
As an alternative yet similar example:
def get_min_neighbors(model):
    return min([getattr(agent, "neighbors") for agent in model.schedule.agents])  # or model.grid.agents or other similar places
and use this in `model_reporters`.
The question is, is this generic enough to be provided as an API to the users? For instance we can have:
def get_agent_metric(model, attr_name, metric="mean"):
    values = [getattr(agent, attr_name) for agent in model.schedule.agents]
    if metric in ["min", "max", "sum", "len"]:
        # similar to what was implemented in this PR
        result = ...
    else:
        result = ...
    return result
so that the users can do something like:
from functools import partial
model_reporters={
    "neighbors_min": partial(get_agent_metric, attr_name="neighbors", metric="min"),
    ...
}
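For reference, one possible way to make this idea runnable end to end is sketched below. The bodies of the two branches are an assumption (built-ins looked up in a dict, everything else taken from the `statistics` module), and the `SimpleNamespace` objects are stubs standing in for a real model and its agents.

```python
import statistics
from functools import partial
from types import SimpleNamespace


def get_agent_metric(model, attr_name, metric="mean"):
    values = [getattr(agent, attr_name) for agent in model.schedule.agents]
    if metric in ("min", "max", "sum", "len"):
        # Assumed handling: dispatch to the matching built-in.
        func = {"min": min, "max": max, "sum": sum, "len": len}[metric]
    else:
        # Assumed handling: look the name up in the statistics module.
        func = getattr(statistics, metric)
    return func(values)


# Stub model exposing only what the function needs: model.schedule.agents.
agents = [SimpleNamespace(neighbors=n) for n in (2, 0, 3)]
model = SimpleNamespace(schedule=SimpleNamespace(agents=agents))

model_reporters = {
    "neighbors_min": partial(get_agent_metric, attr_name="neighbors", metric="min"),
    "neighbors_mean": partial(get_agent_metric, attr_name="neighbors"),
}

# Each reporter is now a callable that takes only the model.
results = {name: func(model) for name, func in model_reporters.items()}
print(results)
```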
Personally I don't think this is really needed, since the users can fairly easily define their own functions.
How about the `agent_reporters` like in this PR, i.e.
self.data_collector = DataCollector(
    model_reporters={"Agents": lambda m: m.schedule.get_agent_count()},
    agent_reporters={"Neighbours": "neighbours"},
)
vs.
self.data_collector = DataCollector(
    model_reporters={
        "Agents": lambda m: m.schedule.get_agent_count(),
        "Neighbours": get_min_neighbors,
    }
)
Again I don't really see the need to introduce `agent_reporters` here.
On a second thought, it might be useful when the users need to define lots of similar functions, such as:
self.data_collector = DataCollector(
    model_reporters={
        "Agents": lambda m: m.schedule.get_agent_count(),
        "Min Neighbours": get_min_neighbors,
        "Mean Neighbours": get_mean_neighbors,
        "Max Neighbours": get_max_neighbors,
    }
)
In this case it could be easier for the users to have a common interface such as `agent_reporters` or the `get_agent_metric` function mentioned previously, so that they don't have to rewrite lots of short functions. Sorry that I missed this point, which was mentioned in the PR.
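As a sketch of how the repetition could be avoided without any new library API, a small closure factory can generate the similar reporters in a loop. `make_agent_reporter` is a hypothetical helper, and the stub model below only mimics `model.schedule.agents`.

```python
import statistics
from types import SimpleNamespace


def make_agent_reporter(attr_name, func):
    """Build a model reporter that applies `func` to one agent attribute."""
    def reporter(model):
        return func([getattr(a, attr_name) for a in model.schedule.agents])
    return reporter


# One entry per aggregate; the labels follow the example above.
metrics = {"Min": min, "Mean": statistics.mean, "Max": max}
model_reporters = {
    f"{label} Neighbours": make_agent_reporter("neighbours", func)
    for label, func in metrics.items()
}

# Stub model and agents, for demonstration only.
agents = [SimpleNamespace(neighbours=n) for n in (1, 4, 1)]
model = SimpleNamespace(schedule=SimpleNamespace(agents=agents))
snapshot = {name: rep(model) for name, rep in model_reporters.items()}
print(snapshot)
```

The resulting dictionary has the same shape as the hand-written `model_reporters` above, so it could be passed to a collector in the same way.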
On a second thought, it might be useful when the users need to define lots of similar functions, such as:
self.data_collector = DataCollector(
    model_reporters={
        "Agents": lambda m: m.schedule.get_agent_count(),
        "Min Neighbours": get_min_neighbors,
        "Mean Neighbours": get_mean_neighbors,
        "Max Neighbours": get_max_neighbors,
    }
)
This is exactly what I see students (and myself, sometimes) do all the time. The main use case I see for this feature is that you want to collect all the agent data for proper statistical analysis later, but you also want some quick values for eyeball validation and visualisation.
If I want to do that with the current datacollector possibilities, I have to define both agent and model reporters, or write custom code to transform the agent data into the thing I want.
Also, I think that there should be a really easy way to plot a general statistic like the mean of an agent variable with real-time visualisation. Heck, NetLogo has done this for 20+ years.
Maybe some of the Solara stuff leapfrogs this, but those use cases should be included in my opinion:
- Get quick aggregate values
- Plot aggregate metrics in real time
(both while still collecting full agent data for proper analysis)