mesa icon indicating copy to clipboard operation
mesa copied to clipboard

Fix issue #2213

Open GittyHarsha opened this issue 1 year ago β€’ 10 comments

Resolves #2213 To allow automatic assignment of unique_id's to newly created agents of a particular model at the same time to ensure not to break existing codes:

I have introduced the following changes in the __init__ method of Agent class: changed first parameter from unique_id to unique_id_or_model: int | Model changed the second parameter from model: Model to model: Model | None= None.

It is backward compatible as both unique_id and model arguments can be given.

This also identifies when only a model is given as an argument to unique_id_or_model parameter and thus assigns the unique_id to the agent automatically

GittyHarsha avatar Aug 20 '24 02:08 GittyHarsha

Performance benchmarks:

Model Size Init time [95% CI] Run time [95% CI]
Schelling small πŸ”΅ -0.8% [-1.0%, -0.6%] πŸ”΅ +1.0% [+0.9%, +1.2%]
Schelling large πŸ”΅ -0.8% [-1.0%, -0.5%] πŸ”΅ +0.2% [-0.6%, +0.8%]
WolfSheep small πŸ”΅ -0.1% [-0.9%, +0.8%] πŸ”΅ +0.0% [-0.2%, +0.3%]
WolfSheep large πŸ”΅ -0.3% [-0.6%, -0.1%] πŸ”΅ +0.2% [-0.2%, +0.7%]
BoidFlockers small πŸ”΅ -2.0% [-2.6%, -1.4%] πŸ”΅ +0.6% [+0.0%, +1.2%]
BoidFlockers large πŸ”΅ -1.0% [-1.6%, -0.6%] πŸ”΅ +1.8% [+1.2%, +2.5%]

github-actions[bot] avatar Aug 20 '24 02:08 github-actions[bot]

This is an interesting approach, thanks!

Do you mind if I write some test cases and add them to this PR? Then we can see what edge cases are covered and for which we still have to come up with a solution.

EwoutH avatar Aug 20 '24 06:08 EwoutH

Good stuf. A few comments that might be useful to move this PR forward

  1. please make sure the original message is self contained. So summarize what the PR does. I now have to follow the link to figure it out. Please still include the link for those who want to know more.
  2. As asked in the original issue (q1), the counting methods should move from the model class to the agent class and be a class method.
  3. I will try to look closely at the code later today and think about MESA 2 versus 3 and how to handle this cleanly.

quaquel avatar Aug 20 '24 06:08 quaquel

Do you mind if I write some test cases and add them to this PR? Then we can see what edge cases are covered and for which we still have to come up with a solution.

sure!

GittyHarsha avatar Aug 20 '24 09:08 GittyHarsha

Done! I added 3 test cases, as you can see, currently one of them fails.

What helps is that most people just pass super().__init__(unique_id, model), without the keywords. That all our example models pass is also a good sign.

The test that currently fails is the one that uses all keyword arguments:

super().__init__(unique_id=unique_id, model=model)

Maybe unique_id could be another optional parameter that's None by default, but get's checked if it's passed.

Curious if you can get all three test cases simultaneously working!

EwoutH avatar Aug 26 '24 16:08 EwoutH

@EwoutH here are the code changes I am making:

    def __init__(
        self, unique_id_or_model: int | Model = None, model: Model | None = None, unique_id: int | None = None
    ) -> None:
        """
        Create a new agent.

        Args:
            unique_id_or_model (int | Model, optional): A unique identifier for this agent or the model instance in which the agent exists.
            Defaults to None, meaning `model` and `unique_id` parameters are given as keyword arguments.
            model (Model, optional): The model instance in which the agent exists given the argument `unique_id_or_model` is a unique_id for this agent.
            Defaults to None, meaning only the model instance is given as an argument to `unique_id_or_model`.
            unique_id (int, optional): A unique identifier for this agent. Defaults to None, meaning it is not given a keyword parameter.
        """
        if unique_id is not None and model is not None:
            self.unique_id = unique_id
            self.model = model
        else:
            if isinstance(unique_id_or_model, int):
                if model is None:
                    raise RuntimeError('The second argument must be a model instance')
                else:
                    self.unique_id = unique_id_or_model 
                    self.model = model 
            elif unique_id is not None:
                self.unique_id = unique_id
                self.model = unique_id_or_model
            else:
                self.unique_id = unique_id_or_model.next_id()
                self.model = unique_id_or_model
                

        self.pos: Position | None = None

All tests are passing expect the tests in tests\test_time.py. because here the parameter unique_id_or_model is being passed a string

self = <tests.test_time.MockAgent object at 0x000001F31840B4D0>, unique_id_or_model = 'A', model = <tests.test_time.MockModel object at 0x000001F31840B440>, unique_id = None

My code changes checks only for instance int, thus it then tries to execute self.unique_id = unique_id_or_model.next_id() and thus gives error: AttributeError: 'str' object has no attribute 'next_id'

So, can strings be passed as arguments for a unique_id? So, making changes : if isinstance(unique_id_or_model, (int, str)): should resolve it? or are there any other data types that can be passed to be assigned as unique_id which will be implicitly converted to int?

To solve this I also tried to do if not isinstance(unique_id_or_model, Model) but this doesn't work as Model cannot be imported during run time and if imported causes cyclic dependency

I can refactor the code to not include any type checking for unique_id_or_model but this will miss many cases like when both the arguments are of type Model.

GittyHarsha avatar Aug 26 '24 20:08 GittyHarsha

Also Instead of adding another optional paramter unique_id, just changing the name unique_id_or_model to unique_id passes all the tests, but is semantically incorrect because only a single argument of type Model will be referenced by unique_id paramter,

GittyHarsha avatar Aug 26 '24 20:08 GittyHarsha

I don't see a simple solution for supporting both current style and new style Agents. Currently, unique_id and model are arguments, not keyword arguments. It might be possible to switch to Agent.__init__(first_arg, *args) and just check whether there is anything in *args. Personally, I am more inclined to just have a hard break with MESA 3. It is easy to document and easy to fix in existing models. It would also make the code here much simpler.

For my point 2 above, see the SimulationEvent code on how to handle the counting nicely. The code there is the recommended way of counting instances of a class on StackOverflow. This can be implemented in a backward compatibly way and then just let Model.next_id raise a DeprecationWarning.

quaquel avatar Aug 26 '24 20:08 quaquel

I thought it would break a lot more models, but since the name of the variable in your subclass are defined by yourself and the variables names only matter when calling super(), the impact is way smaller.

The fact that all regular example models passed makes it a lot less impactful to make a breaking change here.

EwoutH avatar Aug 26 '24 20:08 EwoutH

    _ids = itertools.count()

    def __init__(
       self, unique_id_or_model: int | Model | None = None, fallback: Model | None = None, **kwargs
    ) -> None:

        if "unique_id" in kwargs:
            if "model" in kwargs:
                self.unique_id = kwargs["unique_id"]
                self.model = kwargs["model"]
            else:
                self.unique_id =kwargs["unique_id"]
                self.model = unique_id_or_model
        elif "model" in kwargs:
            self.unique_id = unique_id_or_model
            self.model = kwargs["model"]
        elif fallback is None:
            self.unique_id = next(self._ids)
            self.model = unique_id_or_model
        else:
            self.unique_id = unique_id_or_model 
            self.model = fallback

this passes all the tests.

GittyHarsha avatar Aug 27 '24 14:08 GittyHarsha

Do you mind if I make some modifications on your branch? You made a great start but I would like to tidy it up myself.

EwoutH avatar Aug 28 '24 17:08 EwoutH

Sure! No problem. I learnt a lot trying to solve this issue, especially as someone new to open source

GittyHarsha avatar Aug 28 '24 18:08 GittyHarsha

@projectmesa/maintainers Ready for review!

EwoutH avatar Aug 29 '24 18:08 EwoutH

Performance benchmarks:

Model Size Init time [95% CI] Run time [95% CI]
BoltzmannWealth small πŸ”΄ +81.4% [+79.2%, +83.7%] πŸ”΅ -0.7% [-0.8%, -0.5%]
BoltzmannWealth large πŸ”΄ +74.7% [+33.5%, +125.5%] πŸ”΅ -0.2% [-2.7%, +2.4%]
Schelling small πŸ”΄ +44.1% [+43.7%, +44.6%] πŸ”΅ -0.6% [-0.9%, -0.4%]
Schelling large πŸ”΄ +52.6% [+51.8%, +53.4%] πŸ”΅ +0.8% [-0.1%, +1.9%]
WolfSheep small πŸ”΄ +77.6% [+75.7%, +79.4%] πŸ”΄ +15.2% [+14.8%, +15.6%]
WolfSheep large πŸ”΄ +76.0% [+74.9%, +77.0%] πŸ”΄ +23.7% [+22.7%, +24.5%]
BoidFlockers small πŸ”΄ +38.5% [+37.7%, +39.3%] πŸ”΅ -0.7% [-1.3%, -0.1%]
BoidFlockers large πŸ”΄ +40.5% [+39.2%, +41.6%] πŸ”΅ -0.9% [-1.5%, -0.4%]

github-actions[bot] avatar Aug 29 '24 19:08 github-actions[bot]

The benchmarks are slower because they still use the old structure which creates a huge amount of warnings.

EwoutH avatar Aug 29 '24 19:08 EwoutH

This might actually be the largest breaking change we make in Mesa 3.0. I hope it's worth it.

There was some beauty in explicitly knowing that you assigned each agent and unique ID. It was one of the first things you had to teach. It might get used a lot less when that isn't automatic anymore.


I still dislike having the example models in two places. It's maintenance intensive.

EwoutH avatar Aug 29 '24 19:08 EwoutH

Performance benchmarks:

Model Size Init time [95% CI] Run time [95% CI]
BoltzmannWealth small πŸ”΄ +32.7% [+30.4%, +34.8%] πŸ”΅ +1.8% [+1.6%, +1.9%]
BoltzmannWealth large πŸ”΅ +25.3% [-9.8%, +66.6%] πŸ”΅ +0.8% [-1.5%, +3.4%]
Schelling small πŸ”΄ +9.9% [+9.6%, +10.3%] πŸ”΅ +1.0% [+0.7%, +1.2%]
Schelling large πŸ”΄ +11.0% [+10.2%, +11.8%] πŸ”΅ +2.2% [-0.3%, +4.7%]
WolfSheep small πŸ”΄ +15.6% [+14.5%, +16.8%] πŸ”΄ +5.6% [+5.3%, +5.9%]
WolfSheep large πŸ”΄ +15.1% [+14.1%, +16.5%] πŸ”΄ +8.7% [+5.6%, +12.4%]
BoidFlockers small πŸ”΄ +15.4% [+14.9%, +15.9%] πŸ”΅ -0.0% [-0.7%, +0.7%]
BoidFlockers large πŸ”΄ +16.1% [+15.3%, +17.1%] πŸ”΅ +1.4% [+0.7%, +2.1%]

github-actions[bot] avatar Aug 29 '24 19:08 github-actions[bot]

A few comments from my side (open for discussions):

  1. In the past I used uuid module to create unique agent ids, such as here. In this way we don't need a counter in Model, so we can get rid of model.next_id() as well as model.current_id.

  2. What if users want to explicitly set agent ids by themselves? For instance, I may set agent ids to be their unique names. It might be better to keep the unique_id parameter of the Agent class optional.

    • If unique_id is provided, agent.unique_id is set accordingly. This is current behavior.
    • Otherwise (i.e., unique_id = None), as a default behavior, Mesa creates a unique agent id automatically.
  3. It may be useful to check for uniqueness of agent ids. For example, we could have a model._agent_id_set attribute that is a set. While creating agents, we check whether new agent id is already taken and raise exception.

wang-boyu avatar Aug 29 '24 21:08 wang-boyu

A few comments from my side (open for discussions):

  1. In the past I used uuid module to create unique agent ids, such as here. In this way we don't need a counter in Model, so we can get rid of model.next_id() as well as model.current_id.

Regardless of how it is done, it should move the Agent class.

I am not familiar with uuid but is that not overkill for the function it has in MESA? The basic function of unique_id is to serve as a unique identifier for indexing agents when collecting agent level data. Why would we need a 128-bit identifier for this? A possibly convenient side-effect of counting is that you know how many agents have been created over the runtime of the model.

  1. What if users want to explicitly set agent ids by themselves? For instance, I may set agent ids to be their unique names. It might be better to keep the unique_id parameter of the Agent class optional.
    • If unique_id is provided, agent.unique_id is set accordingly. This is current behavior.
    • Otherwise (i.e., unique_id = None), as a default behavior, Mesa creates a unique agent id automatically.

unique_id is necessary for the functioning of MESA (e.g., data collection). So, analogous to step, we should do it automatically. If the user wants her own id, they can do so with their own attribute.

  1. It may be useful to check for uniqueness of agent ids. For example, we could have a model._agent_id_set attribute that is a set. While creating agents, we check whether new agent id is already taken and raise exception.

If you make it part of MESA, uniqueness is guaranteed by construction. Messing with internal attributes of classes is at your own peril. So I don't see a need to be so defensive in how we code it up. If we want to be defensive, you could move unique_id into _unique_id and have a property for unique_id and a setter that raises some appropriate exception.

quaquel avatar Aug 30 '24 05:08 quaquel

First of all lets not again discuss this PR to its death. Great work by @GittyHarsha and @EwoutH. I think making it automatic by default is very good and the main motivation for this PR. Everything else can be discussed separately.

  1. In the past I used uuid module to create unique agent ids, such as here. In this way we don't need a counter in Model, so we can get rid of model.next_id() as well as model.current_id.

I did the same, but I think its overkill. UUID is often used to make the ids universally unique and to prevent guessing the next id. We don't need either of those. As a side effect a counter even conveys additional information, for example the order in which agents were created. Its also much easier for humans to track a number than a uuid.

  1. What if users want to explicitly set agent ids by themselves? For instance, I may set agent ids to be their unique names. It might be better to keep the unique_id parameter of the Agent class optional.

I think its unnecessary. For us maintainers we now know for certain that the unique_id is truly unique. Before this PR I was always reluctant to fully rely on this. For users I don't see the benefit of overwriting unique_id. E.g. If you already have the name just use it as name, not id. And more importantly if users want to use their own id, they can still simply overwrite the unique_id attribute.

  1. It may be useful to check for uniqueness of agent ids. For example, we could have a model._agent_id_set attribute that is a set. While creating agents, we check whether new agent id is already taken and raise exception.

Again, we don't need this if unique_id is not, but automatically assigned.

Regardless of how it is done, it should move the Agent class.

Partly agree. At first I thought we should move it to the mesa or mesa.utils namespace and make it generally available. But its nice that it only counts agents. But still I don't think it should be part of the agent class, but the agent namespace. And we can just expose an instance of itertools.count

Corvince avatar Aug 30 '24 07:08 Corvince

Partly agree. At first I thought we should move it to the mesa or mesa.utils namespace and make it generally available. But its nice that it only counts agents. But still I don't think it should be part of the agent class, but the agent namespace. And we can just expose an instance of itertools.count

I would be fine with that as well. In SimulationEvent, I used a class based approach based on the most upvoted StackOverflow answer, but I am fine either way. The main point for me is that it is not part of the behavior of Model so it should be removed from there (here or in a next PR).

quaquel avatar Aug 30 '24 07:08 quaquel

I just want to add: I try to keep my PRs small in scope and if possible even atomic.

I know there's benefit into doing things right once, but sometimes that makes PRs unmanageable big, either code wise or, more often, conceptually (which is why we don't have a new datacollector yet).

This PR modifies one thing: Automatically increasing the unique_id. While I agree that should be ideally done in the Agent, it would make this PR way too big. In two ways:

  • It introduces a large conceptual change that unique_ids are not per Model, but per Agent superclass.
  • It breaks a lot of tests that rely on the Agent count starting at 0 when a new model is initialized. So if you define multiple models (like you do in test classes) is messes a lot of stuff up.

image

So again, I'm not fundamentally against any of these changes. It just are more changes that are better fitted in a new PR.

EwoutH avatar Aug 30 '24 08:08 EwoutH

Thanks for your responses @quaquel @Corvince! Some further comments -

Regarding implementation details of unique_id

I don't really care if it's using uuid or a counter. I used uuid because it's less work for me. In this PR, if we're using a counter, is it possible to remove model.next_id() and model.current_id, or make them private? There appear to be implementation details of unique_id and shouldn't be exposed to users. In addition, they may easily cause breaking changes in the future if we'd like to change anything.

A possibly convenient side-effect of counting is that you know how many agents have been created over the runtime of the model.

I wouldn't count on these because agents can be removed/destroyed anytime. We then need to differentiate how many agents were created and how many agents are still alive.

As a side effect a counter even conveys additional information, for example the order in which agents were created.

unique_id are supposed to be unique identifiers, nothing more, nothing less. I don't really like the ideas of having side effects that rely on implementation details. Sorry : )

Regardless of how it is done, it should move the Agent class.

I agree. But if it's using a counter, then the counter has to be stored somewhere outside individual agents. Keeping it inside a model makes sense. However, this is also why I think using uuid is better - it eliminates the need of maintaining a counter by ourselves.

Regarding user-settable unique_id

The central question is: why are users not allowed to use their own unique ids?

My suggestions are:

  • By default, we (Mesa) create unique ids so users don't have to worry about them.
  • Advanced users can use their own unique ids. Mesa validates uniqueness to make sure they don't screw up.

As an example, Mesa-Geo sets unique_id of agents when batch creating geo-agents from geodataframe: https://github.com/projectmesa/mesa-geo/blob/9c3196c932be94fdc6ead4a599740445c56d63bd/mesa_geo/geoagent.py#L198-L200

wang-boyu avatar Aug 30 '24 11:08 wang-boyu

I ran a quick performance test on just 1000 agents, UUID vs. itertools.count

Screenshot 2024-08-30 at 13 44 55

I just don't see the need for UUID in this context, and it comes with a real performance difference.

In this PR, if we're using a counter, is it possible to remove model.next_id() and model.current_id, or make them private? There appear to be implementation details of unique_id and shouldn't be exposed to users.

I agree but there is @EwoutH point on how this causes all tests to fail. I want to take a closer look myself at this later today. I think it is possible to move everything into Agent, while raising deprecation warnings from the old structure.

I agree. But if it's using a counter, then the counter has to be stored somewhere outside individual agents.

In my screenshot, _ids is a class attribute, so it is not stored inside the individual agents. Given that it is part of the behavior of the Agent class, it makes sense it is either done at the class level (my preference) or at least part of the Agent namespace (as suggested by @Corvince).

The central question is: why are users not allowed to use their own unique ids?

This is perfectly analogous to the discussion on automatically counting steps. Users can still do this, but just not by using the unique_id attribute. MESA relies on unique_id for agent data collection, so it makes sense that MESA internally handles this correctly. As a minor point, it was deeply confusing to many of my students why they had to supply the unique_id themselves.

quaquel avatar Aug 30 '24 11:08 quaquel

@EwoutH, you might want to check #2260. This is how I would go about solving the issue. I don't have any existing tests failing, while unique_id counting is done inside the Agent class. #2260 is, of course, not complete. For example, tests are still needed, and I would add a deprecation warning inside Model. Am I missing something important? I fail to understand what is so complicated about any of this.

quaquel avatar Aug 30 '24 12:08 quaquel

@Corvince you’re spot on. That’s what I discovered when I tried to move it. That’s where all the tests were failing. That’s why I didn’t move it.

And I explained this I think 5 times now.

Sorry guys, but I’m getting a bit done with the β€œwhataboutism” this way we never get anything done.

We’re all experienced developers. We have to trust each other and have to give each other a bit of room. If you really care that much open a follow-up PR yourself.

EwoutH avatar Aug 30 '24 13:08 EwoutH

Just for the record, my comments are not blocking. I remembered we discussed this before (correct me if I'm wrong), if at least two maintainers approve a PR then it's considered mergable.

wang-boyu avatar Aug 30 '24 14:08 wang-boyu

I’m resigning as owner of this specific effort. Feel free to work further on this branch.

EwoutH avatar Aug 30 '24 14:08 EwoutH

unique_id are supposed to be unique identifiers, nothing more, nothing less. I don't really like the ideas of having side effects that rely on implementation details. Sorry : )

Isn't this contradictory to the rest of your argument? ;)

If this is the case, and I basically agree with this, then why make so much fuss about being able to change them? If it's just for identification, one can be happy that it happens automatically, no need to assign custom names or anything else to them.

What I meant with positive side effects that you can eyeball some information from them. Nothing to rely on, but a good estimate about the age of the agent and if you are tracking the same agent in multiple places

Corvince avatar Aug 30 '24 15:08 Corvince

closing as solved via #2260

quaquel avatar Sep 04 '24 13:09 quaquel