framework icon indicating copy to clipboard operation
framework copied to clipboard

[9.x] Add opt-in support for an identity map to Eloquent

Open ollieread opened this issue 1 year ago • 22 comments

This PR adds identity map support to Eloquent as an opt-in feature controlled by Model::$identifiable.

Summary

Models that are marked as identifiable will have an identity, provided by Model::getModelIdentifier(), which by default will use the following format:

connectionName:className:key

When an instance of the model is created, it is stored in the IdentityManager. Every attempt thereafter to retrieve the same model from the database will give the same object.

This means that in the following example, all 3 variables contain the same instance.

$user1 = auth()->user();
$user2 = User::find(auth()->id());
$user3 = User::find(auth()->id());

Usage

Finding

Any calls to find(), findOrFail() or findOrNew() on a model that is identifiable, will skip the query if a model has already been created using the provided id.

Calls to findMany() will skip any ids that have already been used, only querying ones that are not present in the cache.

The query is skipped if there are no where clauses, joins or having statements.

If you wish to force the query, you can call refreshIdentityMap() on the query builder instance. If you wish to skip the query on a builder instance where refreshIdentityMap() has been called, you can call useIdentityMap().

Hydrating

When the query builder attempts to create a new instance of an identifiable, with a key that matches an already existing instance of the model, the existing instance will be used.

If the model is using timestamps, and the returned attributes are newer, the attributes on the existing instance will be updated, but will retain any changes you'd previously made.

BelongsTo

If a belongs to relationship is loaded (not belongs to many) without constraints and without refreshIdentityMap() being called, the query will skip any model instances that already exist.

Flushing

If you wish to flush the cached models, call flushModels() on an instance of IdentityManager, or on the Identity facade.

How

The IdentityManager stores an array containing all existing model instances keyed by their identity.

Why

It's very easy to end up with multiple versions of the same model, meaning that updates on one aren't persisted to others.

This aims to reduce the number of models created, help limit unnecessary queries, and allow for consistent model interaction. It doesn't matter where in your code you're dealing with user 1, any changes made during a request will persist across all instances.

ollieread avatar Aug 17 '22 21:08 ollieread

The idea is great but I'm a little afraid that this will cause a whole new range of problems. Take for example concurrent requests. During an app lifecycle models are retrieved again sometimes so their state is refreshed. If a model's state is changed during a different concurrent request that state won't be reflected anymore in the original request and can cause overrides to happen. I might be a bit overly paranoid here but this is something to keep in mind as it's a big change. During long running requests this can be a real problem. And then I don't even know how this will affect long running processes (like Octane).

driesvints avatar Aug 19 '22 06:08 driesvints

The idea is great but I'm a little afraid that this will cause a whole new range of problems. Take for example concurrent requests. During an app lifecycle models are retrieved again sometimes so their state is refreshed. If a model's state is changed during a different concurrent request that state won't be reflected anymore in the original request and can cause overrides to happen. I might be a bit overly paranoid here but this is something to keep in mind as it's a big change. During long running requests this can be a real problem. And then I don't even know how this will affect long running processes (like Octane).

I don't think this would introduce a new issue with concurrent requests, as that problem exists now. Models are still refreshable and they are still retrievable, through Model::refresh() and Builder::refreshIdentityMap().

That being said, there are definitely going to be some "obstacles" to overcome, which is why I made this an opt-in feature, but I think the pros far outweigh the possible cons. It will go a long way towards reducing queries, as well as the number of objects in memory.

Doctrine and many other ORMs use this pattern, so it's not completely out there as far as an approach goes.

I had originally built this as a package, but it meant overriding a lot of the core parts of Eloquent, which just increased the chance that it wouldn't be compatible with other packages/updates. I'm happy to look into any concerns people may have and try and disprove, or at the very least, provide a workaround/some sort of safety if necessary.

ollieread avatar Aug 19 '22 07:08 ollieread

If it's opt-in then it's probably fine 👍

driesvints avatar Aug 19 '22 07:08 driesvints

If it's opt-in then it's probably fine 👍

Yeah, all you have to do for this to work for a model is add a property with the value true.

protected $identifiable = true;

It's easy to do if you want to use it, but not easy enough to accidentally do.

I think it's mostly finished now, I just need to write tests.

ollieread avatar Aug 19 '22 07:08 ollieread

If the same instance of a model is returned then I'd be concerned about one area of code affecting state for another.

When you fetch models there's an implied contract that they'll have a "fresh" set of data, and unless you call $model->save () that any changes are not persisted.

If this is opt-in on a per-model basis but also support an as-needed opt-out like Model::findFresh($id); then that'd resolve my concern.

EDIT: just to be clear I'm talking about one area of code changing a model and affecting other code later in the request, not talking about concurrent requests like @driesvints

mattdinthehouse avatar Aug 19 '22 10:08 mattdinthehouse

If this is opt-in on a per-model basis but also support an as-needed opt-out like Model::findFresh($id); then that'd resolve my concern.

It's entirely opt-in with identifiable, and when you query you just use refreshIdentityMap().

If the same instance of a model is returned then I'd be concerned about one area of code affecting state for another.

When you fetch models there's an implied contract that they'll have a "fresh" set of data, and unless you call $model->save () that any changes are not persisted.

I'm not sure I see the issue here. If you change a model, the model is changed. Models are not for storing temporary state.

ollieread avatar Aug 19 '22 10:08 ollieread

It's entirely opt-in with identifiable, and when you query you just use refreshIdentityMap(). ... I'm not sure I see the issue here. If you change a model, the model is changed. Models are not for storing temporary state.

In a package context I'd have to call refreshIdentityMap() at the start of most of my functions cos I can't guarantee that bad calling code hasn't modified a model I'm about to save. The bug would be in the bad calling code but I'd have to respond to the issue tracker.

I'm not sure I see the purpose of this over a traditional context-aware object cache.

mattdinthehouse avatar Aug 19 '22 10:08 mattdinthehouse

In a package context I'd have to call refreshIdentityMap() at the start of most of my functions cos I can't guarantee that bad calling code hasn't modified a model I'm about to save. The bug would be in the bad calling code but I'd have to respond to the issue tracker.

Calling refreshIdentityMap on every query within your package is an extremely bad move and one that would be very ill-advised. In doing so, you'd invalidate the instance their code may be using, just on the off chance that your code saves a change they made.

I'm not sure I see the purpose of this over a traditional context-aware object cache.

Its purpose is to remove redundant database queries and memory usages for objects that represent the same data while steering towards the idea of "one row -> one object". It's not a new approach, not by a long-shot, but it's favoured approach for dealing with object representations of the database.

ollieread avatar Aug 19 '22 11:08 ollieread

Calling refreshIdentityMap on every query within your package is an extremely bad move and one that would be very ill-advised. In doing so, you'd invalidate the instance their code may be using, just on the off chance that your code saves a change they made.

That's kinda my point - this change breaks the implied contract that when I fetch a model I'll get it with fresh uncorrupted data, and I can do whatever I want without affecting code in another scope.

Its purpose is to remove redundant database queries and memory usages for objects that represent the same data while steering towards the idea of "one row -> one object". It's not a new approach, not by a long-shot, but it's favoured approach for dealing with object representations of the database.

I'd prefer to implement this in a context-aware way. Maybe this approach would be better off in a package? Otherwise it's just a narrow and context-unaware implementation of an object cache.

mattdinthehouse avatar Aug 19 '22 11:08 mattdinthehouse

I think I'm doing a bad job of explaining this right now, and I don't want to fill this PR with a lengthy conversation about the identity map architectural pattern and its pros and cons @mattdinthehouse, but I'm happy to discuss it with you in further detail through some other medium if you'd like?

ollieread avatar Aug 19 '22 13:08 ollieread

Re: Octane, wouldn't clearing the identity map cache before serving any request do the trick?

mateusjatenee avatar Aug 19 '22 15:08 mateusjatenee

Re: Octane, wouldn't clearing the identity map cache before serving any request do the trick?

Actually, the identity map would be amazing with octane. It would need some sort of mechanism built in for things to expire, but, imagine the queries that would be just plain skipped if multiple requests are being served the same object!

ollieread avatar Aug 19 '22 16:08 ollieread

I'm so glad to see there is an appetite for this because I've wanted this for sooo long in Laravel. It would reduce number of queries and memory requirements, models will use the same instance where possible.

@ollieread How well does this currently integrate with eager loading relations? As long as they use the basic find* methods then presumably it would use the identity map? So seems the morph relations won't be optimised to use it currently.

Some other things to consider:

  • soft deletes/default constraints - how well this integrates with models that use soft deletes - because pretty much all queries by default will add where deleted_at is null the identity map which means it can't be used because it adds a where? Or any other model that adds default query constraints - that would be unfortunate to not make use of the identity map.
  • extra bindings: select, orderBy, groupBy, from, order, union - these seem excluded from the current list to see if a model supports identity map - this could cause issues with model ordering, custom addSelect columns, grouping, etc.

As you mentioned, Doctrine uses the same pattern as does other ORMs use this same pattern, so it's great to see there is support for this feature.

garygreen avatar Aug 19 '22 19:08 garygreen

I'm so glad to see there is an appetite for this because I've wanted this for sooo long in Laravel. It would reduce number of queries and memory requirements, models will use the same instance where possible.

@ollieread How well does this currently integrate with eager loading relations? As long as they use the basic find* methods then presumably it would use the identity map? So seems the morph relations won't be optimised to use it currently.

Some other things to consider:

  • soft deletes/default constraints - how well this integrates with models that use soft deletes - because pretty much all queries by default will add where deleted_at is null the identity map which means it can't be used because it adds a where? Or any other model that adds default query constraints - that would be unfortunate to not make use of the identity map.
  • extra bindings: select, orderBy, groupBy, from, order, union - these seem excluded from the current list to see if a model supports identity map - this could cause issues with model ordering, custom addSelect columns, grouping, etc.

As you mentioned, Doctrine uses the same pattern as does other ORMs use this same pattern, so it's great to see there is support for this feature.

There's definitely room to hone this, but everything here is based on my original package, which was limited in that it was a package, but if we're going with first-party support, which is what I'm going for her, there's room to perfect it.

The problem with excluding some constraints and not others is that it will start getting really complex, having to inspect combinations and even columns/values to know whether or not to load it. I can definitely look into it and see if I can make it more optimal, but the important bit is that even if the query is ran, you'll only ever have one instance for a given row.

ollieread avatar Aug 19 '22 23:08 ollieread

Big change with small benefit... This is a form of caching and caching is very hard to get right. Sounds more like an app / domain problem than a tech / framework one

deleugpn avatar Aug 20 '22 08:08 deleugpn

This is a form of caching and caching is very hard to get right

It's only "technically" caching because it stores objects that it uses later on. The use of this, its intentions and its designs aren't close to caching.

Sounds more like an app / domain problem than a tech / framework one

It is absolutely an app/domain problem; hence it being opt-in, so the app can decide which "entities" should have unique identities (spoiler: It's pretty much all of them).

What this opt-in feature does is make it so there can only be one instance for a given row of data from the database to avoid the huge number of issues we have where multiple instances for the same data will overwrite each other, and queries are just stacked up to load things that already exist. Identity maps solve an architectural problem that has existed for almost as long as OOP has.

ollieread avatar Aug 20 '22 08:08 ollieread

Big change with small benefit

Also, since it's opt-in, it's literally no change.

ollieread avatar Aug 20 '22 09:08 ollieread

Also, since it's opt-in, it's literally no change.

It's literally 516 more lines of code that anybody interested in contributing and/or understanding Eloquent in the future wil have to go through

deleugpn avatar Aug 20 '22 09:08 deleugpn

IMO this is a good feature that needs to be introduced with moderation.

I know Rails added this at some point, and then removed it because of some problems that came up. As an opt-in, lightweight feature that only handles basic use cases, I think it's good -- handling more complex cases will probably take some trial and error IMO

mateusjatenee avatar Aug 20 '22 14:08 mateusjatenee

What's the status here? If right now no work is being done on this it might be best to resubmit at a later time.

driesvints avatar Sep 13 '22 08:09 driesvints

Sorry @driesvints, I need to write the tests for this, I was super busy sorting some stuff for the last few weeks but will look at it asap.

ollieread avatar Sep 13 '22 08:09 ollieread

What happens if we clone a model object?

fgaroby avatar Sep 21 '22 14:09 fgaroby

We should probably look at the issue(s) Rails had with this, when considering adding it to Laravel.

If the issues are strictly "dev misunderstood how it works", maybe it can be solved with documentation?

If the issues are bugs etc., maybe they can be fixed? or maybe not, since they removed it.

Anyway, the feature sounds great and I can see the use for it (mainly for Users, Teams, Plans, Settings, ... as far as I can see)

LasseRafn avatar Sep 29 '22 05:09 LasseRafn

I'm going to go ahead and close this for now. Feel free to resend this once you have time, thanks.

driesvints avatar Sep 29 '22 10:09 driesvints