Tighten the spec on EntityManager.getReference(…) in error scenarios

Open odrotbohm opened this issue 6 years ago • 10 comments

The method currently leaves it to the implementation whether to immediately throw an EntityNotFoundException or delay that until the first access on the returned entity (proxy). This means that client code cannot be written in a portable way.
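
For illustration, a minimal sketch of the two behaviors currently permitted, assuming an open EntityManager em and a hypothetical Order entity:

```java
// Whether the exception surfaces at the first or the second line is
// implementation-defined today, so this code is not portable:
Order order = em.getReference(Order.class, unknownId); // some providers throw EntityNotFoundException right here
order.getTotal();                                      // others throw only on first access to the proxy's state
```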

What plays into this is that the two documented @throws clauses have slightly different semantics. As far as I can see, the IAE is mandatory to be thrown, while the ENFE is just documented as thrown "if the entity state cannot be accessed" (what does that even mean?). This creates the impression that the second throws clause is mandatory as well, which it in fact isn't.

I suggest removing the ability of implementations to delay the exception until first access. I know that allowing implementors to do that lets them completely avoid any kind of database interaction. But this comes at the cost of client code unexpectedly facing the exception in code that's completely unprepared to see it, which is basically a broken abstraction. Removing the ability would require implementors to at least trigger an exists-by-id lookup, but that can still be significantly cheaper than reading an entire object. That would leave developers with a significant enough difference to find(…) while removing the weak guarantees you currently get from using the method.

For reference, this came up in a ticket in Spring Data JPA.

odrotbohm avatar Jan 29 '19 12:01 odrotbohm

IMO this is a very bad idea. This will cause lots of queries to be issued that nobody expected before. Existing applications will suffer a lot if this is done. There must be a way to get a managed proxy reference; after all, you can get proxies when fetching entities as well, which might also fail at a later point if the actual object doesn't exist. Do we disallow lazy loading because that could happen? I hope not, but that's always going to be a problem when laziness is allowed.
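
The same failure mode already exists for any lazy association today; a sketch, assuming hypothetical Order/Customer entities and an open EntityManager em:

```java
Order order = em.find(Order.class, orderId);
Customer customer = order.getCustomer(); // LAZY association: returns an uninitialized proxy, no query yet
customer.getName();                      // may fail here if the row was deleted in the meantime
```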

beikov avatar Jan 29 '19 12:01 beikov

As it stands today, you cannot call the method and write correct code using it without knowing how the underlying implementation handles that case. I.e., it's the same as falling back to provider-specific API, except it's worse, as it looks portable and is not. That's not a problem you can discuss away, and it leads to the method being quite useless if you want portable code. The whole point of a spec is to enable that.

I am fine with the method staying as is (read: mostly unused) and an additional one being introduced with more consistent guarantees about when which particular exception is to be expected: one that allows retrieving a proxy for an object that is in place but fails if it doesn't exist in the first place.

Correct me if I'm wrong, but with standard isolation levels, foreign key constraints, and a read-only transaction in place, how should a related object all of a sudden disappear or not be lazily resolvable? Except for database connection losses, of course. Those strong consistency guarantees are usually an aspect I get sold by relational database proponents over their NoSQL counterparts.

odrotbohm avatar Jan 29 '19 12:01 odrotbohm

If this method were to always actually query the database, why have it? Why not just use find() then? There will always be situations where it could happen that an object is not available anymore during lazy loading. When using the read committed isolation level, it could happen that the object is deleted during a running transaction. Or maybe you don't have a FK constraint and have invalid data, yet you still want to be able to process it.

I agree that the behavior of the method should be clarified, but IMO there must still be the possibility to get a reference to an entity without actually loading it from the DB.

beikov avatar Jan 29 '19 13:01 beikov

There will always be situations where it could happen that an object is not available anymore during lazy loading.

True. But what percentage of the applications are prepared to actually handle this failure in a graceful way? All the applications I have ever seen just barf at any LazyLoadingException.

What is the use of delaying that exception?

schauder avatar Jan 29 '19 13:01 schauder

The applications I developed and saw so far made use of getReference when appropriate. It's a perfect match especially when you have FK constraints, actually: when you set such a reference on an association and the row doesn't exist, you get a constraint violation exception at commit time.
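
A sketch of that use case, assuming hypothetical Order/Customer entities and an open EntityManager em:

```java
Customer customer = em.getReference(Customer.class, customerId); // no SELECT issued
Order order = new Order();
order.setCustomer(customer); // only the foreign key value is needed at flush time
em.persist(order);           // a nonexistent customerId surfaces as a constraint violation at commit
```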

What is the use of delaying that exception?

Performance obviously. Imagine you set a lot of associations through that mechanism. There is no need to always load the objects just because you want to refer to them.

IMO the issue with lazy loading that people are usually having is a different one that can be solved elegantly with DTOs, but this is probably not the right place to discuss this.

beikov avatar Jan 29 '19 13:01 beikov

If this method were to always actually query the database, why have it? Why not just use find() then?

Because to find out whether the instance exists, a count(id) might be way faster, as it can hit an index and you don't (immediately) need to read all the data, create objects from it, etc. Counter-question: if it's supposed to never hit the database immediately, why then define that the exception can/should be thrown immediately?
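
A sketch of such an exists-by-id check, assuming an open EntityManager em and a hypothetical Order entity:

```java
// Cheap existence lookup that can typically be satisfied from the primary key index:
Long count = em.createQuery(
        "select count(o) from Order o where o.id = :id", Long.class)
    .setParameter("id", id)
    .getSingleResult();
boolean exists = count > 0;
```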

On a more general note, I think that kind of API leads to pretty bad code in the first place, as it basically creates the impression that you could look up stuff but then later on, a few layers up, could/should catch EntityNotFoundException, which violates quite a few design principles from the very start. If you want to delay the lookup, make that explicit in the type system by using CompletableFuture, some sort of Lazy, or the like. Persistence-technology-specific exceptions outside a data access layer are a broken abstraction.
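
A sketch of what making the deferral explicit could look like, using a plain Supplier as a stand-in for such a Lazy type (Order and em assumed as above):

```java
import java.util.function.Supplier;

// The signature now tells callers that resolution is deferred and may fail later:
Supplier<Order> lazyOrder = () -> em.find(Order.class, orderId);
// ...a few layers up, the deferral is visible at the call site:
Order order = lazyOrder.get();
```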

There will always be situations where it could happen that an object is not available anymore during lazy loading. When using the read committed isolation level, it could happen that the object is deleted during a running transaction.

If that happens within the same transaction / persistence context, then the JPA provider could mitigate that and not result in an exception, couldn't it? Outside that transaction, you wouldn't even see the delete.

I agree that the behavior of the method should be clarified, but IMO there must still be the possibility to get a reference to an entity without actually loading it from the DB.

That's still in place after what I suggested. In general, my concern is the ambiguity in the API, as it significantly reduces the value of the method in the first place. I'm totally open on how to get this done.

odrotbohm avatar Jan 29 '19 13:01 odrotbohm

The applications I developed and saw so far made use of getReference when appropriate.

Because you know which way the underlying implementation behaves. That's not a luxury we have with Spring Data.

Performance obviously. Imagine you set a lot of associations through that mechanism. There is no need to always load the objects just because you want to refer to them.

You keep repeating that, but it was never suggested that the call should resolve the entity. I suggested it could use the most minimal means possible to detect immediately that the reference might be invalid. Wouldn't what you say imply that you could have obtained an invalid reference (i.e. one pointing to a non-existing object), executed a lot of business code, just to eventually find out that all of this was invalid as the reference fails to persist? Wouldn't it be better to know that already when you initially create the reference (the fail-fast principle, effectively)?

odrotbohm avatar Jan 29 '19 13:01 odrotbohm

Because to find out whether the instance exists, a count(id) might be way faster, as it can hit an index and you don't (immediately) need to read all the data, create objects from it, etc. Counter-question: if it's supposed to never hit the database immediately, why then define that the exception can/should be thrown immediately?

I agree that there are more efficient methods to check whether some object exists, but the spec says:

The EntityNotFoundException is thrown by the persistence provider when an entity reference obtained by getReference is accessed but the entity does not exist.

and the JavaDocs for getReference say

The persistence provider runtime is permitted to throw the EntityNotFoundException when getReference is called.

so it was done this way deliberately.

After all, it's up to the developer to decide what is needed. If you define Spring Data's getOne method to delegate to EntityManager.getReference, developers have to figure out for themselves whether that is appropriate or not. I get your point that you want it to behave the same way across implementations; I want that too, I just don't think changing getReference is the way to go. I'd rather see a new method and deprecate getReference. How about <T> T getReference(Class<T> clz, Object id, LoadStyle s), where LoadStyle is an enum that defines behaviors for (see the sketch after this list):

  • Not loading at all
  • Just checking for existence
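
A sketch of how that overload could look; note that neither LoadStyle nor this method exist in JPA, the names are illustrative only:

```java
// Hypothetical extension of EntityManager as proposed above.
public interface ExtendedEntityManager {

    enum LoadStyle {
        NONE,         // return a proxy without any database access
        CHECK_EXISTS  // run a cheap exists-by-id lookup and fail fast if absent
    }

    <T> T getReference(Class<T> entityClass, Object primaryKey, LoadStyle style);
}
```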

On a more general note, I think that kind of API leads to pretty bad code in the first place, as it basically creates the impression that you could look up stuff but then later on, a few layers up, could/should catch EntityNotFoundException, which violates quite a few design principles from the very start. If you want to delay the lookup, make that explicit in the type system by using CompletableFuture, some sort of Lazy, or the like. Persistence-technology-specific exceptions outside a data access layer are a broken abstraction.

I do agree with this as well and dislike how this is done in a few applications. IMO it's just wrong to expose access to getters when data is not available. DTOs are a way to solve that, but the approach is rarely taken, which leads us to all the lazy loading problems people face.

If that happens within the same transaction / persistence context, then the JPA provider could mitigate that and not result in an exception, couldn't it?

Sure, if the delete happens in the same TX right before calling getReference, I would assume that JPA providers could throw an exception immediately, but then there could be triggers that do some stuff, so I wouldn't assume anything here.

Outside that transaction, you wouldn't even see the delete.

When TX1 starts before TX2, and TX1 reads something after TX2 committed a delete for it, TX1 might not see it anymore when using the read committed isolation level.

Because you know which way the underlying implementation behaves. That's not a luxury we have with Spring Data.

As far as I know, only EclipseLink actually queries the entity. You could abstract over that if you wanted to. Don't get me wrong, I understand that you want methods with clear semantics; I'm just noting that there are ways you can make this work.

You keep repeating that, but it was never suggested that the call should resolve the entity. I suggested it could use the most minimal means possible to detect immediately that the reference might be invalid. Wouldn't what you say imply that you could have obtained an invalid reference (i.e. one pointing to a non-existing object)

I agree, it would be convenient if there were an easy way to simply check for the existence of an entity, and yes, what I am suggesting is that getReference should allow possibly getting access to an invalid/non-existent reference.

executed a lot of business code, just to eventually find out that all of this was invalid as the reference fails to persist? Wouldn't it be better to know that already when you initially create the reference (the fail-fast principle, effectively)?

I actually think that this is an advantage. It's just like optimistic locking might fail at a later point, but the benefit you get is that you need to do less work if the successful case is the more common scenario. When your use case is heavily contended, a different approach might be more appropriate, but that's up to the developer.

beikov avatar Jan 29 '19 14:01 beikov

I'm with @beikov in this case. By removing the ability to have getReference act lazily, we could also just use em.find. Not having to go to the DB for every em.getReference is a big benefit in many cases.

Otoh, the case where em.getReference blows up with an ENFE is pathological anyway in my eyes. This is NOT a standard use case imo. em.find/em.getReference uses the primary key. If you know a row in the db might not exist, then you simply must not use em.getReference but em.find. As easy as that.

struberg avatar Jan 29 '19 14:01 struberg

getReference was designed to get an object "shell" with no access to the database. The spec lets implementors skip this very useful optimisation, which is why you get some implementations throwing exceptions right away.

If we clarify anything, let's not change the semantics of lazy evaluation. This is a must-have to be able to attach objects to others without having to load them. If we change that, then there is no point in having getReference() at all.

emmanuelbernard avatar Feb 01 '19 12:02 emmanuelbernard

I suggest removing the ability of implementations to delay the exception until first access.

I agree with others that this is a very bad idea.

And just as others have already pointed out, the whole purpose of having getReference() is to allow the entity to be fetched lazily! If the user doesn't want to allow that, the user can just call find().

We should close this issue.

gavinking avatar Aug 08 '23 18:08 gavinking