exists() method for repository
Description
Nowadays we have the findByIdOptional and count to check if a record exists inside the database. count is much faster than selecting the whole object data and serializing it. But wouldn't it be nice to have an exists(id) method to abstract a db query that is so common?
cc @FroMage @emmanuelbernard @loicmathieu
Is it really that common to check for existence of a record without loading it? I've never had to do that myself.
@FroMage but isn't it too much to load all fields from db to check if it exists?
Well, sure, but only if you need to check for existence without later using the entity. I've never seen such a use case. All my use-cases check if the object exist and then proceed to use the entity.
@FroMage yeah, it makes sense. In my case, I check the record exists. If it does, I throw a custom exception.
@FroMage I also sometimes just check for existence in my apps, and then throw an NotFoundException.
If you want, I can try to implement this feature.
and then throw an NotFoundException
But if it exists, what does it do?
I use "exists" to prove if an id really refers to an existing aggregate when i only hold the id in another aggregate. It is a business rule in our system so i have to check it but i do not need the hole entity, only the already given id.
note that count is not too far from what you have in exists.
if ( !Person.exists(1L) ) { ... } // vs if ( Person.count("id", 1L) != 1 ) { ... }
On Sun, Mar 29, 2020 at 9:36 PM Alexander Dammeier [email protected] wrote:
I use "exists" to prove if an id really refers to an existing aggregate when i only hold the id in another aggregate. It is a business rule in our system so i have to check it but i do not need the hole entity, only the already given id.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/quarkusio/quarkus/issues/7884#issuecomment-605688058, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACJNWA7GUI7XQLGT4FREGDRJ6PLLANCNFSM4LMFG6LA .
But is exists faster? Are all databases intelligent enough not to go through all rows to count duplicates in a column with an unique constraint?
edit: but exists can also be used for columns without unque constraint so it should be faster than count in this situations.
What do you want for the feature? The existence of the entity? Or the existence of some column value?
I thought the former based on the issue description. In this case, I don't think my count approach is any different than whatever a custom exist implementation would do performance wise. If anything it could be made more memory efficient than a select id from entity where id = ?. The count is done after the filtering.
The one annoyance with my solution is that the user must know that the id property is named "id"
:+1: to implement this, I think that of course you can do a count but this exists methods is more about clean code. At the end you can always do in Java list.size() == 0 but we do list.isEmpty() this is more or less the same.
count is not the same as exists.
Postgres example:
SELECT EXISTS(SELECT 1 FROM contact WHERE name='John')vsSELECT COUNT(*) FROM contact WHERE name = 'John'
Now imagine that name has no index on it (on purpose), and request results in a full-table-scan. Count will have to actually scan the whole table, while EXISTS can stop as soon as it sees first "John" guy.
Sure, but the question remains: what do you do once you know the entity exists?
Sure, but the question remains: what do you do once you know the entity exists?
The question is not what "to do", the question is what "not to do".
For example:
- Do not create a duplicate.
- Do not bother with custom native query and "ON CONFLICT" clause in case a duplicate already exists
- Do not bother parsing native database exception code, and checking failing constraint name, to catch duplicate error and respond to client with a reasonable error message instead of an "unknown error something is wrong with my database idk"
Of course, race condition is still possible unless you raise transaction isolation level, but then it would be rare enough so I just can refer to low salary, unreasonable deadlines, and bad management in general to justify why my code has sporadic issues.
My argument was that in most cases, once we know the entity exists, we typically load it.
For example, if you want to avoid creating a duplicate, you check that it exists, but then… what do you do with the object you wanted to create? Don't you turn the insert into an update? For that you'll need the entity.
I'm only trying to get a better idea of the usefulness of this. I can believe it's more efficient than count, for sure. That's one argument in its favour already.
I didn't understand yet how useful it can be to check for something's existence and if it exists, not need it. If it doesn't exist, there's no problem, of course.
In our case, we often check for the existence of an entry because the existence alone tells us if a step in our process was executed successful. Otherwise we schedule a retry, so for this case we use it more like an event store and the entries are often immutable.
I think it is very dangerous to use an exist function to avoid duplicates as another process could create such an entry after you just checked it. In my opinion, unique constraints (ID conflicts) are the only viable option to truly avoid this and therefore we always check if the transaction failed because of a conflict. We then do not load the entry because knowing that it exists is enough to continue or there is simply nothing left to do.
Our company has a microservice architecture which also uses Kafka for communication.
Since Kafka leaves the fitering of messages to the clients, our app needs to decide if the received event is relevant for us.
For this I want to check if a used identifier in a Kafka message exists in our db (which contains millions of records, only a small subset of the billions of possible entities of our whole system).
Having to use a count instead is such a dealbreaker for us that I'll have to write a non-panache query instead.
@gavinking is there an equivalent .exists(Object id) method in the latest ORM these days? I could not find anything in the Session API.
No, and what would the semantics of such an operation be?
- Is it enough to check the second-level cache, or does it have to always go to the database?
- Does it acquire some sort of exclusive lock on that id so that no other transaction can insert or delete the row until the present transaction commits?
But the main reason we don't have it is it's typically such a crappy and inefficient way to interact with the database.
Probably the people who want this should be using upsert() or something else which doesn't suck.
So, to be clear (I was posting from my phone). The people posting above are giving what looks to me like terrible advice on how to implement an upsert on databases which don't support the SQL merge statement.
The approach Hibernate uses to emulate merge on databases which don't have it, is the following:
- attempt an
UPDATEon the row, and check the returned rowcount (if it is 1, stop) - if it is 0, attempt an
INSERT.
An alternative approach is to attempt the INSERT first, and have fancy code for checking if a SQLException represents a primary key violation.
The following is not a good solution:
- attempt to
SELECTthe row - if it exists, attempt an
UPDATE, or otherwise, - if not, attempt an
INSERTand 🙏🙏that it wasn't inserted since theSELECT, since we don't have any sort of lock on the nonexistent row.
Anyway, people don't need to be implementing this stuff themselves, since it's already implemented by StatelessSession.upsert(), which will automatically use a MERGE on platforms which support it.
@alexander-dammeier
I use "exists" to prove if an id really refers to an existing aggregate when i only hold the id in another aggregate. It is a business rule in our system so i have to check it but i do not need the hole entity, only the already given id.
And this is terrible.
- First of all: "aggregates" are not the right way to use JPA. You can read what we have to say about aggregates here: https://docs.jboss.org/hibernate/orm/7.0/introduction/html_single/Hibernate_Introduction.html#associations
- Second of all: referential integrity is the responsibility of the database! You should have foreign key constraints, not rubbish
selectqueries to validate referential integrity.
I'm on my knees begging everyone to put down the DDD Jonestown Kool-aid and just try using JPA and your relational database in the normal way that these things were intended to be used.
Again, in our distributed system (hospital), we receive a lot of Kafka messages - only a small part of which is relevant for our app. Whether or not a message is relevant is only decided if the identifiers match some others in our DB. If the identifiers don't match, we don't want to store an entity but ignore the message. This is the expected case and there we don't want to upsert or merge or persist, but need to efficiently and quickly check if this DB entry exists. The events which we actually need to store are rare so an extra round trip is not the end of the world.
@gavinking - I see your point and I also read that others just need an upsert, but I'd like to believe that we actually have a legitimate usecase for this exists check.
Yeah, agreed. I was sceptical because I assumed exists would always be followed by a load. @gavinking assumes it's for upsert. But @markus-lehr-dedalus was the first to raise a use-case where exists is not followed by either load or update but for other actions (perhaps not even on the DB: on other systems?).
I am not sure if it's a common use-case, though. But at least it's new to me :)
@markus-lehr-dedalus In your use case, what are the transactional semantics of that?
Another comment: in the case where exists() takes an id, I'm not clear on why a count query (for example, getResultCount() in Hibernate) is no good. I would have expected the query planner to be able to come up with something very efficient when it's the PK.
(Unclear to me why exists around a subquery would be better.)
@gavinking I'm not 100% certain what you mean by transactional semantics, but I can give you a workflow:
One of the specific workflows (simplified a bit) for us is that we get the Kafka message for any medical case update (@Incoming annotation on method, not @Channel on mutiny stuff).
After a bit of validation, we enter a method marked as @Transactional for processing the medical case.
This medical case has a number (the thing I called identifier, but not actually the DB id, not even unique constraint) and we check if our app has any entities (can be multiple as well, e.g. blood samples) which reference this medical case via this number (ideally with this exists method).
Since we don't store whole the medical case itself (huge, complex object and another app's responsibility), this number does not have any foreign key constraint in our app. I therefore don't think any optimization in getResultCount can be performed.
If we find any entity, then we know that we have to process the message's content further and create or update another whole different entity (physical location relevant for nurses) than the one we executed the exists check for.
After that, we send SSE updates to connected clients and then the transactional method ends.
If we don't find any entity (e.g. patient came in for something unrelated to blood samples), then we don't create any entity and the transactional method ends.
And then you retry later?