quarkus exists() method for repository

Description Nowadays we have the findByIdOptional and count to check if a record exists inside the database. count is much faster than selecting the whole object data and serializing it. But wouldn't it be nice to have an exists(id) method to abstract a db query that is so common?

Mar 16 '20 12:03 cunhazera

cc @FroMage @emmanuelbernard @loicmathieu

Mar 16 '20 15:03 gastaldi

Is it really that common to check for existence of a record without loading it? I've never had to do that myself.

Mar 17 '20 11:03 FroMage

@FroMage but isn't it too much to load all fields from db to check if it exists?

Mar 17 '20 11:03 cunhazera

Well, sure, but only if you need to check for existence without later using the entity. I've never seen such a use case. All my use-cases check if the object exist and then proceed to use the entity.

Mar 17 '20 13:03 FroMage

@FroMage yeah, it makes sense. In my case, I check the record exists. If it does, I throw a custom exception.

Mar 17 '20 13:03 cunhazera

@FroMage I also sometimes just check for existence in my apps, and then throw an NotFoundException.

If you want, I can try to implement this feature.

Mar 17 '20 13:03 Postremus

and then throw an NotFoundException

But if it exists, what does it do?

Mar 17 '20 14:03 FroMage

I use "exists" to prove if an id really refers to an existing aggregate when i only hold the id in another aggregate. It is a business rule in our system so i have to check it but i do not need the hole entity, only the already given id.

Mar 29 '20 19:03 alexander-dammeier

note that count is not too far from what you have in exists.

if ( !Person.exists(1L) ) { ... } // vs if ( Person.count("id", 1L) != 1 ) { ... }

On Sun, Mar 29, 2020 at 9:36 PM Alexander Dammeier [email protected] wrote:

I use "exists" to prove if an id really refers to an existing aggregate when i only hold the id in another aggregate. It is a business rule in our system so i have to check it but i do not need the hole entity, only the already given id.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/quarkusio/quarkus/issues/7884#issuecomment-605688058, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACJNWA7GUI7XQLGT4FREGDRJ6PLLANCNFSM4LMFG6LA .

Mar 30 '20 13:03 emmanuelbernard

But is exists faster? Are all databases intelligent enough not to go through all rows to count duplicates in a column with an unique constraint?

edit: but exists can also be used for columns without unque constraint so it should be faster than count in this situations.

Mar 30 '20 14:03 alexander-dammeier

What do you want for the feature? The existence of the entity? Or the existence of some column value? I thought the former based on the issue description. In this case, I don't think my count approach is any different than whatever a custom exist implementation would do performance wise. If anything it could be made more memory efficient than a select id from entity where id = ?. The count is done after the filtering.

The one annoyance with my solution is that the user must know that the id property is named "id"

Mar 30 '20 14:03 emmanuelbernard

:+1: to implement this, I think that of course you can do a count but this exists methods is more about clean code. At the end you can always do in Java list.size() == 0 but we do list.isEmpty() this is more or less the same.

Jul 07 '20 09:07 lordofthejars

count is not the same as exists.

Postgres example:

SELECT EXISTS(SELECT 1 FROM contact WHERE name='John') vs
SELECT COUNT(*) FROM contact WHERE name = 'John'

Now imagine that name has no index on it (on purpose), and request results in a full-table-scan. Count will have to actually scan the whole table, while EXISTS can stop as soon as it sees first "John" guy.

Dec 20 '24 09:12 abuinitski

Sure, but the question remains: what do you do once you know the entity exists?

Dec 20 '24 13:12 FroMage

Sure, but the question remains: what do you do once you know the entity exists?

The question is not what "to do", the question is what "not to do".

For example:

Do not create a duplicate.
Do not bother with custom native query and "ON CONFLICT" clause in case a duplicate already exists
Do not bother parsing native database exception code, and checking failing constraint name, to catch duplicate error and respond to client with a reasonable error message instead of an "unknown error something is wrong with my database idk"

Of course, race condition is still possible unless you raise transaction isolation level, but then it would be rare enough so I just can refer to low salary, unreasonable deadlines, and bad management in general to justify why my code has sporadic issues.

Dec 23 '24 02:12 abuinitski

My argument was that in most cases, once we know the entity exists, we typically load it.

For example, if you want to avoid creating a duplicate, you check that it exists, but then… what do you do with the object you wanted to create? Don't you turn the insert into an update? For that you'll need the entity.

I'm only trying to get a better idea of the usefulness of this. I can believe it's more efficient than count, for sure. That's one argument in its favour already.

I didn't understand yet how useful it can be to check for something's existence and if it exists, not need it. If it doesn't exist, there's no problem, of course.

Jan 08 '25 16:01 FroMage

In our case, we often check for the existence of an entry because the existence alone tells us if a step in our process was executed successful. Otherwise we schedule a retry, so for this case we use it more like an event store and the entries are often immutable.

I think it is very dangerous to use an exist function to avoid duplicates as another process could create such an entry after you just checked it. In my opinion, unique constraints (ID conflicts) are the only viable option to truly avoid this and therefore we always check if the transaction failed because of a conflict. We then do not load the entry because knowing that it exists is enough to continue or there is simply nothing left to do.

Jan 10 '25 11:01 alexander-dammeier

Our company has a microservice architecture which also uses Kafka for communication. Since Kafka leaves the fitering of messages to the clients, our app needs to decide if the received event is relevant for us. For this I want to check if a used identifier in a Kafka message exists in our db (which contains millions of records, only a small subset of the billions of possible entities of our whole system). Having to use a count instead is such a dealbreaker for us that I'll have to write a non-panache query instead.

May 30 '25 06:05 markus-lehr-dedalus

@gavinking is there an equivalent .exists(Object id) method in the latest ORM these days? I could not find anything in the Session API.

Jun 11 '25 15:06 FroMage

No, and what would the semantics of such an operation be?

Is it enough to check the second-level cache, or does it have to always go to the database?
Does it acquire some sort of exclusive lock on that id so that no other transaction can insert or delete the row until the present transaction commits?

Jun 11 '25 15:06 gavinking

But the main reason we don't have it is it's typically such a crappy and inefficient way to interact with the database.

Jun 11 '25 15:06 gavinking

Probably the people who want this should be using upsert() or something else which doesn't suck.

Jun 11 '25 15:06 gavinking

So, to be clear (I was posting from my phone). The people posting above are giving what looks to me like terrible advice on how to implement an upsert on databases which don't support the SQL merge statement.

The approach Hibernate uses to emulate merge on databases which don't have it, is the following:

attempt an UPDATE on the row, and check the returned rowcount (if it is 1, stop)
if it is 0, attempt an INSERT.

An alternative approach is to attempt the INSERT first, and have fancy code for checking if a SQLException represents a primary key violation.

The following is not a good solution:

attempt to SELECT the row
if it exists, attempt an UPDATE, or otherwise,
if not, attempt an INSERT and 🙏🙏that it wasn't inserted since the SELECT, since we don't have any sort of lock on the nonexistent row.

Anyway, people don't need to be implementing this stuff themselves, since it's already implemented by StatelessSession.upsert(), which will automatically use a MERGE on platforms which support it.

Jun 11 '25 15:06 gavinking

@alexander-dammeier

I use "exists" to prove if an id really refers to an existing aggregate when i only hold the id in another aggregate. It is a business rule in our system so i have to check it but i do not need the hole entity, only the already given id.

And this is terrible.

First of all: "aggregates" are not the right way to use JPA. You can read what we have to say about aggregates here: https://docs.jboss.org/hibernate/orm/7.0/introduction/html_single/Hibernate_Introduction.html#associations
Second of all: referential integrity is the responsibility of the database! You should have foreign key constraints, not rubbish select queries to validate referential integrity.

I'm on my knees begging everyone to put down the DDD Jonestown Kool-aid and just try using JPA and your relational database in the normal way that these things were intended to be used.

Jun 11 '25 16:06 gavinking

Again, in our distributed system (hospital), we receive a lot of Kafka messages - only a small part of which is relevant for our app. Whether or not a message is relevant is only decided if the identifiers match some others in our DB. If the identifiers don't match, we don't want to store an entity but ignore the message. This is the expected case and there we don't want to upsert or merge or persist, but need to efficiently and quickly check if this DB entry exists. The events which we actually need to store are rare so an extra round trip is not the end of the world.

@gavinking - I see your point and I also read that others just need an upsert, but I'd like to believe that we actually have a legitimate usecase for this exists check.

Jun 12 '25 06:06 markus-lehr-dedalus

Yeah, agreed. I was sceptical because I assumed exists would always be followed by a load. @gavinking assumes it's for upsert. But @markus-lehr-dedalus was the first to raise a use-case where exists is not followed by either load or update but for other actions (perhaps not even on the DB: on other systems?).

I am not sure if it's a common use-case, though. But at least it's new to me :)

Jun 12 '25 08:06 FroMage

@markus-lehr-dedalus In your use case, what are the transactional semantics of that?

Jun 12 '25 08:06 gavinking

Another comment: in the case where exists() takes an id, I'm not clear on why a count query (for example, getResultCount() in Hibernate) is no good. I would have expected the query planner to be able to come up with something very efficient when it's the PK.

(Unclear to me why exists around a subquery would be better.)

Jun 12 '25 09:06 gavinking

@gavinking I'm not 100% certain what you mean by transactional semantics, but I can give you a workflow:

One of the specific workflows (simplified a bit) for us is that we get the Kafka message for any medical case update (@Incoming annotation on method, not @Channel on mutiny stuff).

After a bit of validation, we enter a method marked as @Transactional for processing the medical case. This medical case has a number (the thing I called identifier, but not actually the DB id, not even unique constraint) and we check if our app has any entities (can be multiple as well, e.g. blood samples) which reference this medical case via this number (ideally with this exists method). Since we don't store whole the medical case itself (huge, complex object and another app's responsibility), this number does not have any foreign key constraint in our app. I therefore don't think any optimization in getResultCount can be performed.

If we find any entity, then we know that we have to process the message's content further and create or update another whole different entity (physical location relevant for nurses) than the one we executed the exists check for. After that, we send SSE updates to connected clients and then the transactional method ends.

If we don't find any entity (e.g. patient came in for something unrelated to blood samples), then we don't create any entity and the transactional method ends.

Jun 12 '25 09:06 markus-lehr-dedalus

And then you retry later?

Jun 12 '25 10:06 gavinking