quarkus
quarkus copied to clipboard
Add Support for `merge(Entity)` in `PanacheRepositoryBase`
Description
In a project, I had to create a few entities in one go, and persist() or persistAndFlush() were not good enough
i got "Detached entity passed to persist" error message
@Entity
@Table(name = "Table_A")
public class EntityA {
@Id
@GeneratedValue(generator="uuid-generator")
private UUID uid;
@OneToMany(mappedBy = "parent", cascade=CascadeType.ALL, fetch=FetchType.EAGER)
private Set<EntityB> setOfB;
}
@Entity
@Table(name = "Table_B")
public class EntityB {
@Id
@GeneratedValue(generator="uuid-generator")
private UUID uid;
@ManyToOne(fetch = FetchType.EAGER, cascade = CascadeType.ALL, optional = false)
@JoinColumn(name = "parent_uid", referencedColumnName = "uid", nullable=false)
private EntityA parent;
}
@ApplicationScoped
public class EntityARepository implements PanacheRepository<EntityA> {
}
I had to use the merge of the entity manager. As of today it is not available in a PanacheRepository
So I had to add in my repository:
public void merge(EntityA a) {
getEntityManager().merge(a);
}
Implementation ideas
I know it is only 2 lines of code but I would propose you to add to PanacheRepositoryBase
default void merge(Entity entity) {
JpaOperations.INSTANCE.merge(entity);
}
/cc @FroMage (panache), @loicmathieu (panache)
IIRC we did not include merge by design, to avoid people using it too much and running into issues. Right, @Sanne ?
I thought it would be better to write it here rather than opening a new issue.
If we initially wrote some data to the database with a load script, the duplicate key value violates problem occurs with Panache because the Id is not defined explicitly.
If we define the Id explicitly, we get the exception detached entity passed to persist. I think this exception message in particular is quite confusing.
Can't the default Id provider mechanism obtain the next primary key value of the table before persisting the entity?
IIRC we did not include
mergeby design, to avoid people using it too much and running into issues. Right, @Sanne ?
Not sure how people would abuse merge, but since the RepositoryBase only expose persist(), how do we re-attach a detached entity? For example, what is the pattern people use for below scenario?
@Transactional
public Person persistEntity() {
Person person = new Person("Bob");
personRepository.persist(person); // Entity is managed within this transaction
person.setName("John"); // This new name will be saved to DB, because we are still within a TX
return person;
}
public void demonstrateDetachedEntity() {
Person person = persistEntity();
// At this point, the transaction is complete, and entity detached. Name is John
person.setName("Steve"); // Lets update name to Steve
// Question: how to get person updates persisted since there is no merge api?
}
I don't think we promote things like that, with entities leaking out of transactions.
In particular, we advise people to put @Transactional on REST endpoints, or equivalent other entry points.
In particular, we advise people to put
@Transactionalon REST endpoints, or equivalent other entry points.
that is understandable, but opening tx at a coarse grained level doesn't scale well in this microservices world. For example I often find myself having to call other Rest endpoints eg, generate a report or call a slow responding LLM endpoint in the cloud ;-) Having an open tx on those calls eventually leads to a depleted connection pool.
My flow is generally, fetch data, do expensive operations, call a method annotated with @Transactional eg.
@Transactional public Entity save(Entity entity) { return em().merge(entity); }
Spring Data v1 also went the coarse grained route but changed to fine grained tx in v2 because of the aforementioned issues.
They have a fairly sensible implementation now: https://github.com/spring-projects/spring-data-jpa/blob/865fd00edadd627d3a43f7d3d35feed48e2cba77/spring-data-jpa/src/main/java/org/springframework/data/jpa/repository/support/SimpleJpaRepository.java#L628C2-L631C1
Note: all data changing methods are annotated with @transactional, so for newbies it works out of the box. And as applications scale one can manage the transactions in upper layers and the repository' save method will join the current tx.
It's something to consider because devs will likely have to implement a similar save method in their projects at some point.
merge has so many issues with its definition on how exactly things are merged, and tends to lure people into using entities as DTOs (because it's so easy to merge them, but it's almost always wrong) that we're skeptical about its use.
I haven't heard of many people doing entity fetching on detached entities, then expensive work, then a TX just to apply the changes. I'm not sure this is a common way to use Hibernate, but I'll defer to @Sanne or @yrodiere.
Personally I'm not a fan of such patterns, but I have to admit that what @sabob reports is accurate: it's a popular approach and sometimes useful. My concern is that there are various risks with such patterns which are not widely understood, and hard to test as they involve safety under concurrent transactions; this also implies people tend to eventually ship problematic code which is not identified by integration testing.
Up to you @FroMage .. while personally I don't love it, I've learned also that my personal opinion is not a worthwhile crusade :)
There's an aspect of API design here. I think we initially tried hard to keep the Panache APIs "clean and intuitive"; the merge operation might indeed be useful in certain patterns, but not all of them. So it's a matter of deciding if we want to expose things which are popular in certain circles only, or provide stronger guidance and guardrails.
On the other hand, I still feel like writing getEntityManager().merge(a); is not an impossible thing to ask of "advanced users". I'm tempted to leave this as is…
@FroMage I agree with that, but perhaps the problem is people "stumble" on this, losing the flow? Especially if it's unclear that simply doing getEntityManager().merge(a) is an option.
What if we created an FAQ for Panache? Beyond helping people to figure this out, it also gives you an opportunity to highlight some concerns.
Note, that the approach I use is not because I think it's a good idea, but rather necessity. I don't know of an alterative :)
Starting transactions on boundaries works for a lot of use cases, but if the db pool runs out, one might end up with a similar, fetch / merge pattern.
Spring JPA hides their persistence complexity behind a save() method so it will insert or update depending on the situation.
PS: Hibernate has an implementation of the new Jakarta Data spec where Hibernate generates source code for the various annotations. Jakarta Data has a @Save annotation which attempts to hide the complexity around persistence as well: https://jakarta.ee/specifications/data/1.0/apidocs/jakarta.data/jakarta/data/repository/save
I just did a test and Hibernates generated the following code for @Save
public Book save(@Nonnul Book book) {
if (session.getIdentifier(book) == null) {
session.insert(book);
} else {
session.upsert(book);
}
return book;
}
No clue what "upsert" is but I guess it's cool ;-)
Update: Hibernate uses StatelessSession for Jakarta Data which contains this "upsert" method.
But as they saying goes, "when in doubt leave it out". Can always add it one day if required.
Jakarta Data is using StatelessSession as you mention, so there are no attached entities, and as such, merge is never needed or relevant.
We could definitely add a section about merge in the Hibernate or Panache docs, like we could add a section about why people should never use entities as DTOs.
We could definitely add a section about
mergein the Hibernate or Panache docs, like we could add a section about why people should never use entities as DTOs.
Would be great if the docs can provide a best-practice on how to solve the below scenario:
- Fetch the necessary data (e.g., a Customer) and set status to 'in_progress'.
- Fetch Order information from another microservice.
- Generate a PDF report and email it to the customer.
- Update the Customer and set status to 'complete'.
Steps 1 and 4 must be in transactions. Steps 2 and 3 could be in the same transaction, but doing so could result in a long-lived transaction.
If we try and stick to short-lived transactions, we must commit after step 1, leaving us with detached entities (DTOs).
there isn't much value If the docs says: don't detach.
The above scenario is common, hence Hibernate provides saveOrUpdate. Spring and Jakarta provides save and JPA merge.
Assuming a blocking endpoint, but this might be better done using a reactive endpoint, if you can use Hibernate Reactive:
@Inject
OrderMicroService orders;
@Inject
Mailer mailer;
@POST
public void doStuff(Long customerId){
markInProgress(customerId); // first transaction acting as lock
Order order = orders.findLatestOrder(customerId); // micro service
PDF pdf = renderPdf(order, customerId); // probably done in a transaction, if it requires loading DB stuff
mailer.send(Mail.withText("[email protected]", "An email from quarkus with attachment",
"This is my body")
.addAttachment("my-file-1.pdf",
pdf.getBytes(), "application/pdf")
);
markComplete(customerId);
}
@Transactional
public void markInProgress(Long customerId){
Customer c = Customer.findById(customerId);
if(c == null)
throw new NotFoundException();
if(c.status == Status.InProgress)
throw new AlreadyInProgressException();
c.status = Status.InProgress;
}
@Transactional
public void markComplete(Long customerId){
Customer c = Customer.findById(customerId);
if(c == null)
throw new NotFoundException();
if(c.status != Status.InProgress)
throw new SomethingIsWrongException();
c.status = Status.Complete;
}
Intuitively, this is how I'd do it, but perhaps there's more to the problem that I'm skipping over?
Assuming a blocking endpoint, but this might be better done using a reactive endpoint, if you can use Hibernate Reactive:
I like your approach :)
Regarding it being intuitive, I think it might depend on the audience because a Cookbook on JPA will have a section on 1st level cache and recommend using merge instead, especially if a Customer has many-to-many relationships that are eagerly fetched.
I uhm'd and ah'd yesterday regarding the StatelessSession, and since that is what Jakarta Data 1.0 supports, I like your approach even more.
It would be great if the docs provides a scenario like above with a best-practice solution. Then gauge the feedback.
If a dev feels more comfortable with the Spring JPA type of approach it should be relatively easy to create a custom Repo that extends PanacheEntityBase, with a save type method...
Appreciate your help this.
Well, that's a relief that what I wrote makes sense, because I keep wondering if I'm missing the point about merge :)
One more thing I'd mention against using merge: I don't know how it behaves in the face of concurrent modifications.
In the example I pasted, at steps 1 and 4 we load the Customer and immediatly change one single column.
If some other concurrent request modifies a separate column of the same Customer between steps 1 and 4, we should be fine: we'll load the modified value at 4, and only update our single column.
Now let's imagine what happens with merge, and frankly I'm just not sure of its semantics.
- Load the
Customer, change one column, commit, detach - Some other request changes a second column of the same
Customer - We
mergeourCustomer. At this point I've no idea if the second column remains the same as it was in 1 or if it becomes the modified value of 2. - We modify our column, and commit, but I've no idea if we just send an outdated second column to the DB that overrides the change in 2.
merge is full of subtle questions such as these, that are hard to figure out. Perhaps you know the answer? Perhaps your answer is incorrect? I honestly have no idea, I'd have to check the spec, but last time I checked it was not clear to me what happens in this case and many others. That's why we advise to avoid it.
One more thing I'd mention against using
merge: I don't know how it behaves in the face of concurrent modifications.
I've not read the spec either, but are used to JPA and how merge behaves with the Hibernate implementation.
Also, JPA has been around for years, so many developers are familiar with merge(), which is likely why this ticket was created.
In essence, since the entity is detached, merge() will create a new attached Customer, populate it from the db, and copy the values from the detached onto the attached one, and return the new attached entity. So the detached entity overwrites the values of the attached version. If a @version property is defined, merge() might throw an OptimisticLockException if the db was updated while the entity was detached.
Your questions from 1-4 are valid, and those who are familiar with merge() will understand what happens in the scenario you described.
For me, this was a valuable discussion to understand why there is no merge() or save() method in the Repo class, and how Panache best-practice address the scenario where a coarse grained transaction might not be practical.
And even if Panache discourages merge() it doesn't stop one from implementing it.
For completeness, and for anyone that comes across this thread, below is how a JPA Cookbook might implement the scenario. With Panache even :)
The key difference is that Instead of passing customerId, the detached entity is passed to methods.
@Inject
MyPanacheRepo repo;
@POST
@RunOnVirtualThread // Still on Java17 and new to Quarkus, so I hope this is supported
public void doStuff(Long customerId) {
Customer customer = repo.getCustomer(customerId); // customer is detached.
customer.setStatus(Status.InProgress);
// Note: A JPA cookbook would likely recommend passing the customer instance, not just the customerId.
Order order = orders.findLatestOrder(customer);
// Again, pass in the customer instance, not just the ID.
PDF pdf = renderPdf(order, customer); // This may be done in a transaction if DB loading is involved.
mailer.mailCustomer(customer); // Send email to the customer.
customer.setStatus(Status.Complete);
customer = save(customer); // Save the customer (re-attaching it).
// customer is detached again.
}
@Transactional
public Customer save(Customer customer) {
customer = repo.merge(customer);
return customer;
}
i have run into this recently from a different point of view in that it makes QuarkusComponentTest unit testing more cumbersome.
Panache repository has persist(entity) but not merge(entity). I have to do the following...
applicationPreferences = preferencesRepository.getEntityManager().merge(applicationPreferences);
Which makes mocking and unit testing now I have to Mock EntityManager as well like this...
@InjectMock
EntityManager entityManager;
when(preferencesRepository.getEntityManager()).thenReturn(entityManager);
when(preferencesRepository.getEntityManager().merge(any(ApplicationPreferences.class))).thenAnswer(invocation -> invocation.getArgument(0));
instead of just
when(preferencesRepository.merge(any(ApplicationPreferences.class))).thenAnswer(invocation -> invocation.getArgument(0));
Yes, that is annoying, but again, since we discourage people using merge, that goes with it, unfortunately.
@FroMage i get is discouraged but its perfectly valid in JPA and sometimes absolutely necessary for certain business cases to perform a merge. Shouldn't we leave this up to the developer to mis-use it?
We don't prevent it. We just make sure people have to go out of their way to shoot themselves in the foot :)