Improve performance of `JdbcStepExecutionDao::getLastStepExecution`
- Use SQL order by clause instead of Java Comparator
- Limit result set size to 1
Alternatively, we could use SE.STEP_EXECUTION_ID IN (SELECT MAX(STEP_EXECUTION_ID) FROM ...) to align with other queries, but there is a risk to break existing behavior since CREATE_TIME is not considered for comparing.
https://github.com/spring-projects/spring-batch/compare/main...quaff:spring-batch:patch-16?expand=1
The suggested change is basically a revert of the fix for #4657.
I'm not sure whether the performance gain is worth it, compared to the risk of re-introducing the previous issue.
The suggested change is basically a revert of the fix for #4657.
I'm not sure whether the performance gain is worth it, compared to the risk of re-introducing the previous issue.
This commit will not query all matched rows but only first row.
statement.setMaxRows(1); // limit max rows to 1
if (rs.next()) {
// only first row is used.
}
Our DBA has not been able to find a solution to improve the performance with an additional index. What we did find out is that, by removing the ORDER BY, we get the result (typically just a single row in the normal case) in a matter of milliseconds. But with the ORDER BY, it takes ~ 60 seconds.
It's weird, but I'm OK if this PR is rejected.
I am not a big fan of sorting things on the application side if we can do it on the database side, so this PR has its added value if we do not include any regression or performance degradation as reported in #4657. So basically we moved this from Java to the DB in #891 with 62a8f44, then from the DB to Java in #4657 with 5a62de90, and now back again to the DB with this PR.
I could not find the reason we merged #4657, but I guess it was due to this https://github.com/spring-projects/spring-batch/issues/4657#issuecomment-2334790044. I suggested to use LIMIT 1 or equivalent, but in fact the syntax could differ between DB providers which is more complex. Probably since we were targeting that fix for a patch release from what I see 5.1.3, we accepted to move the logic back from the DB to Java. But it's probably time to introduce a more sophisticated / proper approach to do it on the DB side.
This commit will not query all matched rows but only first row.
Is using statement.setMaxRows(1); // limit max rows to 1 somehow equivalent to something like LIMIT 1? I mean if we can avoid having to provide a different limiting query depending on the DB (TOP 1 or FETCH FIRST 1 ROW ONLY or WHERE ROWNUM <= 1), that would be better.
Thoughts?
I'm not sure whether the performance gain is worth it, compared to the risk of re-introducing the previous issue.
@hpoettker I agree. I don't know @quaff if you had a chance to benchmark this change? It would be great if we have a baseline to measure the performance improvement/degradation and decide if this change is worth it.
Is using
statement.setMaxRows(1); // limit max rows to 1somehow equivalent to something likeLIMIT 1?
I think most JDBC drivers honer the max rows.
I don't know @quaff if you had a chance to benchmark this change? It would be great if we have a baseline to measure the performance improvement/degradation and decide if this change is worth it.
I didn't, @jpraet could you verify this patch works for you since you reported #4657?
Hi, we have since upgraded from DB2 10 to DB2 11 and I can no longer reproduce the slow query reported in #4657. So, no objections from my side to go ahead with this PR and move sorting back to the database.
@jpraet Thank you for your feedback! Just in time to include this in 6.0.0-RC2.
Rebased and merged as 1cad0397c6286a287f2b33e03a0d00e89a64c056. Thank you @quaff for your contribution 👍