Spring Data JDBC performance by caching SQL statements generated for derived queries.

Open sergey-morenets opened this issue 4 years ago • 12 comments

Hi

We have a Spring Data JPA (Hibernate) project that we would like to migrate to Spring Data JDBC. The main reasons are simplified configuration and model mapping. We also expected it to lead to better performance.

However, we ran some benchmarks (with default settings) using JMH against an H2 database, and in most cases performance decreased. For example, we have the following repository and query method:

public interface ProductRepository 
	extends CrudRepository<Product, Integer> {

	Product findByName(String name);
}

The benchmarks showed the following execution times (in ns):

Spring Data JDBC - 80899
Spring Data JPA - 14124

Is this expected, or did we miss something in our configuration or tests?

sergey-morenets avatar Dec 02 '21 08:12 sergey-morenets

Please provide a Minimal Reproducible Example, preferably as a GitHub repository. Make sure to include the database, either as an in-memory database or, if that is not possible, using Testcontainers.

Currently there are various scenarios where I expect one or the other to be faster.

schauder avatar Dec 02 '21 09:12 schauder

Hi @schauder

This is a repository with a simplified example: https://github.com/sergey-morenets/spring-data-benchmarks

You can open this project in an IDE and run one of the classes, SpringDataJdbcBenchmarking or SpringDataJpaBenchmarking, to execute the benchmarks.

This is a simplified project, so the absolute results are different, but Spring Data JPA (7016 ns) is still faster than Spring Data JDBC (39617 ns).

sergey-morenets avatar Dec 02 '21 09:12 sergey-morenets

If I interpret that correctly, you don't measure actual loads with JPA but only the lookup in the first-level cache, which is probably not what you want.

Invoke EntityManager.clear() between benchmarks.

schauder avatar Dec 02 '21 10:12 schauder

Hi @schauder

Thank you for the comment. I now clear the EntityManager first-level cache at the beginning of the benchmark (I updated the repository):

@Benchmark
public Product springDataJpaQuery() {
        entityManager.clear();
        return productRepository.findByName("phone");
}

However, it has had almost no impact on performance (execution time is 7479 ns).

sergey-morenets avatar Dec 02 '21 11:12 sergey-morenets

Are there any benchmarks comparing Spring Data JDBC and MyBatis?

GeorgeSalu avatar Dec 02 '21 18:12 GeorgeSalu

Thanks for the reproducer.

The main difference between the two benchmarks is that you accidentally kicked out the Hikari connection pool by explicitly constructing the datasource. You can completely remove any Spring Data JDBC configuration and you'll see a significant performance boost.

It seems we also do not properly cache the results of constructing the SQL statement from the method name, resulting in significant overhead. You can work around that by providing an explicit query.
The missing caching is something we'll fix.
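
To illustrate the kind of caching meant here, the following is a minimal plain-Java sketch, not Spring Data JDBC's actual implementation. It memoizes the SQL derived from a query method name so the derivation runs only once per method. The class name DerivedQueryCacheSketch, the nested SqlCache, the toy derive logic (which handles only findBy<Property>), and the name column are all assumptions made for illustration; only the products table name appears elsewhere in this thread.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class DerivedQueryCacheSketch {

    /** Hypothetical cache mapping a query method name to the SQL derived from it. */
    static final class SqlCache {
        private final Map<String, String> sqlByMethod = new ConcurrentHashMap<>();
        // Counts how often the (potentially expensive) derivation actually runs.
        private final AtomicInteger derivations = new AtomicInteger();

        String sqlFor(String methodName) {
            // computeIfAbsent invokes derive() only when no SQL is cached for this name.
            return sqlByMethod.computeIfAbsent(methodName, this::derive);
        }

        int derivations() {
            return derivations.get();
        }

        // Toy derivation (assumption): supports only findBy<Property> on a "products" table.
        private String derive(String methodName) {
            derivations.incrementAndGet();
            String prefix = "findBy";
            if (!methodName.startsWith(prefix) || methodName.length() == prefix.length()) {
                throw new IllegalArgumentException("Unsupported method name: " + methodName);
            }
            String property = methodName.substring(prefix.length());
            String column = Character.toLowerCase(property.charAt(0)) + property.substring(1);
            return "SELECT * FROM products WHERE " + column + " = :" + column;
        }
    }

    public static void main(String[] args) {
        SqlCache cache = new SqlCache();
        for (int i = 0; i < 3; i++) {
            System.out.println(cache.sqlFor("findByName"));
        }
        // The SQL was requested three times but derived only once.
        System.out.println("derivations: " + cache.derivations());
    }
}
```

Keying the cache by method name means repeated invocations of the same query method pay only for a map lookup, which is the overhead an explicit query also avoids.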

schauder avatar Dec 03 '21 07:12 schauder

Results of modified benchmark.

Benchmark                                       Mode  Cnt      Score      Error  Units
SpringDataJdbcBenchmarking.springDataJdbcQuery  avgt    5  10103.975 ± 1140.990  ns/op
SpringDataJpaBenchmarking.springDataJpaQuery    avgt    5   8622.891 ± 1886.295  ns/op

You can find the modified benchmark here: https://github.com/schauder/spring-data-benchmarks

schauder avatar Dec 03 '21 08:12 schauder

In general, I would not expect better performance from Spring Data JDBC compared to JPA implementations in the typical benchmark scenario.

The benefit of Spring Data JDBC is that it is much easier to understand what it is actually doing, and it is therefore easier to use correctly.

This might very well result in better performance in real-world applications because fewer mistakes are made.

schauder avatar Dec 03 '21 08:12 schauder

Hi @schauder

I returned to the benchmarks topic and used your project. However, I noticed you'd added a @Query annotation to the query method (https://github.com/schauder/spring-data-benchmarks/blob/main/src/main/java/demo/jdbc/ProductRepository.java). Was it done intentionally? It completely changes the query logic:

public interface ProductRepository 
	extends CrudRepository<Product, Integer> {

	@Query("select * from products")
	Product findByName(String name);	
}

After I removed this annotation

public interface ProductRepository 
	extends CrudRepository<Product, Integer> {

	Product findByName(String name);	
}

and re-ran the benchmarks, the results were as follows:

Benchmark                                       Mode  Cnt      Score     Error  Units
SpringDataJdbcBenchmarking.springDataJdbcQuery  avgt    5  30283.784 ± 643.013  ns/op
SpringDataJpaBenchmarking.springDataJpaQuery    avgt    5   8669.489 ± 410.739  ns/op

So there is again a significant gap between the Spring Data JDBC and Spring Data JPA execution times.

sergey-morenets avatar Jan 15 '22 10:01 sergey-morenets

Was it done intentionally?

Yes and no. I intentionally added the annotation to demonstrate the effect of not caching the generated query. Changing the query semantics was a mistake on my side.

schauder avatar Jan 17 '22 08:01 schauder

It seems we also do not properly cache the results of constructing the SQL statement from the method name, resulting in significant overhead. You can work around that by providing an explicit query. The missing caching is something we'll fix.

@schauder Is this something to expect in the upcoming version 3.0?

petromir avatar May 30 '22 10:05 petromir

Is there any update on this? In Spring Data R2DBC, we've seen about a 10x performance increase by adding @Query to the following query on a very large table (50+ columns), even with a small list of ids (<10):

fun findAllByIdInAndSomethingIsTrue(ids: List<Long>): Flow<MyEntity>

eduanb avatar Jan 23 '23 11:01 eduanb