spring-batch
spring-batch copied to clipboard
JpaCursorItemReader read all rows in memory even though the cursor batch size is set
Bug description
I am trying to read a large amount of data from postgresql using JpaCursorItemReader.
The postgresql driver basically reads all rows when using a cursor, so there is an OutOfMemoryError issue in this case.
https://jdbc.postgresql.org/documentation/head/connect.html#connection-parameters
I set the fetch size supported by the postgresql driver, but it didn't work.
I also set jdbc fetch size supported by hibernate, but it didn't work as intended. https://vladmihalcea.com/resultset-statement-fetching-with-jdbc-and-hibernate/
However, if I changed to JdbcCursorItemReader instead of JpaCursorItemReader , both settings above worked as intended.
Environment
- java: java 11
- postgresql: 12.6
- spring-boot: 2.5.6
- spring-boot-starter-batch: 2.5.6
- spring-boot-starter-data-jpa: 2.5.6
Steps to reproduce
## Set hibernate fetch size
spring:
jpa:
properties:
hibernate:
jdbc:
fetch_size: 100
## Or jdbc driver fetch size
spring:
datasource:
hikari:
pool_name: primaryPool
driver-class-name: "org.postgresql.Driver"
dataSourceProperties:
defaultRowFetchSize: 100
@Bean
fun reader(): ItemReader<TestEntity> {
return JpaCursorItemReaderBuilder<TestEntity>()
.entityManagerFactory(entityManagerFactory)
.queryString(
"""
SELECT c FROM com.test.TestEntity c
"""
)
.build()
}
Below is a snapshot screen of JVM memory in real situation.
And the query is still waiting to read all the data.

Expected behavior
If fetchSize is set like JdbcCursorItemReader, data as much as fetchSize should be read from the cursor.
Most likely we won't notice unless it's a large amount of data.
After setting yaml in the same way as above, data is read normally as much as fetchSize without setting fetchSize in JdbcCursorItemReader.
override fun getReader(): ItemReader<TestEntity> {
return JdbcCursorItemReaderBuilder<TestEntity>()
// .fetchSize(100)
.dataSource(datasource)
.rowMapper(BeanPropertyRowMapper(TestEntity::class.java))
.sql("SELECT id FROM test")
.build()
}
I want JpaCursorItemReader to behave the same as the jdbc driver setting and hiberhante fetch size setting of JdbcCursorItemReader .
I suspect the error here is your configuration. You are returning ItemReader<TestEntity> which is just an ItemReader. Where you should be returning an ItemStreamReader<TestEntity> or JpaCursorItemReaderBuilder<TestEntity>. That way the configuration knows (and Spring Batch in the end) that there is an update method that needs to be called which will call entityManager.clear.
The clear isn't called leading to entities still being in the first level cache. It will just read 100 but due to the nature of JPA one needs to clear the first level cache in between reads.
As a rule of thumb in your @Configuration classes be as specific in your return types as possible as that return type is also used to determine what needs to be done to a bean. It will inspect the return type at moments in the lifecycle instead of the actual type of the constructed bean.
@mdeinum
Thanks for your comment.
I've already tested the method (ItemStreamReader, JpaCursorItemReader) you mentioned, but it didn't work.
It seems to be due to a JPA nature, but I haven't found the cause yet. Looks like I'll have to check some more.
There is a call to clear in the ItemReader which will only be called on an update of the ItemStreamReader. If that doesn't happen the entities remain in your 1st level cache. Another thing is the same applies to the writer, you should clear after the list else those will remain in the 1st level cache.
Thank you @han1448 for opening this issue and thank you @mdeinum for the follow up!
As Marten said, it is expected to make the bean definition method return the most specific type so that Spring Batch correctly creates a proxy that implements the ItemStream interface, and therefore honors its contract by calling open, update and close methods as expected, which in the case of the JpaCursorItemReader would clear the first level cache of items.
The documentation was updated in v4.3.7 with more details about this. Please check Scoping ItemStream components.
I've already tested the method (ItemStreamReader, JpaCursorItemReader) you mentioned, but it didn't work. It seems to be due to a JPA nature, but I haven't found the cause yet. Looks like I'll have to check some more.
Did you find the cause of this issue since then? If not, and things do not work as expected even when making the return type as JpaCursorItemReader, please provide a minimal example that we can debug to reproduce the issue and we will dig deeper. Thank you upfront.
@fmbenhassine
Thank you for your comment.
Now I'm using JdbcCursorItemReader in my production but I'll try to provide a publicly available sample.
@fmbenhassine I've test my code, it seems like issue in my side. Sorry for the inconvenience.