spring-batch
JdbcPagingItemReader strange behaviour for record containing square symbol (²) [BATCH-2356]
Driss Amri opened BATCH-2356 and commented
Our application suddenly started throwing unique key constraint violations, even though we hadn't changed the code in months. All items in our first chunk of records (our first page) were being read more than once by the JdbcPagingItemReader.
The only thing that changed was the data we were reading, which now contained an unusual record with the character ² (superscript two, the "square" symbol). When we excluded this record, the issue disappeared completely.
With the following query provider configuration, the JdbcPagingItemReader in our Spring Batch application suddenly started reading the same records/pages twice:
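```java
import javax.sql.DataSource;

import org.springframework.batch.item.database.PagingQueryProvider;
import org.springframework.batch.item.database.support.SqlPagingQueryProviderFactoryBean;

public PagingQueryProvider queryProvider(DataSource dataSource) {
    SqlPagingQueryProviderFactoryBean factory = new SqlPagingQueryProviderFactoryBean();
    factory.setDataSource(dataSource);
    factory.setDatabaseType("SQLSERVER");
    factory.setSelectClause("SELECT projectnr");
    // Page over the distinct, non-null project numbers
    factory.setFromClause("FROM (SELECT DISTINCT projectnr FROM AQF_OZP) AQF_OZP");
    factory.setWhereClause("WHERE projectnr IS NOT NULL");
    // Later pages are fetched relative to the last sort key read, so the key must be unique
    factory.setSortKey("projectnr");
    try {
        return factory.getObject();
    } catch (Exception e) {
        // Keep the original exception as the cause instead of discarding it
        throw new RuntimeException("Application initialization failed", e);
    }
}
```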
After we changed the WHERE clause to exclude the record containing the ² symbol, everything worked as it always had:

```java
factory.setWhereClause("WHERE projectnr IS NOT NULL AND projectnr != '²21369'");
```
No exception was thrown during the read phase. We only noticed the problem after processing/writing, when our unique constraints were triggered because the same input was processed twice.
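Because nothing failed until the writer hit the unique constraint, it can help during troubleshooting to fail as soon as an item is read a second time. The sketch below is a hypothetical helper, not a Spring Batch class: it remembers every `projectnr` the reader returns and throws on the first repeat. Calling `onRead` from the `afterRead` method of an `ItemReadListener` is one assumed way to hook it into the step. It keeps every key in memory, so it is meant for a debugging run rather than for production.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical troubleshooting helper (not part of Spring Batch): records every key
 * returned by the reader and fails fast on the first repeat, so a re-read shows up
 * in the read phase instead of as a unique-constraint violation in the writer.
 */
class DuplicateReadDetector {

    // All keys read so far; grows with the size of the input
    private final Set<String> seen = new HashSet<>();

    /** Records the key, or throws IllegalStateException if it was already read. */
    synchronized void onRead(String key) {
        if (!seen.add(key)) {
            throw new IllegalStateException("Item read more than once: " + key);
        }
    }

    /** Number of distinct keys read so far. */
    synchronized int distinctCount() {
        return seen.size();
    }
}
```

The methods are synchronized so the same instance can be shared if the step is later switched back to parallel processing.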
No further details from BATCH-2356
Michael Minella commented
What database? What encoding? The table definition would also be useful to help debug. However, my gut feeling here is that this isn't an issue with the batch code (since we really don't do anything beyond blindly reading the data) but a db/sql issue...
Driss Amri commented
We are using Microsoft SQL Server 2008 R2; the encoding (collation) seems to be SQL_Latin1_General_CP850_CI_AS.
I can see the record and the correct name when I'm debugging (breakpoint in the doRead method), so it is being read correctly, but for some reason, when this String is included in the result set, the reader starts reading all records more than once.
We normally process our records in parallel, but we have disabled that to troubleshoot this, and the behaviour is the same for synchronous and asynchronous processing.
Driss Amri commented
| ORDINAL_POSITION | COLUMN_NAME | DATA_TYPE | CHARACTER_MAXIMUM_LENGTH |
|---|---|---|---|
| 1 | Projectnr | varchar | 12 |
| 2 | Status | smallint | NULL |
| 3 | Type | varchar | 4 |
| 4 | Eigenaar | varchar | 3 |
| 5 | Toestand | varchar | 4 |
| 6 | GemNR | varchar | 25 |
| 7 | DossierNR | varchar | 25 |
| 8 | SoortWater | varchar | 3 |
| 9 | LabelNaam | varchar | 50 |
| 10 | LabelX | float | NULL |
| 11 | LabelY | float | NULL |
| 12 | LabelSize | int | NULL |
| 13 | BovGem | varchar | 50 |
| 14 | MI_STYLE | varchar | 254 |
| 15 | MI_PRINX | int | NULL |
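One hypothesis consistent with the suspicion of a database issue, though nothing in this thread confirms it: for SQL Server, JdbcPagingItemReader reads the first page with roughly a `TOP n ... ORDER BY projectnr` query and every later page with an extra `projectnr > ?` condition bound to the last sort key it read. This keyset paging only works if the database's ORDER BY and its `>` comparison agree on how values are ordered. With a SQL collation such as SQL_Latin1_General_CP850_CI_AS, non-Unicode (varchar) data is sorted with non-Unicode rules while Unicode (nvarchar) data uses Unicode rules, `Projectnr` is a varchar column, and the Microsoft JDBC driver sends String parameters as nvarchar by default (`sendStringParametersAsUnicode=true`). So a character like ² could conceivably be ordered one way in ORDER BY and another way in the comparison. The plain-Java sketch below simulates keyset paging with two separate comparators; `SUPERSCRIPT_AS_DIGIT`, which treats ² as 2, is a made-up stand-in for whatever rule the real collation applies, not SQL Server's actual behaviour.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Simulates keyset paging: first page = TOP n ORDER BY key, later pages = WHERE key > last ORDER BY key. */
class KeysetPagingDemo {

    static List<String> readAll(List<String> rows,
                                Comparator<String> orderBy,
                                Comparator<String> greaterThan,
                                int pageSize,
                                int maxPages) {
        List<String> read = new ArrayList<>();
        String lastKey = null;
        // maxPages guards against an endless loop if pages keep repeating
        for (int page = 0; page < maxPages; page++) {
            final String last = lastKey;
            List<String> pageRows = rows.stream()
                    // Later pages only keep rows the "greater than" rule places after the last key
                    .filter(r -> last == null || greaterThan.compare(r, last) > 0)
                    // ...but the rows come back in ORDER BY order
                    .sorted(orderBy)
                    .limit(pageSize)
                    .collect(Collectors.toList());
            if (pageRows.isEmpty()) {
                break;
            }
            read.addAll(pageRows);
            lastKey = pageRows.get(pageRows.size() - 1);
        }
        return read;
    }

    /** Hypothetical comparison rule that treats '²' as '2', standing in for a different collation. */
    static final Comparator<String> SUPERSCRIPT_AS_DIGIT =
            Comparator.comparing((String s) -> s.replace('\u00B2', '2'));

    public static void main(String[] args) {
        List<String> rows = List.of("10001", "20002", "30003", "\u00B221369");
        Comparator<String> natural = Comparator.naturalOrder();
        // Same rule for ordering and comparing: every row is read exactly once
        System.out.println(readAll(rows, natural, natural, 2, 10));
        // Different rules: a row is read again after the page ending in ²21369
        System.out.println(readAll(rows, natural, SUPERSCRIPT_AS_DIGIT, 2, 10));
    }
}
```

With a consistent comparator each row is read once; with the mismatched one, `30003` is returned twice because a page boundary lands on `²21369`, and the same mechanism could in principle re-read larger ranges. If this hypothesis applies, temporarily setting `sendStringParametersAsUnicode=false` on the connection URL could be one way to test it, alongside the WHERE-clause exclusion already tried.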
Thank you for opening the issue. Can you retry with the latest release of Spring Batch (5.0.2) and report back the results?
If the issue is reproducible, can you provide a sample project that uses the latest release of Spring Batch and that exhibits the behavior? To help you in reporting your issue, we have prepared a project template that you can use as a starting point. Please check the Issue Reporting Guidelines for more details about this.