
TransactionAwareBufferedWriter adds byte order mark on each chunk [BATCH-1985]

(Open) spring-projects-issues opened this issue 12 years ago • 3 comments

Jimmy Praet opened BATCH-1985 and commented

When a TransactionAwareBufferedWriter (as used by FlatFileItemWriter or StaxEventItemWriter) is combined with an encoding that writes a byte order mark (BOM), e.g. UTF-16, the BOM is emitted at the start of every chunk. On each chunk, getBytes(encoding) is called on the contents of the string buffer, and for these encodings the returned byte array always begins with a BOM.

The BOM should only be written at the very beginning of the output stream. Anywhere else, the same bytes are interpreted as the character U+FEFF ZERO WIDTH NO-BREAK SPACE.
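The effect described above can be reproduced with plain JDK calls. This is a simulation under an assumption, not the actual TransactionAwareBufferedWriter code: each chunk is encoded separately with String.getBytes(Charset), as the report describes, and the concatenated bytes are then decoded as UTF-16. The class and method names (BomPerChunkDemo, writeChunks, strayBomCount) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomPerChunkDemo {

    // Hypothetical stand-in for the per-chunk flush described in the issue:
    // every chunk is converted with getBytes(charset) on its own and appended.
    static byte[] writeChunks(Charset cs, String... chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String chunk : chunks) {
            byte[] bytes = chunk.getBytes(cs); // each call encodes from scratch, BOM included
            out.write(bytes, 0, bytes.length);
        }
        return out.toByteArray();
    }

    // Counts U+FEFF characters left after decoding. The UTF-16 decoder consumes
    // only a BOM at offset 0; any later BOM survives as ZERO WIDTH NO-BREAK SPACE.
    static long strayBomCount(byte[] bytes) {
        String decoded = new String(bytes, StandardCharsets.UTF_16);
        return decoded.chars().filter(c -> c == 0xFEFF).count();
    }

    public static void main(String[] args) {
        byte[] bytes = writeChunks(StandardCharsets.UTF_16, "line1\n", "line2\n", "line3\n");
        System.out.println("bytes written: " + bytes.length);
        System.out.println("stray U+FEFF after decoding: " + strayBomCount(bytes));
    }
}
```

For three UTF-16 chunks this writes 42 bytes (a 2-byte BOM plus 12 bytes of text per chunk), and decoding leaves 2 stray U+FEFF characters, one for every chunk after the first.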

References:

  • http://docs.oracle.com/javase/7/docs/api/java/nio/charset/Charset.html#standard
  • https://forums.oracle.com/forums/thread.jspa?threadID=2042544
  • http://www.unicode.org/faq/utf_bom.html


No further details from BATCH-1985

spring-projects-issues, Mar 19 '13 12:03

Michael Minella commented

Small point of clarification: the extra bytes are not emitted on each chunk. They appear to be added when restarting, i.e. when appending to an existing file.
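One plausible way to reconcile the two observations, offered as an assumption rather than a reading of the Spring Batch source: a single java.io.OutputStreamWriter keeps one encoder for its whole lifetime, so it emits the UTF-16 BOM once however many chunks are flushed through it, while opening a new writer on output that already has content, as an append after a restart would, starts a fresh encoder that emits another BOM. The in-memory "file" and the names BomOnRestartDemo, writeRuns and strayBomCount are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class BomOnRestartDemo {

    // Each String[] is one hypothetical job "run"; every run opens a NEW writer
    // on the same in-memory file, the way a restart appends to an existing file.
    static byte[] writeRuns(String[]... runs) {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        try {
            for (String[] chunks : runs) {
                // A new writer has a fresh UTF-16 encoder, which emits a BOM before
                // its first character even though the file is not empty.
                Writer w = new OutputStreamWriter(file, StandardCharsets.UTF_16);
                for (String chunk : chunks) {
                    w.write(chunk);
                    w.flush(); // flushing does not reset the encoder, so no BOM per chunk
                }
                w.close(); // closing a ByteArrayOutputStream has no effect, so later runs can append
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return file.toByteArray();
    }

    // Counts U+FEFF characters that survive decoding, i.e. BOMs not at offset 0.
    static long strayBomCount(byte[] bytes) {
        String decoded = new String(bytes, StandardCharsets.UTF_16);
        return decoded.chars().filter(c -> c == 0xFEFF).count();
    }

    public static void main(String[] args) {
        String[] firstRun = {"line1\n", "line2\n"};
        String[] restart = {"line3\n"};
        System.out.println("single run: " + strayBomCount(writeRuns(firstRun)));
        System.out.println("run + restart: " + strayBomCount(writeRuns(firstRun, restart)));
    }
}
```

This should print 0 for the single run and 1 once the restart run is appended.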

spring-projects-issues, Mar 19 '13 12:03

Jimmy Praet commented

I have a test case at https://github.com/jpraet/spring-batch/commit/a689e25fb27b3f530cc35ff850fd2d01e22bd2ae and I'm seeing the BOM emitted on each chunk. That makes sense: the buffer is cleared after each chunk, so every chunk is encoded on its own and gets its own BOM.

I have found the following encodings affected by this bug:

  • UTF-16
  • x-UTF-32BE-BOM
  • x-UTF-32LE-BOM
  • UnicodeBig
  • UnicodeLittle
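The list above can be checked on a given JVM with a small probe. The heuristic is an assumption made for illustration, not something taken from the linked test case: encoding "A" and "B" with two separate getBytes calls produces exactly one more prefix than encoding "AB" once, so the byte difference is the length of whatever the charset prepends on every call. Names the JVM does not support are skipped. BomCharsetSurvey, perCallPrefixLength and survey are hypothetical names.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

public class BomCharsetSurvey {

    // Bytes a charset prepends on EVERY independent getBytes() call.
    // Separate calls cost (prefix + "A") + (prefix + "B"); one call costs
    // prefix + "AB"; the difference is the prefix (BOM) length.
    static int perCallPrefixLength(Charset cs) {
        int separate = "A".getBytes(cs).length + "B".getBytes(cs).length;
        int together = "AB".getBytes(cs).length;
        return separate - together;
    }

    // Maps each supported charset name to its per-call prefix length;
    // names this JVM does not support are left out.
    static Map<String, Integer> survey(String... names) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String name : names) {
            if (Charset.isSupported(name)) {
                result.put(name, perCallPrefixLength(Charset.forName(name)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The five names reported in the issue, plus three BOM-free charsets for comparison.
        Map<String, Integer> r = survey(
                "UTF-16", "x-UTF-32BE-BOM", "x-UTF-32LE-BOM", "UnicodeBig", "UnicodeLittle",
                "UTF-8", "UTF-16BE", "UTF-16LE");
        r.forEach((name, len) -> System.out.println(name + " -> " + len + " extra byte(s) per chunk"));
    }
}
```

On a standard JDK this should report 2 bytes for UTF-16, UnicodeBig and UnicodeLittle, 4 bytes for x-UTF-32BE-BOM and x-UTF-32LE-BOM, and 0 for UTF-8, UTF-16BE and UTF-16LE.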

spring-projects-issues, Mar 20 '13 13:03

Thank you for opening the issue. Can you retry with the latest release of Spring Batch (5.0.2) and report back the results?

cppwfs, Jul 20 '23 12:07