mysql-binlog-connector-java
Memory leak in WriteRowsEventDataDeserializer.deserializeRows
Env: We use StreamSets to ingest data from the MySQL 5.7 binlog into Kafka. StreamSets uses mysql-binlog-connector-java version 0.23.3 and processes records in batches of 300. The MySQL database charset is UTF-8.
Issue: Every 3-5 days, a memory leak occurs.
Analyze: Using MAT, I found that the local variable List<Serializable[]> in WriteRowsEventDataDeserializer.deserializeRows held more than 2 million objects, with a total size of more than 7 GB. Full GC was triggered, but because these objects are still live, their memory cannot be reclaimed. When the issue occurred, the ByteArrayInputStream size was 524288 bytes, yet the loop below ran 2 million times:
    while (inputStream.available() > 0) {
        result.add(deserializeRow(tableId, includedColumns, inputStream));
    }
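To help narrow this down, here is a minimal sketch of a guard around a loop of the same shape. It is not the library's code: `ProgressGuardSketch`, `RowReader`, `readRows`, and `maxRows` are hypothetical names, and the standard `java.io.ByteArrayInputStream` stands in for the connector's own stream class. The idea is to fail fast if an iteration consumes no bytes or the row count passes a sanity limit, instead of letting the list grow until the heap is exhausted.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class ProgressGuardSketch {

    /** Hypothetical stand-in for deserializeRow(tableId, includedColumns, inputStream). */
    interface RowReader {
        Serializable[] readRow(InputStream in) throws IOException;
    }

    /** Same loop shape as in the report, plus two sanity checks per iteration. */
    static List<Serializable[]> readRows(InputStream in, RowReader reader, int maxRows)
            throws IOException {
        List<Serializable[]> result = new ArrayList<>();
        int before;
        while ((before = in.available()) > 0) {
            result.add(reader.readRow(in));
            // A row that consumed nothing means the loop can never terminate.
            if (in.available() >= before) {
                throw new IOException("no bytes consumed after row " + result.size());
            }
            // Assumed upper bound chosen by the caller, e.g. from the event size.
            if (result.size() > maxRows) {
                throw new IOException("row count exceeded " + maxRows);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Well-behaved reader: each row consumes 4 of the 8 available bytes.
        List<Serializable[]> rows = readRows(
                new ByteArrayInputStream(new byte[8]),
                in -> {
                    int n = in.read(new byte[4]);
                    return new Serializable[] { n };
                },
                1000);
        System.out.println("rows read: " + rows.size());

        // Misbehaving reader: returns a row without consuming any input.
        try {
            readRows(new ByteArrayInputStream(new byte[8]), in -> new Serializable[0], 1000);
        } catch (IOException e) {
            System.out.println("guard triggered: " + e.getMessage());
        }
    }
}
```

A guard like this does not explain why the loop over-iterates, but it would turn the slow heap exhaustion into an immediate exception that includes the row index where progress stopped.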
See the MAT screenshots below:

Meanwhile, I found that the ByteArrayInputStream content contains garbled characters. Some columns in the data rows contain Chinese characters.
The ByteArrayInputStream content is shown below.
When opened with the ANSI charset:

When opened with the UTF-8 charset:

Question: Is this issue caused by the Chinese characters in the InputStream? How can it be fixed?
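As a sanity check on the charset part of the question, the sketch below shows that the same UTF-8 bytes look garbled when viewed with a single-byte charset and intact when viewed as UTF-8. The class name `CharsetViewSketch` is hypothetical, and ISO-8859-1 is used only as an assumed stand-in for "ANSI", which on a real machine may map to a different code page. Text that reads correctly in one view and garbled in the other is therefore a property of the viewer, not by itself evidence of corrupted data.

```java
import java.nio.charset.StandardCharsets;

class CharsetViewSketch {

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes to avoid source-encoding issues.
        String original = "\u4e2d\u6587";

        // In UTF-8 each of these characters occupies three bytes.
        byte[] raw = original.getBytes(StandardCharsets.UTF_8);
        System.out.println("byte length: " + raw.length);

        // Viewing the bytes with a single-byte charset yields one char per byte.
        String asSingleByte = new String(raw, StandardCharsets.ISO_8859_1);
        System.out.println("single-byte view matches: " + asSingleByte.equals(original));

        // Viewing the same bytes as UTF-8 restores the original text.
        String asUtf8 = new String(raw, StandardCharsets.UTF_8);
        System.out.println("UTF-8 view matches: " + asUtf8.equals(original));
    }
}
```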