mysql-binlog-connector-java
Binary(16) field with trailing 0 in bytes is truncated
We are observing that binary(16) fields appear to be truncated when they end in trailing 0 bytes (i.e. they look null-terminated). A detailed write-up is here:
https://issues.jboss.org/browse/DBZ-254
We are on Debezium 0.4.0, which seems to use 0.9.0 of the mysql-binlog-connector-java library.
@criccomini confirmed. Problem identified at https://github.com/shyiko/mysql-binlog-connector-java/blob/master/src/main/java/com/github/shyiko/mysql/binlog/event/deserialization/AbstractRowsEventDataDeserializer.java#L362 (stringLength == 15). I'll update the ticket as soon as I have a fix.
@criccomini it looks like I'm not gonna be able to fix this on the mysql-binlog-connector-java side.
BINARY data is zero-padded (see https://dev.mysql.com/doc/refman/5.7/en/binary-varbinary.html). As you probably know, the binary log does not contain information about the types (there is no way to distinguish between BINARY and CHAR, for example), so there is just not enough information to zero-pad it automatically. Debezium (considering that it knows the precise column types) will have to zero-pad the values on its own (whenever the value is shorter than expected).
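For illustration, a minimal sketch of what such consumer-side zero-padding could look like. The class `BinaryPadding` and the method `padRight` are hypothetical names made up for this example; they are not part of mysql-binlog-connector-java or Debezium, and the sketch assumes the caller already knows the declared column length.

```java
import java.util.Arrays;

// Hypothetical helper, not part of mysql-binlog-connector-java or Debezium.
final class BinaryPadding {
    private BinaryPadding() {}

    /**
     * Right-pads a BINARY(n) value with 0x00 bytes up to the declared column length.
     * Values that are null, or already at (or beyond) that length, are returned unchanged.
     */
    static byte[] padRight(byte[] value, int columnLength) {
        if (value == null || value.length >= columnLength) {
            return value;
        }
        // Arrays.copyOf fills the newly added positions with zero bytes.
        return Arrays.copyOf(value, columnLength);
    }
}
```

With a binary(16) column, a 15-byte value read from the binlog would be passed as `BinaryPadding.padRight(value, 16)` to get a 16-byte array back.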
I'll keep ticket open until I update the readme.md (quirks section).
Got it. Thanks for looking into this!
Debezium (considering that it knows the precise column types) will have to zero-pad the values on its own (whenever the value is shorter than expected).
Will this work, though? If I INSERT a\0 and INSERT a\0\0 into a BINARY(3), I don't think DBZ can tell the difference no matter what, right? This doesn't seem fixable to me.
@criccomini I guess it depends on what comes back from the MySQL binlog event. For example, given a column of type BINARY(3) and a value of a, is the length as read by the binlog connector really 1 or 3? If 1 then here couldn't we detect the difference in actual vs column length and right-pad the byte[]? It's not ideal, but it should work. Any chance you have debugged the code?
If 1 then here couldn't we detect the difference in actual vs column length and right-pad the byte[]?
But if it's 1, wouldn't you not know whether a was inserted or a\0 or a\0\0?
It's going to be 1 in both cases (a\0 and a\0\0), because the right-padding is stripped completely. Personally I think the right thing to do here would be to use the proper data type: varbinary(16) instead of binary(16) (\0s are NOT stripped away in the case of var*s).
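The round trip described above can be simulated roughly as follows. This is only a model of the behaviour discussed in this thread, not real MySQL or library code: `BinaryRoundTrip`, `store` and `stripTrailingZeros` are hypothetical names, and `store` assumes the inserted value is no longer than the column.

```java
import java.util.Arrays;

// Simulation only: models the behaviour described in this thread.
final class BinaryRoundTrip {
    private BinaryRoundTrip() {}

    // BINARY(n) storage: shorter values are right-padded with 0x00 to n bytes.
    // Assumes inserted.length <= n.
    static byte[] store(byte[] inserted, int n) {
        return Arrays.copyOf(inserted, n);
    }

    // What the binlog reader sees per this thread: trailing 0x00 bytes are gone.
    static byte[] stripTrailingZeros(byte[] stored) {
        int end = stored.length;
        while (end > 0 && stored[end - 1] == 0) {
            end--;
        }
        return Arrays.copyOf(stored, end);
    }
}
```

In this model, inserting a\0 and inserting a\0\0 into a BINARY(3) both store a\0\0 and both come back as the single byte a. Padding that back to 3 bytes reproduces the stored value, but nothing can recover which of the two literals was originally written.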
@shyiko if that's the case, I think it's actually a pretty fair argument to make DBZ assume that a fixed-length BINARY column should include the right-padded 0s. Basically, my takeaway is that if you're using fixed-length BINARY cols in MySQL, you must be writing binary values that are ONLY that length (or are following a VERY predictable write pattern). Otherwise, you just can't use this data type predictably.
@rhauch what do you think about making DBZ assume, and force right padded 0s?