mysql-binlog-connector-java
mysql-binlog-connector-java copied to clipboard
Deserialization trouble with BINARY(16) columns for UUID
Library version: 0.27.5 This UUID string value ab9d25e0-64e6-469f-9873-8621910ed300 came from: java.util.UUID.randomUUID().toString() (JDK version 11)
It was persisted to a MySQL 8.0 database column of type BINARY(16) using the UUID_TO_BIN() function. Here is the relevant code relying on the binlog library:
WriteRowsEventData eventData = event.getData();
List<Serializable[]> rowData = eventData.getRows();
(iterate over rowData)
Object obj = (value from UUID/BINARY column which will come back as a byte[])
For the vast majority of UUIDs, the byte[] is of the expected length - 16. But some, like the value included above, have a length of 15.
The following conversion will fail with only 15 bytes:
ByteBuffer byteBuffer = ByteBuffer.wrap(bytes);
long high = byteBuffer.getLong();
long low = byteBuffer.getLong();
java.util.UUID uuid = new UUID(high, low);
The exception is: java.nio.BufferUnderflowException at java.base/java.nio.Buffer.nextGetIndex(Buffer.java:650) at java.base/java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:452)
If the incorrectly sized byte[] has a 16th byte added as follows:
bin2[15] = Byte.valueOf("00");
The conversion works.
I should also state that interacting with the persisted UUID using BIN_TO_UUID works as expected (i.e there is no problem from MySQL's perspective).
BTW, the CHAR_AND_BINARY_AS_BYTE_ARRAY compatibility mode is being set as well:
EventDeserializer eventDeserializer = new EventDeserializer();
eventDeserializer.setCompatibilityMode(
EventDeserializer.CompatibilityMode.CHAR_AND_BINARY_AS_BYTE_ARRAY
);
binaryLogClient.setEventDeserializer(eventDeserializer);
Hi, @bobdaly. I've also encountered this bug when deserializing BINARY(16)
to UUID.
cc @osheroff, @Naros, @jpechane.
MySQL column definition: uuid BINARY(16) NOT NULL UNIQUE KEY,
java.lang.IllegalArgumentException: UUID bytes len: 15, expected 16
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
at xxx.db.Utils.convertBytesToUUID(Utils.java:11)
at xxx.db.ClickHouseDAO.parseUser(ClickHouseDAO.java:245)
at xxx.db.ClickHouseDAO.insertRows(ClickHouseDAO.java:327)
at xxx.db.ClickHouseWorker$QueueThread.insertAndAck(ClickHouseWorker.java:73)
at xxx.db.ClickHouseWorker$QueueThread.run(ClickHouseWorker.java:62)
public static UUID convertBytesToUUID(byte[] bytes) {
Preconditions.checkNotNull(bytes);
Preconditions.checkArgument(bytes.length == 16, "UUID bytes len: " + bytes.length + ", expected 16");
ByteBuffer byteBuffer = ByteBuffer.wrap(bytes);
long high = byteBuffer.getLong();
long low = byteBuffer.getLong();
return new UUID(high, low);
}
EventDeserializer eventDeserializer = new EventDeserializer();
eventDeserializer.setCompatibilityMode(
EventDeserializer.CompatibilityMode.DATE_AND_TIME_AS_LONG,
EventDeserializer.CompatibilityMode.CHAR_AND_BINARY_AS_BYTE_ARRAY,
EventDeserializer.CompatibilityMode.INVALID_DATE_AND_TIME_AS_MIN_VALUE
);
client.setEventDeserializer(eventDeserializer);
related: https://github.com/osheroff/mysql-binlog-connector-java/issues/131
2024-09-19 15:18:21.756 ERROR xxx.cli.App: Uncaught exception on Thread-9
java.lang.IllegalArgumentException: UUID bytes len: 15: fdf085fa10f93ce087a45c5f285809, expected 16
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:115)
at xxx.db.Utils.convertBytesToUUID(Utils.java:23)
at xxx.db.ClickHouseDAO.parseUser(ClickHouseDAO.java:245)
at xxx.db.ClickHouseDAO.insertRows(ClickHouseDAO.java:327)
at xxx.db.ClickHouseWorker$QueueThread.insertAndAck(ClickHouseWorker.java:73)
at xxx.db.ClickHouseWorker$QueueThread.run(ClickHouseWorker.java:62)
mysql> select lower(hex(uuid)) as uuid from xxx.xxx where lower(hex(uuid)) like '%fdf085fa10f93ce087a45c5f285809%';
+----------------------------------+
| uuid |
+----------------------------------+
| fdf085fa10f93ce087a45c5f28580900 |
+----------------------------------+
1 row in set (0.01 sec)
I.e. the trailing 00
-byte got "unread" somehow.
Hi @leiless can you tell me what version of MySQL/MariaDB you're using and version of the binlog client?
@Naros, FYI, 10.6.10-MariaDB-log
.
Ok, I've figured it out. It's the behavior of MySQL replication protocol.
The MySQL server indeed sent 15 bytes to the MySQL replica, it's not a bug.
# All 0x00s case
# {'values': {'UNKNOWN_COL0': 1, 'UNKNOWN_COL1': ''}, 'none_sources': {}}
# UNKNOWN_COL1 len: 0
INSERT INTO db.t (id, uuid) VALUES (NULL, UNHEX(REPLACE("00000000-0000-0000-0000-000000000000", "-","")));
# Suffix 0x00s case
# {'values': {'UNKNOWN_COL0': 2, 'UNKNOWN_COL1': '\x01'}, 'none_sources': {}}
# UNKNOWN_COL1 len: 1
INSERT INTO db.t (id, uuid) VALUES (NULL, UNHEX(REPLACE("01000000-0000-0000-0000-000000000000", "-","")));
# Mid 0x00s case
# {'values': {'UNKNOWN_COL0': 3, 'UNKNOWN_COL1': '\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01'}, 'none_sources': {}}
# UNKNOWN_COL1 len: 15
INSERT INTO db.t (id, uuid) VALUES (NULL, UNHEX(REPLACE("01000000-0000-0000-0000-000000000100", "-","")));
# Prefix 0x00s case
# {'values': {'UNKNOWN_COL0': 4, 'UNKNOWN_COL1': '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01'}, 'none_sources': {}}
# UNKNOWN_COL1 len: 15
INSERT INTO db.t (id, uuid) VALUES (NULL, UNHEX(REPLACE("00000000-0000-0000-0000-000000000100", "-","")));
# Prefix + suffix 0x00s case (suffix 0x00s got truncated)
# {'values': {'UNKNOWN_COL0': 5, 'UNKNOWN_COL1': '\x00\x00\x00\x00\x00\x00\x01\x01'}, 'none_sources': {}}
# UNKNOWN_COL1 len: 8
INSERT INTO db.t (id, uuid) VALUES (NULL, UNHEX(REPLACE("00000000-0000-0101-0000-000000000000", "-","")));
from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import WriteRowsEvent
mysql_settings = {'host': '127.0.0.1', 'port': 23306, 'user': 'root', 'passwd': '123456'}
stream = BinLogStreamReader(connection_settings=mysql_settings,
server_id=100,
is_mariadb=True,
auto_position='',
blocking=True,
)
for binlogevent in stream:
binlogevent.dump()
if isinstance(binlogevent, WriteRowsEvent):
write_rows_ev: WriteRowsEvent = binlogevent
if f'{write_rows_ev.schema}.{write_rows_ev.table}' == 'db.t':
for row in write_rows_ev.rows:
print(row)
print('UNKNOWN_COL1 len: ' + str(len(row['values']['UNKNOWN_COL1'])))
print()
stream.close()
As we can see, for BINARY(N)
where N
is a constant, if there are trailing 0x00
s in the value, those trailing 0x00
bytes will be truncated.
Thanks for the follow-up @leiless, so in that case can we close this or is there still an issue?
As we can see, for
BINARY(N)
whereN
is a constant, if there are trailing0x00
s in the value, those trailing0x00
bytes will be truncated.
Proof:
# {'values': {'UNKNOWN_COL0': 17, 'UNKNOWN_COL1': '\x01\x02\x03\x04\x05\x06\x07\x08\t\n'}, 'none_sources': {}}
# UNKNOWN_COL1 len: 10
INSERT INTO db.t (id, uuid) VALUES (NULL, UNHEX(REPLACE("01020304-0506-0708-090a-000000000000", "-","")));
MariaDB server send BINARY(16)
data 01020304-05060708090a000000000000
with trailing 0x00
s stripped.
I think it applies to MySQL also.
And for CHAR(N)
where N
is a constant, trailing 0x00
s will also be stripped (not yet tested).
Thanks for the follow-up @leiless, so in that case can we close this or is there still an issue?
@Naros I think we can close this issue, but we may need to update the README for this, it's quite tricky.
Possible solution to this issue
private static byte[] fixTruncatedUuidBytes(byte[] uuidBytes) {
Preconditions.checkNotNull(uuidBytes);
Preconditions.checkArgument(uuidBytes.length <= 16);
int truncatedTrailingZerosLen = 16 - uuidBytes.length;
if (truncatedTrailingZerosLen == 0) {
return uuidBytes;
}
byte[] fixedUuidBytes = new byte[16];
// The remaining truncatedTrailingZerosLen bytes are already initialized to 0 by default
System.arraycopy(uuidBytes, 0, fixedUuidBytes, 0, uuidBytes.length);
return fixedUuidBytes;
}
@leiless Hi, I am a bit confused now. The intent of the code you posted is to provide a hint for clients to how to propely consume the value? If yes, is this something you'd like to add to the README? Also, WRT README update would it be possible to send a PR to https://github.com/debezium/mysql-binlog-connector-java ? Thanks
@leiless Hi, I am a bit confused now. The intent of the code you posted is to provide a hint for clients to how to propely consume the value? If yes, is this something you'd like to add to the README? Also, WRT README update would it be possible to send a PR to https://github.com/debezium/mysql-binlog-connector-java ? Thanks
The intent of the code you posted is to provide a hint for clients to how to propely consume the value?
Yes, in this case, deserialize BINARY(16)
as byte[]
(which trailing 0x00
s is truncated and length of the byte[]
is less than 16 bytes), and eventually turn it into a UUID.
So we must fill extra 0x00
s to make the length 16 bytes.
I'll send a PR to https://github.com/debezium/mysql-binlog-connector-java someday later, pretty bz right now.