[FLINK-35615][base] Add a workaround for timestamp binary en/decoding failure with precisions mismatch

Open yuxiqian opened this issue 1 year ago • 0 comments

This is a workaround for FLINK-35615.

Currently, the MySQL source cannot obtain the exact timestamp precision of a field's data type, so it uses a heuristic that infers the precision from the value of each data record, as follows:

// DebeziumSchemaDataTypeInference.java
int precision;
if (nano == 0) {
    precision = 0;
} else if (nano % 1000 > 0) {
    precision = 9;
} else if (nano % 1000_000 > 0) {
    precision = 6;
} else if (nano % 1000_000_000 > 0) {
    precision = 3;
} else {
    precision = 0;
}

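Because the precision is inferred from a single record value, it may be underestimated: a value that happens to have no sub-millisecond part is inferred as precision 3 or 0, even if the column is declared with a higher precision. This causes a binary encoding / decoding issue, since CDC stores timestamps with precision <= 3 in a compact form and allocates extra memory segments for high-precision timestamps. A timestamp written with one precision and read with another can therefore cause out-of-bounds (OOB) memory access.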
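To make the underestimation concrete, here is a minimal sketch that copies the branch logic quoted above into a standalone class and applies it to a few nanosecond values that could all come from the same high-precision column. The class name `PrecisionInferenceDemo` and the method name `inferPrecision` are hypothetical; only the branch logic is taken from `DebeziumSchemaDataTypeInference.java`.

```java
public class PrecisionInferenceDemo {
    // Same branch logic as the snippet quoted from
    // DebeziumSchemaDataTypeInference.java; the method name is hypothetical.
    static int inferPrecision(int nano) {
        if (nano == 0) {
            return 0;
        } else if (nano % 1000 > 0) {
            return 9;
        } else if (nano % 1000_000 > 0) {
            return 6;
        } else if (nano % 1000_000_000 > 0) {
            return 3;
        } else {
            return 0;
        }
    }

    public static void main(String[] args) {
        // Fractional-second parts (in nanoseconds) of three sample values.
        System.out.println(inferPrecision(123_456_000)); // prints 6
        System.out.println(inferPrecision(123_000_000)); // prints 3
        System.out.println(inferPrecision(0));           // prints 0
    }
}
```

The three inputs yield three different precisions, so two records of the same column can end up encoded under different storage layouts.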

This PR disables the compact storage optimization for low-precision timestamps, so that binary records can always be decoded correctly. The cost is that each timestamp record now requires 12 bytes instead of 8 bytes.
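The size trade-off and the failure mode can be illustrated with a simplified sketch. It assumes the compact form is a single 8-byte epoch-millisecond value and the full form appends a 4-byte nanos-of-millisecond value, which matches the 8 and 12 byte figures above but is not the actual flink-cdc binary layout (that one works on memory segments and offsets). The class `TimestampLayoutSketch` and all of its methods are hypothetical.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

public class TimestampLayoutSketch {
    // Assumed layout: 8 bytes (millis) when precision <= 3,
    // otherwise 12 bytes (millis + nanos-of-millisecond).
    static byte[] encode(long millis, int nanoOfMilli, int precision) {
        if (precision <= 3) {
            return ByteBuffer.allocate(Long.BYTES).putLong(millis).array();
        }
        return ByteBuffer.allocate(Long.BYTES + Integer.BYTES)
                .putLong(millis)
                .putInt(nanoOfMilli)
                .array();
    }

    // Decodes using the precision the reader believes the field has.
    static long[] decode(byte[] bytes, int precision) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long millis = buf.getLong();
        int nanoOfMilli = precision <= 3 ? 0 : buf.getInt();
        return new long[] {millis, nanoOfMilli};
    }

    // Returns true if decoding reads past the end of the encoded bytes.
    static boolean decodeFails(byte[] bytes, int precision) {
        try {
            decode(bytes, precision);
            return false;
        } catch (BufferUnderflowException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] compact = encode(1_719_828_420_000L, 0, 3);
        byte[] full = encode(1_719_828_420_000L, 456_000, 6);
        System.out.println(compact.length + " vs " + full.length + " bytes");
        // Written as precision 3, read as precision 6: the read overruns.
        System.out.println(decodeFails(compact, 6));
        // Always using the full form avoids the mismatch.
        System.out.println(decodeFails(full, 6));
    }
}
```

In this sketch the mismatch surfaces as a `BufferUnderflowException`. Always writing the full form, as the workaround does, removes the dependency on the inferred precision at the price of 4 extra bytes per value.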


cc @Jiabao-Sun @loserwang1024

yuxiqian, Jul 01 '24 10:07