Bug in automatic tag creation
Search before asking
- [X] I searched in the issues and found nothing similar.
Paimon version
0.8
Compute Engine
flink
Minimal reproduce step
Test table structure:

    tEnv.executeSql("CREATE TABLE dim_log_2 (\n"
            + " url STRING,\n"
            + " ts BIGINT,\n"
            + " color STRING,\n"
            + " PRIMARY KEY (url) NOT ENFORCED"
            + " ) WITH (\n"
            + " 'merge-engine' = 'partial-update',\n"
            + " 'changelog-producer' = 'input',\n"
            + " 'tag.automatic-creation' = 'watermark',\n"
            + " 'tag.creation-period' = 'daily',\n"
            + " 'tag.num-retained-max' = '90'"
            + ");");

The watermark strategy in use:

    WatermarkStrategy
            .<>forBoundedOutOfOrderness(Duration.ofSeconds())
            .withTimestampAssigner(SerializableTimestampAssigner)
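When the first data is processed, Paimon's tryToCreateTags() method is triggered right away. At that point the watermark is still -9223372036854775808 (Long.MIN_VALUE), because this strategy only emits a watermark every 200 ms by default. As a result, this.periodHandler.normalizeToPreviousTag(time) returns +1705471-09-26, so the tag is named tag-1705471-09-26. After that, data with normal event times such as 2024-05-17 never triggers automatic tag creation again, because the date 1705471-09-26 is so far in the future.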
This is the problematic code:

    // The wrong value is actually produced by the upper limit of Timestamp.fromEpochMillis,
    // but the root problem is that the watermark handling does not account for the first
    // event triggering creation before the watermark has had a chance to update.
    public LocalDateTime normalizeToPreviousTag(LocalDateTime time) {
        long mills = Timestamp.fromLocalDateTime(time).getMillisecond();
        long periodMills = this.onePeriod().toMillis();
        // This line produces the error
        LocalDateTime normalized = Timestamp.fromEpochMillis(mills / periodMills * periodMills).toLocalDateTime();
        return normalized.minus(this.onePeriod());
    }
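One possible explanation for the exact date, as a minimal sketch using only java.time: it assumes the conversion back to a date narrows the epoch-day count to an `int`, which is not confirmed by the report. The class `WatermarkOverflowSketch` and the helper `previousDailyTag` are hypothetical names for illustration, not Paimon code, and the sketch skips the LocalDateTime round trip by starting from epoch milliseconds.

```java
import java.time.Duration;
import java.time.LocalDate;

class WatermarkOverflowSketch {

    // Hypothetical helper (not Paimon code): normalizes epoch millis to a daily
    // period with the same truncating division as the quoted method, then steps
    // back one period.
    static LocalDate previousDailyTag(long epochMillis) {
        long periodMillis = Duration.ofDays(1).toMillis();
        long normalized = epochMillis / periodMillis * periodMillis;
        // ASSUMPTION: the day count is narrowed to int; for Long.MIN_VALUE the
        // value -106751991167 wraps around to 622191233.
        int epochDay = (int) (normalized / periodMillis);
        return LocalDate.ofEpochDay(epochDay).minusDays(1);
    }

    public static void main(String[] args) {
        // Uninitialized watermark: prints +1705471-09-26
        System.out.println(previousDailyTag(Long.MIN_VALUE));
        // 2024-05-17T00:00:00Z in epoch millis: prints 2024-05-16
        System.out.println(previousDailyTag(1715904000000L));
    }
}
```

Under that assumption the wrapped day count reproduces the date in the tag name above, while an ordinary event time yields the previous day as expected.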
What doesn't meet your expectations?
I hope Paimon keeps getting better. Paimon is the future of data integration, keep it up! I would also appreciate any opportunity to learn from you.
Anything else?
No response
Are you willing to submit a PR?
- [X] I'm willing to submit a PR!
Thanks @Hi-luca-Gao for reporting. Can you use English?
I think we should ignore Long.MIN there.
You're welcome. @JingsongLi Yes
@JingsongLi
I think this kind of fix could have unexpected side effects, so the check should be exactly watermark == Long.MIN. A timestamp earlier than 00:00:00 GMT on January 1, 1970 is a negative number, and that is perfectly normal. For example (in epoch seconds): -315585870 ===> 1960-01-01T10:15:30+01:00
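A minimal sketch of the distinction drawn in this comment, using only java.time. The class `WatermarkGuardSketch` and its methods are hypothetical and do not reflect Paimon's actual fix; the only assumption is that tag creation should be skipped solely for the Long.MIN_VALUE sentinel.

```java
import java.time.Instant;
import java.time.ZoneOffset;

class WatermarkGuardSketch {

    // Hypothetical guard: skip only the "no watermark yet" sentinel.
    // Negative watermarks are valid pre-1970 timestamps and must pass.
    static boolean shouldTryCreateTag(long watermarkMillis) {
        return watermarkMillis != Long.MIN_VALUE;
    }

    // Renders epoch seconds at UTC+01:00, matching the example in the comment.
    static String describe(long epochSeconds) {
        return Instant.ofEpochSecond(epochSeconds)
                .atOffset(ZoneOffset.ofHours(1))
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(shouldTryCreateTag(Long.MIN_VALUE));      // prints false
        System.out.println(shouldTryCreateTag(-315585870L * 1000L)); // prints true
        System.out.println(describe(-315585870L)); // prints 1960-01-01T10:15:30+01:00
    }
}
```

A broader check such as `watermark < 0` would also reject the 1960 timestamp, which is the side effect the comment warns about.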