
Bug in automatic tag creation

Open • Hi-luca-Gao opened this issue 1 year ago • 1 comment

Search before asking

  • [X] I searched in the issues and found nothing similar.

Paimon version

Version 0.8

Compute Engine

Flink

Minimal reproduce step

Test table definition:

```java
tEnv.executeSql(
    "CREATE TABLE dim_log_2 (\n"
        + "  url STRING,\n"
        + "  ts BIGINT,\n"
        + "  color STRING,\n"
        + "  PRIMARY KEY (url) NOT ENFORCED\n"
        + ") WITH (\n"
        + "  'merge-engine' = 'partial-update',\n"
        + "  'changelog-producer' = 'input',\n"
        + "  'tag.automatic-creation' = 'watermark',\n"
        + "  'tag.creation-period' = 'daily',\n"
        + "  'tag.num-retained-max' = '90'\n"
        + ");");
```

The source uses this watermark strategy (the record type `T`, the bound `n`, and the `assigner` instance are placeholders; the report does not give them):

```java
WatermarkStrategy
    .<T>forBoundedOutOfOrderness(Duration.ofSeconds(n))
    .withTimestampAssigner(assigner); // a SerializableTimestampAssigner<T>
```

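When the first record is processed, Paimon's tryToCreateTags() method is triggered immediately. At that point the watermark is still -9223372036854775808 (Long.MIN_VALUE), because with this strategy Flink only emits a watermark every 200 ms by default. As a result, this.periodHandler.normalizeToPreviousTag(time) returns +1705471-09-26, so the tag is named tag-1705471-09-26. After that, data with normal event times (for example 2024-05-17) never triggers automatic tag creation again, because the date 1705471-09-26 is far later than any real event time.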

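This is the problematic code:

```java
// The miscalculation here is actually caused by the upper limit of Timestamp.fromEpochMillis,
// but the root cause is that the watermark handling does not account for the first time an
// event triggers tag creation, before the watermark has had a chance to be updated.
public LocalDateTime normalizeToPreviousTag(LocalDateTime time) {
    long mills = Timestamp.fromLocalDateTime(time).getMillisecond();
    long periodMills = this.onePeriod().toMillis();
    // This line produces the wrong result
    LocalDateTime normalized =
            Timestamp.fromEpochMillis(mills / periodMills * periodMills).toLocalDateTime();
    return normalized.minus(this.onePeriod());
}
```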
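The arithmetic can be checked in isolation. The sketch below is a hypothetical re-creation for a daily period, not Paimon's actual code: it assumes (the report does not show this) that the epoch-day count is narrowed to `int` when the normalized milliseconds are converted back into a date, and it works in UTC. Under that assumption, truncating division of Long.MIN_VALUE by one day followed by the `int` cast wraps around to a large positive day count, which lands on the same date as the reported tag-1705471-09-26.

```java
import java.time.LocalDate;

public class TagOverflowDemo {
    static final long MILLIS_PER_DAY = 86_400_000L;

    // Hypothetical re-creation of normalizeToPreviousTag for a daily period.
    // ASSUMPTION: the epoch-day count is narrowed to int on the way back to a date.
    static LocalDate previousDailyTag(long millis) {
        // Truncating division: for negative input this rounds toward zero.
        long normalized = millis / MILLIS_PER_DAY * MILLIS_PER_DAY;
        // Narrowing cast: silently wraps when the day count does not fit in an int.
        int epochDay = (int) (normalized / MILLIS_PER_DAY);
        // Step back one period, as normalizeToPreviousTag does.
        return LocalDate.ofEpochDay(epochDay).minusDays(1);
    }

    public static void main(String[] args) {
        // Initial watermark, before any real watermark has been emitted
        System.out.println(previousDailyTag(Long.MIN_VALUE));     // +1705471-09-26
        // 2024-05-17T00:00:00Z in epoch milliseconds
        System.out.println(previousDailyTag(1_715_904_000_000L)); // 2024-05-16
    }
}
```

Any real watermark, such as one on 2024-05-17, maps to a date far earlier than the bogus year-1705471 tag.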

What doesn't meet your expectations?

I hope Paimon keeps getting better; Paimon is the future of data integration. Keep it up! I would welcome the chance to contribute and learn from you.

Anything else?

No response

Are you willing to submit a PR?

  • [X] I'm willing to submit a PR!

Hi-luca-Gao, May 17 '24 10:05

Thanks @Hi-luca-Gao for reporting. Can you use English?

I think we should ignore the Long.MIN_VALUE watermark there.

JingsongLi, May 20 '24 14:05

@JingsongLi You're welcome. Yes, I can.

Hi-luca-Gao, May 21 '24 01:05

@JingsongLi
[screenshot]

I think this kind of fix carries unexpected risks, so the check should be `watermark == Long.MIN_VALUE`. A timestamp earlier than 1970-01-01T00:00:00 GMT is negative, and that is perfectly normal. For example, -315585870 (in epoch seconds) corresponds to 1960-01-01T10:15:30+01:00.
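A minimal sketch of the guard proposed here, using a hypothetical helper name `watermarkToTime` (not a Paimon API): only the `Long.MIN_VALUE` sentinel is treated as "no watermark yet", while other negative values, i.e. event times before 1970, are converted normally. The conversion to `LocalDateTime` assumes UTC.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.Optional;

public class WatermarkGuard {
    // Hypothetical helper: returns empty while the watermark is still the
    // initial Long.MIN_VALUE sentinel, so tag creation can be skipped.
    static Optional<LocalDateTime> watermarkToTime(long watermark) {
        if (watermark == Long.MIN_VALUE) {
            return Optional.empty();
        }
        // Negative values are valid pre-1970 event times; convert them as usual (UTC assumed).
        return Optional.of(Instant.ofEpochMilli(watermark).atOffset(ZoneOffset.UTC).toLocalDateTime());
    }

    public static void main(String[] args) {
        System.out.println(watermarkToTime(Long.MIN_VALUE).isPresent()); // false
        // -315585870 s = 1960-01-01T10:15:30+01:00 = 09:15:30 UTC
        System.out.println(watermarkToTime(-315_585_870_000L).get());    // 1960-01-01T09:15:30
    }
}
```

A check such as `watermark < 0` would instead drop legitimate pre-1970 watermarks, which is the risk this comment raises.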

Hi-luca-Gao, Jun 04 '24 06:06