flink icon indicating copy to clipboard operation
flink copied to clipboard

[FLINK-33198][Format/Avro] support local timezone timestamp logic type in AVRO

Open HuangZhenQiu opened this issue 2 years ago • 10 comments

What is the purpose of the change

From the latest Avro Spec, local-timestamp-millis and local-timestamp-micros logical types are added. This PR is to support Local timezone logic type in Avro related classes.

Brief change log

  • Add logic to handle TIMESTAMP_WITH_LOCAL_TIME_ZONE in AvroToRowDataConverters
  • Add logic to handle TIMESTAMP_WITH_LOCAL_TIME_ZONE in RowDataToAvroConverters
  • Add logic to handle TIMESTAMP_WITH_LOCAL_TIME_ZONE in legacy AvroRow Ser and Des Schema
  • Add new fields for test cases

Verifying this change

Extended existing test cases for Avro

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable)

HuangZhenQiu avatar Oct 11 '23 16:10 HuangZhenQiu

CI report:

  • 411eb7294db31ee61c33ab84ce39eef994625575 Azure: SUCCESS
Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

flinkbot avatar Oct 11 '23 16:10 flinkbot

@flinkbot run azure

HuangZhenQiu avatar Oct 11 '23 16:10 HuangZhenQiu

@flinkbot run azure

JingsongLi avatar Oct 12 '23 02:10 JingsongLi

@flinkbot run azure

HuangZhenQiu avatar Oct 12 '23 02:10 HuangZhenQiu

Thank you @HuangZhenQiu a lot for contributing this.

After reading the Avro spec, I think we have wrongly mapped the Avro timestamp.

Avro spec says:

Timestamp (millisecond precision) The timestamp-millis logical type represents an instant on the global timeline, independent of a particular time zone or calendar, with a precision of one millisecond. Please note that time zone information gets lost in this process. Upon reading a value back, we can only reconstruct the instant, but not the original representation. In practice, such timestamps are typically displayed to users in their local time zones, therefore they may be displayed differently depending on the execution environment. A timestamp-millis logical type annotates an Avro long, where the long stores the number of milliseconds from the unix epoch, 1 January 1970 00:00:00.000 UTC.

Consistent timestamp types in Hadoop SQL engines also pointed out:

Timestamps in Avro, Parquet and RCFiles with a binary SerDe have Instant semantics

So Avro Timestamp is a Java Instant semantic that should map to Flink TIMESTAMP_LTZ, but currently, it maps to TIMESTAMP_NTZ.

On the contrary,

Local timestamp (millisecond precision) The local-timestamp-millis logical type represents a timestamp in a local timezone, regardless of what specific time zone is considered local, with a precision of one millisecond. A local-timestamp-millis logical type annotates an Avro long, where the long stores the number of milliseconds, from 1 January 1970 00:00:00.000.

Avro LocalTimestamp is a Java LocalDateTime semantic that should map to Flink TIMESTAMP_NTZ.

If we agree with this behavior, we may need to open a discussion in the dev ML about how to correct the behavior in a backward-compatible or incompatible way.

@wuchong Thanks for the feedback according to the hadoop alignment doc. Beside this, I also feel unclear on how to converting timestamp data to TimestampData which is the RowData internal representation. A Flink user can define a dynamic table with Avro format on a timestamp field with ZonedTimestampType, but we we can't convert the Avro long typed data to the target timestamp with time zone as the target Flink type is missing in Converters. I would like to open a discussion in dev ML after our offline sync.

HuangZhenQiu avatar Oct 16 '23 02:10 HuangZhenQiu

@flinkbot run azure

HuangZhenQiu avatar Oct 18 '23 06:10 HuangZhenQiu

@flinkbot run azure

HuangZhenQiu avatar Nov 17 '23 03:11 HuangZhenQiu

Hi @PatrickRen @leonardBang , do you have time to help review this PR?

wuchong avatar Dec 19 '23 03:12 wuchong

Hi, @HuangZhenQiu Would you like to rebase the PR to the latest master branch ?

leonardBang avatar Dec 19 '23 04:12 leonardBang

@flinkbot run azure

HuangZhenQiu avatar Jan 02 '24 03:01 HuangZhenQiu

@leonardBang Thanks for providing valuable comments. I have refactor the code to resolve your comments. Another following PR will be created to change precision. Thanks!

HuangZhenQiu avatar Jan 22 '24 05:01 HuangZhenQiu

@flinkbot run azure

HuangZhenQiu avatar Jan 22 '24 05:01 HuangZhenQiu

@leonardBang Thanks for the comprehensive review. It not help me dig out the bug, and also make the feature easy to use by adding more context in variable name, flag description and comments.

HuangZhenQiu avatar Jan 22 '24 09:01 HuangZhenQiu

@leonardBang Rebased master. Let's see the CI result.

HuangZhenQiu avatar Jan 23 '24 15:01 HuangZhenQiu

@flinkbot run azure

HuangZhenQiu avatar Jan 23 '24 15:01 HuangZhenQiu

Hit a code style issue which is fixed in the latest master. Rebase master again.

HuangZhenQiu avatar Jan 23 '24 17:01 HuangZhenQiu

@flinkbot run azure

HuangZhenQiu avatar Jan 23 '24 17:01 HuangZhenQiu