flink
flink copied to clipboard
[FLINK-33198][Format/Avro] support local timezone timestamp logic type in AVRO
What is the purpose of the change
From the latest Avro Spec, local-timestamp-millis and local-timestamp-micros logical types are added. This PR is to support Local timezone logic type in Avro related classes.
Brief change log
- Add logic to handle TIMESTAMP_WITH_LOCAL_TIME_ZONE in AvroToRowDataConverters
- Add logic to handle TIMESTAMP_WITH_LOCAL_TIME_ZONE in RowDataToAvroConverters
- Add logic to handle TIMESTAMP_WITH_LOCAL_TIME_ZONE in legacy AvroRow Ser and Des Schema
- Add new fields for test cases
Verifying this change
Extended existing test cases for Avro
Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (no)
- The public API, i.e., is any changed class annotated with
@Public(Evolving): (no) - The serializers: (no)
- The runtime per-record code paths (performance sensitive): (no)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
- The S3 file system connector: (no)
Documentation
- Does this pull request introduce a new feature? (no)
- If yes, how is the feature documented? (not applicable)
CI report:
- 411eb7294db31ee61c33ab84ce39eef994625575 Azure: SUCCESS
Bot commands
The @flinkbot bot supports the following commands:@flinkbot run azurere-run the last Azure build
@flinkbot run azure
@flinkbot run azure
@flinkbot run azure
Thank you @HuangZhenQiu a lot for contributing this.
After reading the Avro spec, I think we have wrongly mapped the Avro timestamp.
Avro spec says:
Timestamp (millisecond precision) The timestamp-millis logical type represents an instant on the global timeline, independent of a particular time zone or calendar, with a precision of one millisecond. Please note that time zone information gets lost in this process. Upon reading a value back, we can only reconstruct the instant, but not the original representation. In practice, such timestamps are typically displayed to users in their local time zones, therefore they may be displayed differently depending on the execution environment. A timestamp-millis logical type annotates an Avro long, where the long stores the number of milliseconds from the unix epoch, 1 January 1970 00:00:00.000 UTC.
Consistent timestamp types in Hadoop SQL engines also pointed out:
Timestamps in Avro, Parquet and RCFiles with a binary SerDe have Instant semantics
So Avro Timestamp is a Java Instant semantic that should map to Flink TIMESTAMP_LTZ, but currently, it maps to TIMESTAMP_NTZ.
On the contrary,
Local timestamp (millisecond precision) The local-timestamp-millis logical type represents a timestamp in a local timezone, regardless of what specific time zone is considered local, with a precision of one millisecond. A local-timestamp-millis logical type annotates an Avro long, where the long stores the number of milliseconds, from 1 January 1970 00:00:00.000.
Avro LocalTimestamp is a Java LocalDateTime semantic that should map to Flink TIMESTAMP_NTZ.
If we agree with this behavior, we may need to open a discussion in the dev ML about how to correct the behavior in a backward-compatible or incompatible way.
@wuchong Thanks for the feedback according to the hadoop alignment doc. Beside this, I also feel unclear on how to converting timestamp data to TimestampData which is the RowData internal representation. A Flink user can define a dynamic table with Avro format on a timestamp field with ZonedTimestampType, but we we can't convert the Avro long typed data to the target timestamp with time zone as the target Flink type is missing in Converters. I would like to open a discussion in dev ML after our offline sync.
@flinkbot run azure
@flinkbot run azure
Hi @PatrickRen @leonardBang , do you have time to help review this PR?
Hi, @HuangZhenQiu Would you like to rebase the PR to the latest master branch ?
@flinkbot run azure
@leonardBang Thanks for providing valuable comments. I have refactor the code to resolve your comments. Another following PR will be created to change precision. Thanks!
@flinkbot run azure
@leonardBang Thanks for the comprehensive review. It not help me dig out the bug, and also make the feature easy to use by adding more context in variable name, flag description and comments.
@leonardBang Rebased master. Let's see the CI result.
@flinkbot run azure
Hit a code style issue which is fixed in the latest master. Rebase master again.
@flinkbot run azure