trino
Add support for date, time_millis, and timestamp_millis to AvroColumnDecoder
Is this change a fix, improvement, new feature, refactoring, or other?
Improvement.
Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)
The Avro part of trino-record-decoder.
How would you describe this change to a non-technical end user or system administrator?
Support parsing Trino's date and time types in the Avro column decoder.
Related issues, pull requests, and links
- Fixes #13069.
Per the Avro spec, all of these types use either `int` or `long` in the "type" field, with a "logicalType" field to distinguish them. So to support these types on the AvroColumnDecoder side, adding them to the supported types list is enough; the actual parsing logic is handled on the caller side, for example in PulsarAvroRowDecoderFactory.
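For illustration, this is what such a schema looks like: the underlying type stays `int` or `long`, and only the `logicalType` annotation distinguishes a date, time, or timestamp from a plain number (record and field names here are made up):

```json
{
  "type": "record",
  "name": "Event",
  "fields": [
    {"name": "created_date", "type": {"type": "int", "logicalType": "date"}},
    {"name": "start_time", "type": {"type": "int", "logicalType": "time-millis"}},
    {"name": "created_at", "type": {"type": "long", "logicalType": "timestamp-millis"}}
  ]
}
```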
Documentation
(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
(x) No release notes entries required.
This is essentially an internal change.
( ) Release notes entries required with the following suggested text:
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to [email protected]. For more information, see https://github.com/trinodb/cla.
I have submitted CLA to [email protected]. Please check.
cc @kokosing I think I've sent the CLA already but the test still failed. Did I miss something here?
@tisonkun The registering process takes a few days. We will ping cla-bot once it's registered.
@ebyhr two weeks have passed. May I have an estimate of when your team can review this?
While working on https://github.com/apache/pulsar/pull/16683, I noticed that after Trino's support for parametric timestamps, the decoding here should be reviewed in another round.
Related to:
- https://github.com/trinodb/trino/commit/21e3ddf55a641e1c17b7e4a17689223ff1120b42
- https://github.com/trinodb/trino/commit/19811d3d58e908d145639b1975ec111a0002e138
@cla-bot check
The cla-bot has been summoned, and re-checked this pull request!
ping @ebyhr @Praveen2112 @hashhar
@hashhar I don't get your point. This patch follows the Avro spec and adapts to Trino's internal representation design. Trino's TimestampType always stores the payload at epochMicros precision.
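To make the representation mismatch concrete, here is a hedged sketch (not Trino source; class and method names are hypothetical) of how Avro's logical-type payloads map onto Trino's internal representations: DATE keeps days since epoch, TIME(3) uses picoseconds of day, and short TIMESTAMP uses epoch microseconds.

```java
// Hypothetical helper illustrating the value conversions discussed above.
public final class AvroLogicalValues
{
    private AvroLogicalValues() {}

    // Avro "date" is an int counting days since 1970-01-01;
    // Trino's DATE uses the same representation.
    public static long dateToTrinoDate(int epochDays)
    {
        return epochDays;
    }

    // Avro "time-millis" is an int counting milliseconds of day;
    // Trino's TIME(3) stores picoseconds of day (1 ms = 10^9 ps).
    public static long timeMillisToPicosOfDay(int millisOfDay)
    {
        return millisOfDay * 1_000_000_000L;
    }

    // Avro "timestamp-millis" is a long counting epoch milliseconds;
    // Trino's short TIMESTAMP stores epoch microseconds.
    public static long timestampMillisToEpochMicros(long epochMillis)
    {
        return Math.multiplyExact(epochMillis, 1000);
    }
}
```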
And yes, it breaks downstream adoption as in https://github.com/apache/pulsar/pull/16683/commits/c4183950c60ab1648ec94ff0703944fb59c734d7.
This is introduced by how we implement parametric timestamp type https://github.com/trinodb/trino/commit/21e3ddf55a641e1c17b7e4a17689223ff1120b42, and out of the scope of this patch.
But for other Avro readers to be able to get a timestamp value without manually decoding it the Avro schema needs to annotate the field with a logical type.
@hashhar You can look at this commit https://github.com/trinodb/trino/pull/13070/commits/52f4cf28912f5e2d5e4933464a0a6a39a04e1bfa, where I reverted trino-kafka's adaptation of Avro logical types.
Still, I don't know what changes you are proposing here. This patch follows the Avro spec: a schema factory can infer Trino's type from the Avro type and logical type, but that is the responsibility of the downstream plugins.
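To make that division of responsibility concrete, a downstream schema factory could infer the Trino type from the Avro type and logical type along these lines. This is a hypothetical, self-contained sketch using plain strings, not the actual PulsarAvroRowDecoderFactory code, which works against Avro `Schema` objects.

```java
// Hypothetical sketch: mapping an Avro primitive type plus an optional
// logicalType annotation to a Trino type name.
public final class AvroToTrinoTypes
{
    private AvroToTrinoTypes() {}

    public static String trinoTypeFor(String avroType, String logicalType)
    {
        if (logicalType == null) {
            // Plain primitives map directly.
            if (avroType.equals("int")) {
                return "integer";
            }
            if (avroType.equals("long")) {
                return "bigint";
            }
            return avroType;
        }
        if (logicalType.equals("date")) {
            return "date";
        }
        if (logicalType.equals("time-millis")) {
            return "time(3)";
        }
        if (logicalType.equals("timestamp-millis")) {
            return "timestamp(3)";
        }
        // Unknown logical types fall back to the underlying primitive.
        return trinoTypeFor(avroType, null);
    }
}
```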
I'm a bit confused about this now. I probably need to re-read the existing code more thoroughly and re-review later. In the meantime I'll defer to @Praveen2112 .
ping @ebyhr @hashhar @Praveen2112
@Praveen2112 thanks for your reviews! Updated and squashed.
@Praveen2112

> but ideally it should happen as part of the Avro library and it does support it.

You can send a patch directly on top of this one with your suggestions, or at least point out which APIs provided by Avro are relevant to implementing it.
:wave: @tisonkun @Praveen2112 @hashhar - this PR has become inactive. We hope you are still interested in working on it. Please let us know, and we can try to get reviewers to help with that.
We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks.
I'm no longer working on this.