trino icon indicating copy to clipboard operation
trino copied to clipboard

Add support for date, time_millis, and timestamp_millis to AvroColumnDecoder

Open tisonkun opened this issue 2 years ago • 17 comments

Is this change a fix, improvement, new feature, refactoring, or other?

Improvement.

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Avro part of trino-record-decoder.

How would you describe this change to a non-technical end user or system administrator?

Support parsing trino's time types in Avro column decoder.

Related issues, pull requests, and links

  • Fixes #13069.

From Avro's spec, actually all of these types should have either int or long in "type" field with a "logicalType" field distinguish.

So to support these types in AvroColumnDecoder aspect, simply add them to supported types list is enough and the actual parsing logic will be handled in caller side for example PulsarAvroRowDecoderFactory.

Documentation

(x) No documentation is needed. ( ) Sufficient documentation is included in this PR. ( ) Documentation PR is available with #prnumber. ( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

(x) No release notes entries required.

This is almost an internal change.

( ) Release notes entries required with the following suggested text:

tisonkun avatar Jul 03 '22 13:07 tisonkun

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to [email protected]. For more information, see https://github.com/trinodb/cla.

cla-bot[bot] avatar Jul 03 '22 13:07 cla-bot[bot]

I have submitted CLA to [email protected]. Please check.

tisonkun avatar Jul 03 '22 13:07 tisonkun

cc @kokosing I think I've sent the CLA already but the test still failed. Did I miss something here?

tisonkun avatar Jul 08 '22 01:07 tisonkun

@tisonkun The registering process takes a few days. We will ping cla-bot once it's registered.

ebyhr avatar Jul 08 '22 06:07 ebyhr

@ebyhr two weeks passed. May I have an estimate about your team's reviewing?

tisonkun avatar Jul 19 '22 00:07 tisonkun

When I working on https://github.com/apache/pulsar/pull/16683, I notice that after Trino supports parametric timestamp, the decode case should be review in another round.

Related to:

  • https://github.com/trinodb/trino/commit/21e3ddf55a641e1c17b7e4a17689223ff1120b42
  • https://github.com/trinodb/trino/commit/19811d3d58e908d145639b1975ec111a0002e138

tisonkun avatar Jul 20 '22 23:07 tisonkun

@cla-bot check

ebyhr avatar Jul 21 '22 08:07 ebyhr

The cla-bot has been summoned, and re-checked this pull request!

cla-bot[bot] avatar Jul 21 '22 08:07 cla-bot[bot]

ping @ebyhr @Praveen2112 @hashhar

tisonkun avatar Jul 25 '22 03:07 tisonkun

@hashhar I don't get your point. This patch follow the Avro spec and adapt to Trino internal representation design.

Trino's TimestampType always store the payload in epochMicros precision.

tisonkun avatar Jul 29 '22 06:07 tisonkun

And yes, it breaks downstream adoption as in https://github.com/apache/pulsar/pull/16683/commits/c4183950c60ab1648ec94ff0703944fb59c734d7.

This is introduced by how we implement parametric timestamp type https://github.com/trinodb/trino/commit/21e3ddf55a641e1c17b7e4a17689223ff1120b42, and out of the scope of this patch.

tisonkun avatar Jul 29 '22 06:07 tisonkun

But for other Avro readers to be able to get a timestamp value without manually decoding it the Avro schema needs to annotate the field with a logical type.

hashhar avatar Jul 29 '22 06:07 hashhar

@hashhar You can read this commit https://github.com/trinodb/trino/pull/13070/commits/52f4cf28912f5e2d5e4933464a0a6a39a04e1bfa that I revert trino-kafka adapting Avro logical type.

Still I don't know what changes you proposed here. It follows Avro spec, schema factory can infer Trino's type from Avro type and logical type, but that's the responsibility of downstream plugins.

tisonkun avatar Jul 29 '22 06:07 tisonkun

I'm a bit confused about this now. I probably need to re-read existing code more thoroughly and re-review later. In meantime I'll defer to @Praveen2112 .

hashhar avatar Jul 29 '22 07:07 hashhar

ping @ebyhr @hashhar @Praveen2112

tisonkun avatar Aug 09 '22 09:08 tisonkun

@Praveen2112 thanks for your reivews! Updated and squashed.

tisonkun avatar Sep 01 '22 09:09 tisonkun

@Praveen2112

but ideally it should happen as a part of Avro library and it does support it.

You can directly send a patch onto this one for your suggestions, or at least point out what APIs provided by Avro are related to implementing it.

tisonkun avatar Sep 19 '22 13:09 tisonkun

:wave: @tisonkun @Praveen2112 @hashhar - this PR has become inactive. We hope you are still interested in working on it. Please let us know, and we can try to get reviewers to help with that.

We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks.

mosabua avatar Jan 12 '24 22:01 mosabua

No longer work on this.

tisonkun avatar Feb 26 '24 11:02 tisonkun