avro icon indicating copy to clipboard operation
avro copied to clipboard

AVRO-3408: Schema evolution with logical types

Open izemlyanskiy opened this issue 2 years ago • 16 comments

Hello! Please take a look a PR for the issue https://issues.apache.org/jira/browse/AVRO-3408

I hope I fulfilled all PR requirements but feel free to point me out if I missed something.

Thank you in advance. I would love the read what does the community think about it.

izemlyanskiy avatar Mar 07 '22 22:03 izemlyanskiy

@opwvhk what is the rule for PR updates? Shall I amend my original commit and force push to a branch to keep the number of commits in the PR = 1 or can I create another commit with a fix and we will squash all together during a merge?

izemlyanskiy avatar Mar 11 '22 17:03 izemlyanskiy

To update your PR, simply push a new commit to the branch. This will update the PR.

opwvhk avatar Mar 11 '22 18:03 opwvhk

hi! Are there any other concerns for this PR?

izemlyanskiy avatar Mar 16 '22 22:03 izemlyanskiy

hi! Are there any other concerns for this PR?

Only one: converting from a CharSequence doesn't yet take the scale into account. It should, because otherwise reading data can result in a BigDecimal with an unexpected scale. Simply setting the scale with RoundingMode.UNNECESSARY is sufficient; evolution to a larger will then still work, but evolution to a smaller scale will only succeed if the value allows it (as it should).

opwvhk avatar Mar 17 '22 17:03 opwvhk

I beg your very pardon for my ignorance, but @RyanSkraba and @rstata are 2 different people, right? @RyanSkraba, thank you for your self-request, I look forward to your opinion on this PR :pray: @rstata you were the last person who touched org.apache.avro.io.parsing.ResolvingGrammarGenerator. I've got a question for you, in my PR we create a new instance of GenericDatumReader on every conversion, it works fine but it's might be inefficient. I thought to create such a reader in ResolvingAction or even create a new Action and delegate all that conversation business to the action. But for that, we need a reference of org.apache.avro.generic.GenericData here org.apache.avro.io.ResolvingDecoder#resolve. I made an attempt and it could be done with no harm to other code, but I didn't dare to offer such code without a discussion. Long story short, my suggestion is to add a GenericData parameter to org.apache.avro.io.DecoderFactory#resolvingDecoder and sink down it to org.apache.avro.io.ResolvingDecoder constructor, methodorg.apache.avro.io.ResolvingDecoder#resolve and at the end make a field in org.apache.avro.io.parsing.ResolvingGrammarGenerator in order to use it in org.apache.avro.io.parsing.ResolvingGrammarGenerator#generate(org.apache.avro.Resolver.Action, java.util.Map<java.lang.Object,org.apache.avro.io.parsing.Symbol>) at this moment:

    if (action instanceof Resolver.Promote) {
      return Symbol.resolve(action.writer, action.reader, simpleGen(action.writer, seen),
          simpleGen(action.reader, seen));

(presumably in Symbol.resolve method)

Thank you for your time.

izemlyanskiy avatar Mar 21 '22 13:03 izemlyanskiy

thank you @opwvhk for a very good point! I add a this functionality + tests, humbly ask to take a look.

izemlyanskiy avatar Mar 21 '22 14:03 izemlyanskiy

We are two different people :D

Again, thanks for your attention on this, there's some discussion on the JIRA about "the right thing to do". I definitely think you have a valid use case and that it should be simple for users of the Java SDK to migrate away from the pre-LogicalType java-class and @Stringable annotations!

I'm not entirely sure: maybe interpreting an underlying STRING as character digits for a decimal LogicalType is the right thing to do! It feels like something we should discourage, but if it would fix this migration issue it's hard to argue against it!

RyanSkraba avatar Mar 30 '22 08:03 RyanSkraba

Cool, I'm glad to see this PR is not stalled and more people are involved. :+1: Could anyone answer my previous question about ResolvingDecoder#resolve?

I think we face here 2 problems:

  • the first is BigDecimal by default is decoded as a string, but the decimal logical type does not support string conversion, thus we have inconsistency.
  • the second is some kind of new way how to make schema migration if you just need to change the encoding type. With my changes, you can introduce a new logical type for a field that would support an old format and a new format. At the method ResolvingDecoder#resolve I told above, we can check if the type conversion is valid or not, or we can do the same during the ResolvingAction. I think it's a sort of debate like runtime vs compile-time resolution. That's why I'd like to hear community comments on this.

izemlyanskiy avatar Mar 30 '22 21:03 izemlyanskiy

Close/re-open to trigger the GHA checks.

martin-g avatar May 16 '22 06:05 martin-g

hi! does anyone knows why did the build fail? It used to be green... And good news everyone, I tested this patch on our production, it works and we successfully migrated from one version to another one with it. And it works for several month on our prod (even at this moment) So, from bug point of view, it's kinda safe to merge.

izemlyanskiy avatar Jul 28 '22 21:07 izemlyanskiy

Close/reopen to check! :roll_eyes: I'm not sure why the "rerun jobs" button disappeared!

Note that the 1.11.1 RC is out, after that release, we should be WAY more reactive to attack all of these contributions!

RyanSkraba avatar Jul 29 '22 13:07 RyanSkraba

Any news? Is this PR still alive?

mrk-andreev avatar Dec 28 '22 15:12 mrk-andreev

Hi @mrk-andreev ! Thank you for poking this PR, yes the pr is okay, and per my understanding it's waiting for a small spec change. It's still unclear to me though how this process look like. Where is the spec source text and who is the person(s) who can push the button. if someone can explain, I'm ready to make a change and communicate with the folks.

izemlyanskiy avatar Dec 28 '22 18:12 izemlyanskiy

Hey, @RyanSkraba ! I hope you're having wonderful holidays

I'm marking this as Request changes to make sure it doesn't get merged before consensus on the mailing list. Spec changes affect all the SDKs.

could you elaborate how to make changes to the spec? Shall I write a letter or communicate to someone? Thank you. Marry Christmas and happy New Year!

izemlyanskiy avatar Dec 29 '22 21:12 izemlyanskiy

Hello? Don't want to whine out but it feels like you turn a blind eye to this issue.... if someone could help me with my questions above it would be marvelous!

izemlyanskiy avatar Feb 16 '23 00:02 izemlyanskiy

The specification sources can be found at https://github.com/apache/avro/blob/master/doc/content/en/docs/%2B%2Bversion%2B%2B/Specification/_index.md

martin-g avatar Feb 16 '23 08:02 martin-g