avro
avro copied to clipboard
AVRO-3408: Schema evolution with logical types
Hello! Please take a look a PR for the issue https://issues.apache.org/jira/browse/AVRO-3408
I hope I fulfilled all PR requirements but feel free to point me out if I missed something.
Thank you in advance. I would love the read what does the community think about it.
@opwvhk what is the rule for PR updates? Shall I amend my original commit and force push to a branch to keep the number of commits in the PR = 1 or can I create another commit with a fix and we will squash all together during a merge?
To update your PR, simply push a new commit to the branch. This will update the PR.
hi! Are there any other concerns for this PR?
hi! Are there any other concerns for this PR?
Only one: converting from a CharSequence doesn't yet take the scale into account. It should, because otherwise reading data can result in a BigDecimal with an unexpected scale. Simply setting the scale with RoundingMode.UNNECESSARY
is sufficient; evolution to a larger will then still work, but evolution to a smaller scale will only succeed if the value allows it (as it should).
I beg your very pardon for my ignorance, but @RyanSkraba and @rstata are 2 different people, right?
@RyanSkraba, thank you for your self-request, I look forward to your opinion on this PR :pray:
@rstata you were the last person who touched org.apache.avro.io.parsing.ResolvingGrammarGenerator
. I've got a question for you, in my PR we create a new instance of GenericDatumReader
on every conversion, it works fine but it's might be inefficient.
I thought to create such a reader in ResolvingAction
or even create a new Action
and delegate all that conversation business to the action. But for that, we need a reference of org.apache.avro.generic.GenericData
here org.apache.avro.io.ResolvingDecoder#resolve
. I made an attempt and it could be done with no harm to other code, but I didn't dare to offer such code without a discussion.
Long story short, my suggestion is to add a GenericData
parameter to
org.apache.avro.io.DecoderFactory#resolvingDecoder
and sink down it to org.apache.avro.io.ResolvingDecoder
constructor, methodorg.apache.avro.io.ResolvingDecoder#resolve
and at the end make a field in org.apache.avro.io.parsing.ResolvingGrammarGenerator
in order to use it in
org.apache.avro.io.parsing.ResolvingGrammarGenerator#generate(org.apache.avro.Resolver.Action, java.util.Map<java.lang.Object,org.apache.avro.io.parsing.Symbol>)
at this moment:
if (action instanceof Resolver.Promote) {
return Symbol.resolve(action.writer, action.reader, simpleGen(action.writer, seen),
simpleGen(action.reader, seen));
(presumably in Symbol.resolve
method)
Thank you for your time.
thank you @opwvhk for a very good point! I add a this functionality + tests, humbly ask to take a look.
We are two different people :D
Again, thanks for your attention on this, there's some discussion on the JIRA about "the right thing to do". I definitely think you have a valid use case and that it should be simple for users of the Java SDK to migrate away from the pre-LogicalType java-class
and @Stringable
annotations!
I'm not entirely sure: maybe interpreting an underlying STRING as character digits for a decimal LogicalType is the right thing to do! It feels like something we should discourage, but if it would fix this migration issue it's hard to argue against it!
Cool, I'm glad to see this PR is not stalled and more people are involved. :+1:
Could anyone answer my previous question about ResolvingDecoder#resolve
?
I think we face here 2 problems:
- the first is BigDecimal by default is decoded as a string, but the decimal logical type does not support string conversion, thus we have inconsistency.
- the second is some kind of new way how to make schema migration if you just need to change the encoding type. With my changes, you can introduce a new logical type for a field that would support an old format and a new format. At the method
ResolvingDecoder#resolve
I told above, we can check if the type conversion is valid or not, or we can do the same during theResolvingAction
. I think it's a sort of debate like runtime vs compile-time resolution. That's why I'd like to hear community comments on this.
Close/re-open to trigger the GHA checks.
hi! does anyone knows why did the build fail? It used to be green... And good news everyone, I tested this patch on our production, it works and we successfully migrated from one version to another one with it. And it works for several month on our prod (even at this moment) So, from bug point of view, it's kinda safe to merge.
Close/reopen to check! :roll_eyes: I'm not sure why the "rerun jobs" button disappeared!
Note that the 1.11.1 RC is out, after that release, we should be WAY more reactive to attack all of these contributions!
Any news? Is this PR still alive?
Hi @mrk-andreev ! Thank you for poking this PR, yes the pr is okay, and per my understanding it's waiting for a small spec change. It's still unclear to me though how this process look like. Where is the spec source text and who is the person(s) who can push the button. if someone can explain, I'm ready to make a change and communicate with the folks.
Hey, @RyanSkraba ! I hope you're having wonderful holidays
I'm marking this as Request changes to make sure it doesn't get merged before consensus on the mailing list. Spec changes affect all the SDKs.
could you elaborate how to make changes to the spec? Shall I write a letter or communicate to someone? Thank you. Marry Christmas and happy New Year!
Hello? Don't want to whine out but it feels like you turn a blind eye to this issue.... if someone could help me with my questions above it would be marvelous!
The specification sources can be found at https://github.com/apache/avro/blob/master/doc/content/en/docs/%2B%2Bversion%2B%2B/Specification/_index.md