jackson-dataformats-binary

[avro] Add support for reading schema from Avro-encoded file

Open · cowtowncoder opened this issue 8 years ago · 9 comments

(moved from https://github.com/FasterXML/jackson-dataformat-avro/issues/10)

Avro streams may include an embedded schema. Since it should be relatively safe either to auto-detect it or simply to make reading it the default when no schema is specified, we should support this mode.
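For reference, the Avro specification defines an Object Container File as starting with the four bytes `'O'`, `'b'`, `'j'`, `1`, followed by a metadata map (an Avro `map<bytes>`) in which the key `avro.schema` holds the writer schema as JSON, and then a 16-byte sync marker. Below is a minimal, stdlib-only Java sketch of pulling that schema out of the header. The class `AvroHeaderReader` and its methods are hypothetical, not part of Jackson or Apache Avro, and the sketch stops after the metadata map (it does not read the sync marker or any data blocks).

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not Jackson API): reads the metadata map of an Avro container-file header. */
public class AvroHeaderReader {
    // Container files start with ASCII 'O', 'b', 'j' followed by the byte 1.
    static final byte[] MAGIC = {'O', 'b', 'j', 1};

    /** Returns the JSON stored under "avro.schema", or null if the input lacks the container magic. */
    public static String readEmbeddedSchema(InputStream in) throws IOException {
        byte[] head = in.readNBytes(4);
        if (!Arrays.equals(head, MAGIC)) {
            return null; // no container header: the caller must supply a schema
        }
        byte[] schema = readMetadata(in).get("avro.schema");
        return (schema == null) ? null : new String(schema, StandardCharsets.UTF_8);
    }

    // File metadata is an Avro map<bytes>: blocks of entries, terminated by a block count of zero.
    static Map<String, byte[]> readMetadata(InputStream in) throws IOException {
        Map<String, byte[]> meta = new LinkedHashMap<>();
        long count;
        while ((count = readLong(in)) != 0) {
            if (count < 0) {   // negative count: a block byte size follows; read and ignore it
                count = -count;
                readLong(in);
            }
            for (long i = 0; i < count; i++) {
                String key = new String(readBytes(in), StandardCharsets.UTF_8);
                meta.put(key, readBytes(in));
            }
        }
        return meta;
    }

    // Avro bytes/string: a zig-zag varint length followed by that many raw bytes.
    static byte[] readBytes(InputStream in) throws IOException {
        long len = readLong(in);
        if (len < 0 || len > Integer.MAX_VALUE) throw new IOException("bad length " + len);
        byte[] buf = in.readNBytes((int) len);
        if (buf.length != len) throw new EOFException();
        return buf;
    }

    // Avro long: 7 bits per byte, low-order group first, high bit set means "more bytes"; then zig-zag decode.
    static long readLong(InputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        int b;
        do {
            if (shift > 63) throw new IOException("varint too long");
            b = in.read();
            if (b < 0) throw new EOFException();
            raw |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (raw >>> 1) ^ -(raw & 1);
    }
}
```

The returned JSON could then, for example, be parsed with Apache Avro's `Schema.Parser` and wrapped in Jackson's `AvroSchema`. Note that records in a container file are grouped into blocks (possibly compressed, per the `avro.codec` metadata entry) separated by the sync marker, so reading the data itself needs more than this header step.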

As for sample data, this project:

https://github.com/miguno/avro-cli-examples

may have data we could use to confirm correct behavior.

A follow-up feature should probably be producing an embedded schema, but that would be a separate RFE.

cowtowncoder, May 06 '16 02:05

Is this ticket resolved? I noticed it's referenced in the cherry-pick commit.

bkenned4, May 25 '17 17:05

@bkenned4 No; I may have accidentally included the issue id from the old repo.

As to implementation, I suspect auto-detection may be slightly risky (it is possible for encoded data to start with the same 4 bytes). But as long as it's a format feature, disabled by default, it may make sense. In addition to forcing use
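To make the risk concrete: Avro binary data carries no type tags or framing, so ordinary schema-less data can begin with the same bytes as the container header `'O'`, `'b'`, `'j'`, `1`. The stdlib-only Java sketch below (the class `MagicCollision` and its four-`long` record are hypothetical examples) encodes a record using Avro's zig-zag varint encoding for longs and shows that the field values -40, 49, 53, -1 produce exactly those four bytes.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Shows that plain (headerless) Avro binary data can start with the container-file magic bytes. */
public class MagicCollision {
    static final byte[] MAGIC = {'O', 'b', 'j', 1};

    // Avro long encoding: zig-zag, then 7-bit groups, low-order first, high bit = "more bytes follow".
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    static boolean startsWithMagic(byte[] data) {
        return data.length >= MAGIC.length
            && Arrays.equals(Arrays.copyOf(data, MAGIC.length), MAGIC);
    }

    /** Encodes a hypothetical record {long a, long b, long c, long d} with no header at all. */
    static byte[] encodeRecord(long a, long b, long c, long d) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (long v : new long[] {a, b, c, d}) {
            writeLong(out, v);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = encodeRecord(-40, 49, 53, -1);
        System.out.println(Arrays.toString(data)); // [79, 98, 106, 1]
        System.out.println(startsWithMagic(data)); // true
    }
}
```

Since any reader that sniffs the first four bytes would misclassify such a record, an explicit opt-in seems the safer default than silent auto-detection.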

cowtowncoder, May 25 '17 21:05

@cowtowncoder Clear, thanks for the context.

bkenned4, May 25 '17 21:05

@cowtowncoder What's the status of this issue? This would be very useful in a number of cases. For example, another part of my system generates Avro documents, and I know for sure that the schema is present, so even just an option to explicitly read the embedded schema would be nice!

iehrlich, Aug 21 '18 12:08

@cowtowncoder Given that the last comment was a few years ago, I'm not sure where this issue stands, but it looks like it is still open. I'm currently writing a Spring Boot application that consumes files in several formats (XML, CSV, Avro), and this would help a lot with keeping the code clean and easy to follow. Thank you!

njawad25, Jun 22 '22 23:06

At this point I do not have time to work on this feature, even though I fully agree that this would be a great feature.

However: if someone has the itch and would like to try to produce a PR, I will find time to help get the PR refined and hopefully merged. At this point such a contribution could make it into the upcoming 2.14.0 and earn kudos from happy users for a really, really nice addition. :)

Also: one thing that can help motivate others is to upvote the issue with a "thumbs up" reaction. While that does not change anyone's availability, it can sometimes help prioritize things nonetheless.

cowtowncoder, Jun 24 '22 01:06

An additional idea: if you are not sure how to tackle a somewhat advanced feature like this one (it's not trivial to figure out where and how to plug it in, at least if you are not familiar with the project), one thing that would still help is simply a unit test: a test that tries to read an input file containing an embedded schema, using the default parser/mapper with no extra settings, and that currently fails.

The feature implementor would only need to modify the test slightly to enable schema reading (I think an AvroParser.Feature is needed since, due to Avro's zero-redundancy encoding, it is not possible to detect 100% reliably that content starts with a schema), and could then use it to verify that the feature works.
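An opt-in parser feature, disabled by default, could follow the enum-plus-bit-mask style Jackson uses for its other feature enums. The stdlib-only Java sketch below is entirely hypothetical: `ReadFeature`, `READ_EMBEDDED_SCHEMA` and `resolveSchema` are illustrative names, not existing Jackson API, and the rule that an explicitly configured schema takes precedence over an embedded one is an assumption.

```java
import java.util.Optional;

/** Hypothetical sketch of an opt-in "read embedded schema" feature; names are illustrative only. */
public class EmbeddedSchemaFeatureSketch {

    // Enum feature with a default state and a bit mask derived from its ordinal.
    enum ReadFeature {
        READ_EMBEDDED_SCHEMA(false); // disabled by default: detection cannot be 100% reliable

        private final boolean defaultState;
        private final int mask;

        ReadFeature(boolean defaultState) {
            this.defaultState = defaultState;
            this.mask = 1 << ordinal();
        }

        /** Combines the masks of all features that are enabled by default. */
        static int collectDefaults() {
            int flags = 0;
            for (ReadFeature f : values()) {
                if (f.defaultState) flags |= f.mask;
            }
            return flags;
        }

        boolean enabledIn(int flags) { return (flags & mask) != 0; }

        int enable(int flags) { return flags | mask; }
    }

    /**
     * Picks the schema to use (assumed precedence): an explicitly configured schema wins;
     * otherwise the embedded one is used only when the feature is enabled.
     */
    static Optional<String> resolveSchema(int flags, String configured, String embedded) {
        if (configured != null) return Optional.of(configured);
        if (ReadFeature.READ_EMBEDDED_SCHEMA.enabledIn(flags) && embedded != null) {
            return Optional.of(embedded);
        }
        return Optional.empty(); // no usable schema: the caller reports an error
    }
}
```

With defaults, `resolveSchema(ReadFeature.collectDefaults(), null, embeddedJson)` yields an empty result, which matches the "currently fails" test described above; after `enable(...)`, the same call returns the embedded schema.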

cowtowncoder, Jun 24 '22 01:06