jackson-dataformats-binary
[avro] java.io.IOException: Invalid Union index (-40); union only has 2 types
Unsure if I'm doing something wrong here. I want to deserialize Avro to a Json string.
I've boiled my issue down to the following:
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.avro.AvroMapper;
import com.fasterxml.jackson.dataformat.avro.AvroSchema;

public static void main(String[] args) {
    String inputFile = "test.avro";
    MappingIterator<JsonNode> it = null;
    try {
        AvroMapper avroMapper = new AvroMapper();
        // Parse the Avro schema that sits next to the data file
        AvroSchema schema = avroMapper.schemaFrom(new File(inputFile + ".schema"));
        it = avroMapper.readerFor(JsonNode.class)
                .with(schema)
                .readValues(new FileInputStream(inputFile));
    } catch (IOException ex) {
        System.err.println("Could not open " + inputFile + " : " + ex.getMessage());
        System.exit(1);
    }
    while (it.hasNext()) {
        JsonNode row = it.next();
        System.out.println(row);
    }
}
I get an exception:
Exception in thread "main" java.lang.RuntimeException: Invalid Union index (-40); union only has 2 types
at com.fasterxml.jackson.databind.MappingIterator.next(MappingIterator.java:196)
at test.AvroReadToJsonNode.main(AvroReadToJsonNode.java:33)
Caused by: java.io.IOException: Invalid Union index (-40); union only has 2 types
at com.fasterxml.jackson.dataformat.avro.deser.ScalarDecoder$ScalarUnionDecoder$FR._checkIndex(ScalarDecoder.java:422)
at com.fasterxml.jackson.dataformat.avro.deser.ScalarDecoder$ScalarUnionDecoder$FR.readValue(ScalarDecoder.java:412)
at com.fasterxml.jackson.dataformat.avro.deser.RecordReader$Std.nextToken(RecordReader.java:134)
at com.fasterxml.jackson.dataformat.avro.deser.AvroParserImpl.nextToken(AvroParserImpl.java:98)
at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:249)
at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:68)
at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:15)
at com.fasterxml.jackson.databind.MappingIterator.nextValue(MappingIterator.java:277)
at com.fasterxml.jackson.databind.MappingIterator.next(MappingIterator.java:192)
The schema looks like this:
{
  "type" : "record",
  "name" : "test",
  "namespace" : "test.test.avro",
  "doc" : "",
  "fields" : [ {
    "name" : "some_string",
    "type" : [ "null", "string" ]
  } ]
}
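For reference, the raw (non-container) encoding of a record under this schema is tiny: a zig-zag varint for the union branch, then the string length and bytes. Here is a hand-rolled sketch of what the module expects as input in raw mode (the class name and helper are mine, for illustration only):

```java
import java.io.ByteArrayOutputStream;

public class RawAvroDemo {
    // Zig-zag varint encoding of a long, as in Avro's binary encoding
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7fL) != 0) {
            out.write((int) ((z & 0x7f) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, 1);        // union branch 1 = "string" (branch 0 = "null")
        byte[] s = "hi".getBytes();
        writeLong(out, s.length); // string length
        out.writeBytes(s);        // string bytes
        // -> 02 04 68 69 : the whole record {"some_string": "hi"} in raw mode
        for (byte b : out.toByteArray()) System.out.printf("%02x ", b);
        System.out.println();
    }
}
```

Note there is no schema, no header and no framing in this form; the reader must be given the schema out of band, which is exactly what `.with(schema)` does above.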
And I generated data from the schema using avrotools:
avrotools random --schema-file test.avro.schema --count 100 test.avro
And this is with which Jackson version?
2.9.2
<dependency>
<groupId>com.fasterxml.jackson.dataformat</groupId>
<artifactId>jackson-dataformat-avro</artifactId>
<version>2.9.2</version>
</dependency>
Ok. So the reproduction is almost complete, the one missing piece being the encoded input file. I think that is needed since presumably the module itself would not write such content.
I am guessing this might be due to one unfortunate design decision by the Avro authors: the format is different when stored in a file than when encoded for transmission. If so, the file will start with a marker and the schema as JSON. Given the lack of any metadata in the raw encoding, the two forms cannot be reliably auto-detected, and it seems strange to require codecs to be aware of the input source. At the moment this module has no special handling for this prefix, although I think there is an open issue requesting that it be implemented.
It should be relatively easy to check whether the input might be of this form: the Avro specification outlines how the header looks:
https://avro.apache.org/docs/1.8.2/spec.html#Object+Container+Files
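Per that spec section, a container file starts with the 4-byte magic `O` `b` `j` 1, followed by a file-metadata map (which carries the writer schema under the key "avro.schema") and a 16-byte sync marker. A minimal pure-Java sketch of reading that header and pulling out the embedded schema — class and method names are mine, and it assumes a well-formed in-memory stream rather than being production-grade:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class AvroContainerHeader {
    static final byte[] MAGIC = { 'O', 'b', 'j', 1 };

    // Decode one zig-zag varint long, as used by Avro's binary encoding
    static long readLong(InputStream in) throws IOException {
        long n = 0; int shift = 0; int b;
        do {
            b = in.read();
            if (b < 0) throw new IOException("Unexpected EOF");
            n |= (long) (b & 0x7f) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (n >>> 1) ^ -(n & 1); // undo zig-zag
    }

    // Length-prefixed byte sequence ("bytes" / "string" encoding)
    static byte[] readBytes(InputStream in) throws IOException {
        int len = (int) readLong(in);
        byte[] buf = new byte[len];
        int off = 0;
        while (off < len) {
            int r = in.read(buf, off, len - off);
            if (r < 0) throw new IOException("Unexpected EOF");
            off += r;
        }
        return buf;
    }

    // Returns the embedded schema JSON if this is a container file, else null
    static String embeddedSchema(InputStream in) throws IOException {
        byte[] magic = new byte[4];
        if (in.read(magic) != 4 || !Arrays.equals(magic, MAGIC)) {
            return null; // no magic: presumably raw encoded content
        }
        Map<String, byte[]> meta = new HashMap<>();
        long count;
        while ((count = readLong(in)) != 0) { // map comes in blocks; 0 ends it
            if (count < 0) { readLong(in); count = -count; } // skip block byte size
            for (long i = 0; i < count; i++) {
                String key = new String(readBytes(in), StandardCharsets.UTF_8);
                meta.put(key, readBytes(in));
            }
        }
        byte[] schema = meta.get("avro.schema");
        return schema == null ? null : new String(schema, StandardCharsets.UTF_8);
    }
}
```

The 16-byte sync marker follows the metadata map; a full reader would consume it and then iterate the data blocks, but the sketch stops once the schema is recovered.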
I think this is one of the badly designed parts of the specification, and I wonder what the authors were smoking. But it is what it is.
For the encoded input file, you can use avrotools random to generate some data. I used a command line like the following:
avrotools random --schema-file test.avro.schema --count 100 test.avro
Here's a link to a sample file: https://storage.googleapis.com/vincegonzalez/jackson-dataformats-binary-issue-123.avro
Yes, that file does start with the Obj signature indicating the Object Container format, with the signature followed by the JSON-encoded embedded schema.
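The -40 in the error message is itself consistent with that: the file's first byte is 'O' (0x4F = 79), and when the raw-mode decoder reads it as a single-byte zig-zag varint union index, it decodes to exactly -40. A two-line check of the arithmetic:

```java
public class ZigZagDemo {
    public static void main(String[] args) {
        int b = 'O'; // 0x4F = 79, first byte of the "Obj" container magic
        long decoded = (b >>> 1) ^ -(b & 1); // zig-zag decode of a 1-byte varint
        System.out.println(decoded); // prints -40, matching "Invalid Union index (-40)"
    }
}
```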
So as things are, Object Container files are not supported, only raw encoded content. Issue #8 is about adding support for handling this case (both reading and writing).
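Until such support exists, a caller could at least fail fast (or branch to another reader) by sniffing the four magic bytes before handing the stream to the module. A minimal sketch, with class and method names of my own choosing:

```java
import java.io.IOException;
import java.io.PushbackInputStream;

public class AvroFormatSniffer {
    private static final byte[] MAGIC = { 'O', 'b', 'j', 1 };

    // Peeks at the first four bytes and pushes them back, so the
    // stream can still be handed to a raw-mode reader afterwards.
    public static boolean isObjectContainer(PushbackInputStream in) throws IOException {
        byte[] head = new byte[4];
        int n = 0;
        while (n < 4) {
            int r = in.read(head, n, 4 - n);
            if (r < 0) break; // stream shorter than the magic
            n += r;
        }
        in.unread(head, 0, n); // restore the consumed bytes
        return n == 4
                && head[0] == MAGIC[0] && head[1] == MAGIC[1]
                && head[2] == MAGIC[2] && head[3] == MAGIC[3];
    }
}
```

The stream must be wrapped as `new PushbackInputStream(in, 4)` so there is room to push the peeked bytes back; when the check returns true, the input is a container file that this module (as of 2.9.x) cannot read.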