avro-util
[avro-fastserde] Cached fast (de)serializers are not updated after setting schema
When the writer schema is not known at the time of FastSpecificDatumReader creation, we pass null as the writerSchema constructor parameter.
Then, even after setting the proper writer schema using setSchema(), we get an NPE during read():
java.lang.NullPointerException: Cannot invoke "org.apache.avro.Schema.equals(Object)" because "writer" is null
    at org.apache.avro.Schema.applyAliases(Schema.java:1832)
    at org.apache.avro.generic.GenericDatumReader.getResolver(GenericDatumReader.java:131)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
    at com.linkedin.avro.fastserde.FastSerdeCache$FastDeserializerWithAvroSpecificImpl.deserialize(FastSerdeCache.java:543)
    at com.linkedin.avro.fastserde.FastGenericDatumReader.read(FastGenericDatumReader.java:89)
    at org.apache.avro.file.DataFileStream.next(DataFileStream.java:251)
    at org.apache.avro.file.DataFileStream.next(DataFileStream.java:236)
    (...)
Reproduction test:
// Write a single long value into an in-memory Avro data file.
final Schema writerSchema = Schema.create(Schema.Type.LONG);
final DataFileWriter<Long> writer = new DataFileWriter<>(new FastGenericDatumWriter<>(writerSchema));
final ByteArrayOutputStream byos = new ByteArrayOutputStream();
writer.create(writerSchema, byos);
writer.append(12345L);
writer.close();

// The writer schema is unknown at this point, so null is passed to the constructor.
final Schema readerSchema = Schema.create(Schema.Type.LONG);
final FastGenericDatumReader<Long> datumReader = new FastGenericDatumReader<>(null, readerSchema);
final DataFileReader<Long> reader = new DataFileReader<>(new SeekableByteArrayInput(byos.toByteArray()),
    // this updates datumReader.writerSchema based on metadata in the data file
    datumReader);
reader.next(); // throws the NullPointerException shown above
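
For anyone skimming, the failure mode suggested by the title (a cached deserializer that is not refreshed when the schema is set later) can be illustrated with a small standalone sketch. This is not the avro-fastserde code: StaleCacheSketch, Deserializer, StaleReader and RefreshingReader are hypothetical names, schemas are modelled as plain strings, and rebuilding the cached object in setSchema() is only one possible fix, assumed here for illustration.

```java
import java.util.Objects;

public class StaleCacheSketch {

    /** Hypothetical deserializer bound to the writer schema it was built with. */
    static final class Deserializer {
        private final String writerSchema;

        Deserializer(String writerSchema) {
            this.writerSchema = writerSchema;
        }

        String deserialize(String payload) {
            // Loosely mirrors the trace above: a null writer schema only fails on use.
            Objects.requireNonNull(writerSchema, "writer is null");
            return writerSchema + ":" + payload;
        }
    }

    /** Buggy variant: the deserializer is cached once and never refreshed. */
    static class StaleReader {
        protected String writerSchema;
        protected Deserializer cached;

        StaleReader(String writerSchema) {
            this.writerSchema = writerSchema;
            this.cached = new Deserializer(writerSchema);
        }

        void setSchema(String writerSchema) {
            // Only the field changes; 'cached' still holds the old (null) schema.
            this.writerSchema = writerSchema;
        }

        String read(String payload) {
            return cached.deserialize(payload);
        }
    }

    /** Assumed fix: rebuild the cached deserializer whenever the schema changes. */
    static final class RefreshingReader extends StaleReader {
        RefreshingReader(String writerSchema) {
            super(writerSchema);
        }

        @Override
        void setSchema(String writerSchema) {
            super.setSchema(writerSchema);
            this.cached = new Deserializer(writerSchema);
        }
    }

    public static void main(String[] args) {
        StaleReader stale = new StaleReader(null);
        stale.setSchema("long");
        try {
            stale.read("12345");
        } catch (NullPointerException e) {
            System.out.println("stale reader failed: " + e.getMessage());
        }

        RefreshingReader fixed = new RefreshingReader(null);
        fixed.setSchema("long");
        System.out.println(fixed.read("12345"));
    }
}
```

In this sketch the stale reader still fails after setSchema("long"), while the refreshing variant reads the value, which matches the behaviour one would expect from the real readers once the cached (de)serializer follows the schema.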