avro-util

[avro-fastserde] Cached fast (de)serializers are not updated after setting schema

Status: Open · opened by maciejkowalczyk · 0 comments

When the writer schema is not known at the time a FastSpecificDatumReader is created (the reproduction below uses FastGenericDatumReader), we pass null as the writerSchema constructor parameter. Then, even after the proper writer schema has been set using setSchema(), read() throws an NPE:

java.lang.NullPointerException: Cannot invoke "org.apache.avro.Schema.equals(Object)" because "writer" is null
	at org.apache.avro.Schema.applyAliases(Schema.java:1832)
	at org.apache.avro.generic.GenericDatumReader.getResolver(GenericDatumReader.java:131)
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
	at com.linkedin.avro.fastserde.FastSerdeCache$FastDeserializerWithAvroSpecificImpl.deserialize(FastSerdeCache.java:543)
	at com.linkedin.avro.fastserde.FastGenericDatumReader.read(FastGenericDatumReader.java:89)
	at org.apache.avro.file.DataFileStream.next(DataFileStream.java:251)
	at org.apache.avro.file.DataFileStream.next(DataFileStream.java:236)
	(...)

Reproduction test:

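        import java.io.ByteArrayOutputStream;
        import org.apache.avro.Schema;
        import org.apache.avro.file.DataFileReader;
        import org.apache.avro.file.DataFileWriter;
        import org.apache.avro.file.SeekableByteArrayInput;
        import com.linkedin.avro.fastserde.FastGenericDatumReader;
        import com.linkedin.avro.fastserde.FastGenericDatumWriter;

        // Write a single long value into an in-memory Avro data file.
        final Schema writerSchema = Schema.create(Schema.Type.LONG);
        final DataFileWriter<Long> writer = new DataFileWriter<>(new FastGenericDatumWriter<>(writerSchema));
        final ByteArrayOutputStream byos = new ByteArrayOutputStream();
        writer.create(writerSchema, byos);
        writer.append(12345L);
        writer.close();

        // Create the reader without a writer schema (null), since it is not known yet.
        final Schema readerSchema = Schema.create(Schema.Type.LONG);
        final FastGenericDatumReader<Long> datumReader = new FastGenericDatumReader<>(null, readerSchema);
        // Opening the file calls datumReader.setSchema() with the writer schema from the file metadata.
        final DataFileReader<Long> reader = new DataFileReader<>(new SeekableByteArrayInput(byos.toByteArray()), datumReader);
        // Throws the NullPointerException shown above.
        reader.next();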
        reader.next();

Posted by maciejkowalczyk on Dec 06 '21 16:12