jackson-dataformats-binary

Backward compatibility support for Avro schema when deserializing data

Open chioai1309 opened this issue 3 years ago • 1 comments

My project currently uses jackson-dataformat-avro (version 2.12.2) to convert Java POJOs to Avro and store them. The problem I am facing is that once the schema evolves, previously stored data can no longer be deserialized; reading it fails with the following exception:

com.fasterxml.jackson.core.io.JsonEOFException: Unexpected end-of-input in FIELD_NAME
	at com.fasterxml.jackson.core.base.ParserMinimalBase._reportInvalidEOF(ParserMinimalBase.java:659)
	at com.fasterxml.jackson.core.base.ParserMinimalBase._reportInvalidEOF(ParserMinimalBase.java:636)
	at com.fasterxml.jackson.dataformat.avro.deser.JacksonAvroParserImpl._nextByteGuaranteed2(JacksonAvroParserImpl.java:1038)
	at com.fasterxml.jackson.dataformat.avro.deser.JacksonAvroParserImpl._nextByteGuaranteed(JacksonAvroParserImpl.java:1033)
	at com.fasterxml.jackson.dataformat.avro.deser.JacksonAvroParserImpl._decodeIntSlow(JacksonAvroParserImpl.java:265)
	at com.fasterxml.jackson.dataformat.avro.deser.JacksonAvroParserImpl.decodeInt(JacksonAvroParserImpl.java:234)
	at com.fasterxml.jackson.dataformat.avro.deser.JacksonAvroParserImpl.decodeIndex(JacksonAvroParserImpl.java:988)
	at com.fasterxml.jackson.dataformat.avro.deser.ScalarDecoder$ScalarUnionDecoder$FR.readValue(ScalarDecoder.java:412)
	at com.fasterxml.jackson.dataformat.avro.deser.RecordReader$Std.nextToken(RecordReader.java:142)
	at com.fasterxml.jackson.dataformat.avro.deser.AvroParserImpl.nextToken(AvroParserImpl.java:97)
	at com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:288)
	at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:156)
	at com.fasterxml.jackson.databind.ObjectReader._bindAndClose(ObjectReader.java:2079)
	at com.fasterxml.jackson.databind.ObjectReader.readValue(ObjectReader.java:1453)

The reason is that RecordReader$Std tries to resolve a token for the newly added field after the data reader has already reached the end of the stored message.
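To illustrate the failure mode, here is a minimal sketch in plain Java (no Jackson or Avro dependency). It models only a record whose fields are all `["null","string"]` unions, encoded the way the Avro specification describes: a zig-zag varint branch index, followed by a length-prefixed UTF-8 string when the branch is "string". The class and method names (`AvroEofSketch`, `encode`, `decode`) are hypothetical and are not the library's internals; the point is only that the bytes carry no field names or field count, so a decoder that expects one more field than was written runs off the end of the input.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Simplified model of Avro binary encoding for records of ["null","string"] fields. */
final class AvroEofSketch {

    // Avro writes int/long values as zig-zag encoded varints.
    static void writeLong(ByteArrayOutputStream out, long value) {
        long n = (value << 1) ^ (value >> 63);
        while ((n & ~0x7FL) != 0) {
            out.write((int) ((n & 0x7F) | 0x80));
            n >>>= 7;
        }
        out.write((int) n);
    }

    static long readLong(ByteArrayInputStream in) throws IOException {
        long n = 0;
        int shift = 0;
        while (true) {
            int b = in.read();
            if (b < 0) {
                throw new EOFException("end of input while reading a varint");
            }
            n |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }
        return (n >>> 1) ^ -(n & 1);
    }

    // Encodes values in schema order: union branch index, then the string if present.
    static byte[] encode(String... values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String v : values) {
            if (v == null) {
                writeLong(out, 0); // branch 0 = "null", no payload follows
            } else {
                byte[] utf8 = v.getBytes(StandardCharsets.UTF_8);
                writeLong(out, 1); // branch 1 = "string"
                writeLong(out, utf8.length);
                out.write(utf8, 0, utf8.length);
            }
        }
        return out.toByteArray();
    }

    // Decodes exactly fieldCount fields; nothing in the bytes says how many were written.
    static String[] decode(byte[] data, int fieldCount) throws IOException {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        String[] result = new String[fieldCount];
        for (int i = 0; i < fieldCount; i++) {
            long branch = readLong(in);
            if (branch == 1) {
                int len = (int) readLong(in);
                byte[] utf8 = new byte[len];
                if (len > 0 && in.read(utf8, 0, len) != len) {
                    throw new EOFException("truncated string");
                }
                result[i] = new String(utf8, StandardCharsets.UTF_8);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        byte[] stored = encode("S01", null);            // written with a two-field schema
        System.out.println(decode(stored, 2).length);   // old schema: decodes fine
        try {
            decode(stored, 3);                          // new schema expects one extra field
        } catch (EOFException e) {
            System.out.println("EOF: " + e.getMessage());
        }
    }
}
```

In this model the third field fails at the point where its union branch index would be read, which matches the `decodeIndex` / `decodeInt` frames in the stack trace above.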

In the first version I had this POJO and the corresponding Avro schema generated for it:

@Getter
@Setter
@Document(StoreEntity.COLLECTION_NAME)
public class StoreEntity extends AuditableEntity {

  public static final String COLLECTION_NAME = "StoreEntity";

  @Id
  private String id;
  @Field
  @Length(max = 150)
  @NotBlank
  private String name;
  @Field
  @Indexed(unique = true)
  @Length(max = 50)
  @NotBlank
  private String code;
  @Field
  @CountryCode
  private String countryCode;
}
===================================
{
   "type":"record",
   "name":"StoredEntity",
   "namespace":"com.mydomain.entity",
   "fields":[
      { "name":"code", "type":["null","string"] },
      { "name":"countryCode", "type":["null","string"] },
      { "name":"createdBy", "type":["null","string"] },
      { "name":"createdDate", "type":["null","string"] },
      { "name":"id", "type":["null","string"] },
      { "name":"lastModifiedBy", "type":["null","string"] },
      { "name":"lastModifiedDate", "type":["null","string"] },
      { "name":"name", "type":["null","string"] }
   ]
}

Later the schema evolved, with a new field appended to the end:

@Getter
@Setter
@Document(StoreEntity.COLLECTION_NAME)
public class StoreEntity extends AuditableEntity {

  public static final String COLLECTION_NAME = "StoreEntity";

  @Id
  private String id;
  @Field
  @Length(max = 150)
  @NotBlank
  private String name;
  @Field
  @Indexed(unique = true)
  @Length(max = 50)
  @NotBlank
  private String code;
  @Field
  @CountryCode
  private String countryCode;
  @Field
  @JsonProperty(defaultValue = "null")
  private String phone;
}
====================================================
{
   "type":"record",
   "name":"StoredEntity",
   "namespace":"com.mydomain.entity",
   "fields":[
      { "name":"code", "type":["null","string"] },
      { "name":"countryCode", "type":["null","string"] },
      { "name":"createdBy", "type":["null","string"] },
      { "name":"createdDate", "type":["null","string"] },
      { "name":"id", "type":["null","string"] },
      { "name":"lastModifiedBy", "type":["null","string"] },
      { "name":"lastModifiedDate", "type":["null","string"] },
      { "name":"name", "type":["null","string"] },
      { "name":"phone", "type":["null","string"], "default":null }
   ]
}

According to the Avro schema resolution rules described at http://avro.apache.org/docs/1.7.7/spec.html#Schema+Resolution:

if the reader's record schema has a field that contains a default value, and writer's schema does not have a field with the same name, then the reader should use the default value from its field.

The same behavior is suggested at https://docs.confluent.io/2.0.0/avro.html#backward-compatibility

But this does not seem to be how the library behaves.

chioai1309 • Apr 08 '21 11:04

Jackson's handling of default values may be incomplete with respect to Avro definitions, but the exception you get seems a bit different, and it would be good to reproduce it with a minimal test case. This needs to include the specific code used to read and write content, not just the Java type / Avro schema definitions. Part of the reason is to verify that the reader and writer schemas are being used correctly for schema evolution.

So, if it was possible to reduce it, including elimination of use of frameworks like Lombok (just for testing as our tests cannot add it as a dependency -- use with Jackson is fine in itself), it'd be possible to have a look at what is causing the problem.

Jackson does support schema evolution in and of itself, but as you probably know it is necessary to specify reader and writer schemas separately: the "writer schema" is the schema that was used for writing, and the "reader schema" is the new one the application wants to use. It is never possible to just use a new schema in isolation, since Avro does not include enough metadata for the decoder to handle changes, even compatible ones.
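A minimal sketch of what pairing the two schemas achieves, in plain Java with no Jackson or Avro dependency. It assumes the stored bytes have already been decoded with the writer schema into values in writer field order, and then applies the record resolution rule quoted earlier in this thread: match fields by name, fill in the reader's default when the writer had no such field, and signal an error when there is neither a written value nor a default. All names here (`SchemaResolutionSketch`, `ReaderField`, `resolve`) are hypothetical and only illustrate the rule; this is not how the library implements it.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SchemaResolutionSketch {

    /** A reader-side field: its name, whether it declares a default, and that default. */
    static final class ReaderField {
        final String name;
        final boolean hasDefault;
        final Object defaultValue;

        ReaderField(String name, boolean hasDefault, Object defaultValue) {
            this.name = name;
            this.hasDefault = hasDefault;
            this.defaultValue = defaultValue;
        }
    }

    /**
     * Maps values decoded in writer field order onto the reader's fields.
     * Writer fields unknown to the reader are ignored.
     */
    static Map<String, Object> resolve(List<String> writerFields,
                                       List<Object> writerValues,
                                       List<ReaderField> readerFields) {
        // Index the written values by field name.
        Map<String, Object> written = new LinkedHashMap<>();
        for (int i = 0; i < writerFields.size(); i++) {
            written.put(writerFields.get(i), writerValues.get(i));
        }
        Map<String, Object> result = new LinkedHashMap<>();
        for (ReaderField field : readerFields) {
            if (written.containsKey(field.name)) {
                result.put(field.name, written.get(field.name));
            } else if (field.hasDefault) {
                result.put(field.name, field.defaultValue); // reader-side default
            } else {
                throw new IllegalStateException(
                        "no written value and no default for field: " + field.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Old data: written before "phone" existed.
        List<String> writerFields = Arrays.asList("code", "name");
        List<Object> writerValues = Arrays.<Object>asList("S01", "Main store");
        // New schema: "phone" was added with "default": null.
        List<ReaderField> readerFields = Arrays.asList(
                new ReaderField("code", false, null),
                new ReaderField("name", false, null),
                new ReaderField("phone", true, null));
        System.out.println(resolve(writerFields, writerValues, readerFields));
    }
}
```

The resolution step needs the writer's field list as input, which is why the new schema alone is not enough. In jackson-dataformat-avro, to the best of my knowledge, the pairing is expressed by calling `withReaderSchema(...)` on the writer's `AvroSchema` and passing the result to the `ObjectReader`; please check this against the module's README before relying on it.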

cowtowncoder • Apr 08 '21 18:04