jackson-dataformats-binary
Incorrect serialization for LogicalType.Decimal
Your extension NonBSGenericDatumWriter of GenericDatumWriter lacks proper support for Avro's LogicalType.Decimal (it never calls Conversions.DecimalConversion) when dealing with BigDecimal types.
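For context, the Avro specification encodes a decimal as the two's-complement big-endian bytes of the unscaled value; the scale comes from the schema. A minimal JDK-only sketch of that wire format (no Avro dependency; the encode/decode helpers here are illustrative, not Avro API):

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;

public class AvroDecimalWire {
    // Avro "decimal" wire form: unscaled value as two's-complement big-endian bytes
    static byte[] encode(BigDecimal v) {
        return v.unscaledValue().toByteArray();
    }

    // The scale is NOT on the wire; the reader takes it from the schema
    static BigDecimal decode(byte[] wire, int scaleFromSchema) {
        return new BigDecimal(new BigInteger(wire), scaleFromSchema);
    }

    public static void main(String[] args) {
        BigDecimal value = new BigDecimal("123.45"); // unscaled 12345, scale 2
        byte[] wire = encode(value);
        System.out.println(Arrays.toString(wire)); // [48, 57] == 12345
        System.out.println(decode(wire, 2));       // 123.45
    }
}
```

This is what Conversions.DecimalConversion produces for a bytes schema; the proposed fix below routes BigDecimal values through that conversion instead of bypassing it.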
A possible fix (inside NonBSGenericDatumWriter) might be:
```java
case BYTES:
    if (datum instanceof byte[]) {
        super.writeWithoutConversion(schema, ByteBuffer.wrap((byte[]) datum), out);
        return;
    }
    // Try to convert decimal types, if the logical type indicates it
    LogicalType logicalType = schema.getLogicalType();
    if (logicalType instanceof LogicalTypes.Decimal) {
        Conversion<BigDecimal> conversion = getData().getConversionByClass(BigDecimal.class, logicalType);
        // DecimalConversion.toBytes() yields a ByteBuffer with the unscaled value
        ByteBuffer decimal = (ByteBuffer) convert(schema, logicalType, conversion, datum);
        super.writeWithoutConversion(schema, decimal, out);
        return;
    }
    break;
```
I'm willing to contribute a PR (including tests), but I'd like to apply it as a fix to the 2.13.x branch.
@cowtowncoder Let me know what you think
I believe #221 refers to the same issue.
Sounds good -- fix against 2.13 would be appreciated.
@cowtowncoder While the write (serialize) path was rather easy to fix, the read path seems a bit harder: deserialization happens in a mostly Jackson-generic fashion, unaware that the JsonParser implementation providing the tokens is an Avro-related one.
I can register a custom JsonDeserializer for BigDecimal, just like you did for date/time types in #283 (maybe we can bind this to a MapperFeature), but I require schema information in order to set the scale properly.
In order to adhere to the Avro decimal serialization behavior described in Conversions.DecimalConversion, the scale is not serialized: it is expected to be available from the schema when deserializing (the number is saved as an unscaled BigInteger).
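To illustrate why the deserializer needs the schema: the same wire bytes decode to different numbers depending on the scale declared in the schema. A JDK-only sketch:

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class ScaleFromSchema {
    public static void main(String[] args) {
        // The wire carries only the unscaled integer (here 12345);
        // the scale is NOT serialized and must come from the reader's schema.
        byte[] wire = BigInteger.valueOf(12345).toByteArray();

        // Same bytes, different schema scales -> different numbers:
        System.out.println(new BigDecimal(new BigInteger(wire), 2)); // 123.45
        System.out.println(new BigDecimal(new BigInteger(wire), 4)); // 1.2345
    }
}
```

Without the schema in hand, a generic BigDecimal deserializer has no way to pick between these readings.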
Any ideas on how to get the correct schema location to such a deserializer?
Thanks!
I would have thought that reading/deserialization should also work fine, as long as low-level decoder uses the schema information appropriately? Although I guess if this is not the case, then various workarounds may be needed.
@cowtowncoder Well, it's not. Any ideas on how to get the schema information there?
It seems like the correct approach would be to make JacksonAvroParserImpl and its avroContext counterpart (RecordReader$Std) able to return the correct schema reference/location. Any tips or ideas on how best to perform this?
Schema is carried along when constructing readers/writers, so that would be the place to store the information. It has unfortunately been a while since I looked into this, so I do not remember the exact flow; but basically only the information deemed necessary is retained for readers/writers, not the schema itself, ideally to avoid various by-name lookups.
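Following that idea, a hypothetical sketch of retaining only the decimal scale at reader-construction time (class and field names here are illustrative, not actual jackson-dataformats-binary types):

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical: a per-field reader that captures just the scale from the
// Avro schema when it is built, so the full Schema need not be kept around
// or looked up by name on the read path.
class DecimalScalarReader {
    private final int scale; // extracted once from the schema's decimal logical type

    DecimalScalarReader(int scaleFromSchema) {
        this.scale = scaleFromSchema;
    }

    BigDecimal read(byte[] unscaledBytes) {
        return new BigDecimal(new BigInteger(unscaledBytes), scale);
    }
}

public class Demo {
    public static void main(String[] args) {
        // Scale 2 would come from LogicalTypes.Decimal at construction time
        DecimalScalarReader reader = new DecimalScalarReader(2);
        System.out.println(reader.read(BigInteger.valueOf(12345).toByteArray())); // 123.45
    }
}
```

The point of the sketch is only the shape: extract what the reader needs (the scale) once, at the place where the schema is already available.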
One possible way forward here would be a (unit) test that shows the incorrect handling; possibly contrasting the Avro-backed decoder (which would expect a properly encoded/serialized value) with the Jackson one (which works with whatever serialization is currently used) to show the issue.
I could probably find time to work on this in near future, but would need help reproducing the issue first (as well as then verifying the fix).