jackson-dataformats-binary

Incorrect serialization for LogicalType.Decimal

Open sheinbergon opened this issue 2 years ago • 7 comments

The NonBSGenericDatumWriter extension of GenericDatumWriter lacks proper support for Avro's LogicalType.Decimal (it does not call Conversions.DecimalConversion) when dealing with BigDecimal values.

A possible fix (inside NonBSGenericDatumWriter) might be:

        case BYTES:
            if (datum instanceof byte[]) {
                super.writeWithoutConversion(schema, ByteBuffer.wrap((byte[]) datum), out);
                return;
            }
            // If the schema carries a decimal logical type, route BigDecimal
            // values through the registered Avro conversion
            LogicalType logicalType = schema.getLogicalType();
            if (logicalType instanceof LogicalTypes.Decimal) {
                Conversion<BigDecimal> conversion = getData().getConversionByClass(BigDecimal.class, logicalType);
                ByteBuffer decimal = (ByteBuffer) convert(schema, logicalType, conversion, datum);
                super.writeWithoutConversion(schema, decimal, out);
                return;
            }
            break;

I'm willing to contribute a PR (including tests), but I'd like to apply it as a fix to the 2.13.x branch.

@cowtowncoder Let me know what you think

sheinbergon avatar Dec 30 '21 15:12 sheinbergon

I believe #221 refers to the same issue

sheinbergon avatar Dec 30 '21 15:12 sheinbergon

Sounds good -- fix against 2.13 would be appreciated.

cowtowncoder avatar Dec 30 '21 21:12 cowtowncoder

@cowtowncoder While the write (serialize) path was rather easy to fix, the read path seems a bit harder: deserialization happens in a mostly generic Jackson fashion, unaware that the JsonParser implementation providing the tokens is an Avro-related one.

I can register a custom JsonDeserializer for BigDecimal, just like you did for date/time types in #283 (maybe we can bind this to a MapperFeature), but I require schema information in order to set the scale properly.

To adhere to the Avro decimal serialization behavior described in Conversions.DecimalConversion, the scale is not serialized; it is expected to be available from the schema when deserializing (the number is saved as an unscaled BigInteger).
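For illustration, the encoding described above can be sketched in plain Java without Avro on the classpath. The class and method names here (`DecimalWireSketch`, `encode`, `decode`) are hypothetical; the real logic lives in Avro's Conversions.DecimalConversion.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class DecimalWireSketch {

    // Only the unscaled value's two's-complement big-endian bytes go on the wire.
    static byte[] encode(BigDecimal value) {
        return value.unscaledValue().toByteArray();
    }

    // The scale is NOT on the wire; it must come from the schema's
    // decimal logical type at read time.
    static BigDecimal decode(byte[] wire, int scaleFromSchema) {
        return new BigDecimal(new BigInteger(wire), scaleFromSchema);
    }

    public static void main(String[] args) {
        BigDecimal value = new BigDecimal("123.45"); // unscaled 12345, scale 2
        byte[] wire = encode(value);                 // 2 bytes: 0x30, 0x39
        BigDecimal decoded = decode(wire, 2);        // scale 2 supplied by the schema
        System.out.println(decoded);                 // 123.45
    }
}
```

This is why a plain BigDecimal deserializer is not enough: without the schema's scale, the unscaled bytes alone are ambiguous.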

Any ideas on how to make the correct schema location available to such a deserializer?

10x

sheinbergon avatar Jan 01 '22 21:01 sheinbergon

I would have thought that reading/deserialization should also work fine, as long as the low-level decoder uses the schema information appropriately? Although I guess if this is not the case, then various workarounds may be needed.

cowtowncoder avatar Jan 03 '22 18:01 cowtowncoder

@cowtowncoder Well, it's not. Any ideas on how to get the schema information there?

It seems like the correct approach would be to make JacksonAvroParserImpl and its avroContext counterpart (RecordReader$Std) able to return the correct schema reference/location. Any tips/ideas on how best to perform this?

sheinbergon avatar Jan 06 '22 10:01 sheinbergon

Schema is carried along when constructing readers/writers, so that would be the place to store the information. It has unfortunately been a while since I looked into this, so I do not remember the exact flow; but basically only the information deemed necessary is retained from the schema for readers/writers, not the schema itself, ideally to avoid various by-name lookups.

cowtowncoder avatar Jan 06 '22 19:01 cowtowncoder

One possible way forward here would be a (unit) test that shows the incorrect handling, possibly contrasting an Avro-backed decoder (which would expect a properly encoded/serialized value) with the Jackson one (which works with whatever serialization is currently used) to show the issue?

I could probably find time to work on this in the near future, but would need help reproducing the issue first (as well as verifying the fix afterwards).

cowtowncoder avatar May 16 '22 00:05 cowtowncoder