jackson-dataformats-binary icon indicating copy to clipboard operation
jackson-dataformats-binary copied to clipboard

Avro logical type support

Open jcustenborder opened this issue 6 years ago • 13 comments

I'm interested in helping out on this one. Are there any pointers you could give me getting started? I'm thinking about adding support for these logical types.

#25

jcustenborder avatar May 07 '18 18:05 jcustenborder

Great!

I wish I had good pointers to give, but I don't really have much. Perhaps one starting point could be to first document here (or separate issue) of how these are represented in Avro: physical encoding, and then id(s) used for logical type. Jackson module has its representation of Avro schema, too, and the next step could be to import logical type information in some form. Although it is open ended in Avro, Jackson side could/should probably use enums to add support on case-by-case basis, once there is clear idea of how to support specific types.

When metadata exists, I think decoders would then be able to handle mapping so that in cases where native Avro type would not work, decoder/parser can change how it is expressed at token level. Key challenge being that types of JsonTokens available is limited, and these are the only things that databinding will see (*).

(*) actually, for JsonToken.VALUE_EMBEDDED_OBJECT, there's fallback of... well, anything may be exposed, and this is how Base64 encoded binary is (for example) supported.

But with that, there are 2 things that can be done:

  1. Sometimes instead of exposing physical Avro type as is -- say, binary for byte[] -- would not work (since databind deserializer does not know how to use it), but one of other tokens would be. Just as hypothetical example, let's say there was IPv6 address (16 bytes?) , which by default would not be handled... but if we'd convert to String, expose as JsonToken.VALUE_STRING, it would map.
  2. Sometimes simply exposing as VALUE_EMBEDDED_OBJECT BUT also making embedded object actual Java level object (like java.util.Calendar) would work -- may just need minor work in jackson-databind to make sure this gets tunnelled through

Come to think of it, sometimes additions as jackson-databind could also work. In above example, there's no reason why IPv6 value could not be read both from String and 16-byte binary value: this is in fact what is done with java.util.UUID, to allow for compact representation for UUIDs in binary formats.

Does this help? I would be excited to get improved support here.

cowtowncoder avatar May 07 '18 21:05 cowtowncoder

This is helpful.

I'm really familiar on the Avro side. The Jackson side is where I could use help navigating. BTW thanks for this project. It's one of the under appreciated gems of open source.

jcustenborder avatar May 07 '18 21:05 jcustenborder

@jcustenborder ah. Excellent! I haven't been following up things that much on Avro side, so while I was aware of logical types, I don't know how they are actually used. But I do know Jackson side of mappings quite well.

I am also excited about making this format module more useful, and extending support to logical types seems like a good thing. Your expertise should be very useful with that.

One caveat wrt current state of things: master is now for 3.0 development, which is a big change from 2.x in some respect. But it will take quite a while to get completed (late 2018 likely first release candidate). Avro module itself doesn't change that much, except for construction of AvroFactory and AvroMapper, which will be builder based. So: I think that it will probably make sense to work on 2.9 branch as changes should merge cleanly to master; I can help if there are any issues. This way if handling can be improved users get it much sooner than having to wait for 3.0.0

cowtowncoder avatar May 07 '18 21:05 cowtowncoder

@cowtowncoder I believe you meant to tag @jcustenborder

JacekLach avatar May 07 '18 21:05 JacekLach

@JacekLach lol. Thanks. Yes. Stupid github auto-completion.

cowtowncoder avatar May 07 '18 21:05 cowtowncoder

@cowtowncoder Is there any reason NonBSGenericDatumWriter is public? I think I'm going to need to change it's constructor. Should I implement another class with LogicalType support or should I try to use this one?

jcustenborder avatar May 09 '18 15:05 jcustenborder

@jcustenborder It is not meant to be used by non-core package itself, so the only reason would be if it needs to be accessed from within another package. If you need to change for patch version (which I think is the idea here), please leave old one deprecated for 2.9, to be removed from 3.0 (master).

I think using this one is fine; ideally we wouldn't need it, so if it turns out newer versions of avro stdlib don't require it, all the better. But you do not need to create separate one just wrt backwards compatibility.

cowtowncoder avatar May 10 '18 15:05 cowtowncoder

Hi @cowtowncoder , What is the current state of logicalType support with Jackson ?

I tried to do this :

A sample POJO :

public class TestDTO {

	@JsonProperty(required = true)
	@AvroSchema(value = "{\"type\":\"bytes\",\"logicalType\":\"decimal\",\"precision\":10,\"scale\":3}")
	private BigDecimal value;

        // + POJO boiler plate

}

The generated schema with Jackson :

{
  "type" : "record",
  "name" : "TestDTO",
  "namespace" : "com.example",
  "fields" : [ {
    "name" : "value",
    "type" : {
      "type" : "bytes",
      "logicalType" : "decimal",
      "precision" : 10,
      "scale" : 3
    }
  } ]
}

But writing a value with Jackson fails :

AvroMapper mapper = new AvroMapper(new AvroFactory());
AvroSchemaGenerator gen = new AvroSchemaGenerator();
mapper.acceptJsonFormatVisitor(TestDTO.class, gen);
AvroSchema schema = gen.getGeneratedSchema();

TestDTO dto = new TestDTO();
dto.setValue(BigDecimal.valueOf(18905, 2));
mapper.writer(schema).writeValueAsBytes(dto);

Error :

com.fasterxml.jackson.databind.JsonMappingException: class java.math.BigDecimal cannot be cast to class java.nio.ByteBuffer (java.math.BigDecimal and java.nio.ByteBuffer are in module java.base of loader 'bootstrap')

	at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider._wrapAsIOE(DefaultSerializerProvider.java:509)
	at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider._serialize(DefaultSerializerProvider.java:482)
	at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:319)
	at com.fasterxml.jackson.databind.ObjectWriter$Prefetch.serialize(ObjectWriter.java:1433)
	at com.fasterxml.jackson.databind.ObjectWriter._configAndWriteValue(ObjectWriter.java:1135)
	at com.fasterxml.jackson.databind.ObjectWriter.writeValueAsBytes(ObjectWriter.java:1029)
	at com.example.SomeTest.testDeviceDTOSchema(ProcessorAvroSchemasTest.java:51)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:675)
	at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:125)
	at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:132)
	at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:124)
	at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:74)
	at org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
	at org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:104)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:62)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:43)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:35)
	at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
	at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$6(TestMethodTestDescriptor.java:202)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:198)
	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:135)
	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:69)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:135)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:125)
	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:135)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:123)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:122)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:80)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1540)
	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:38)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:139)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:125)
	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:135)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:123)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:122)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:80)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1540)
	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:38)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:139)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:125)
	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:135)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:123)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:122)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:80)
	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.submit(SameThreadHierarchicalTestExecutorService.java:32)
	at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.execute(HierarchicalTestExecutor.java:57)
	at org.junit.platform.engine.support.hierarchical.HierarchicalTestEngine.execute(HierarchicalTestEngine.java:51)
	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:229)
	at org.junit.platform.launcher.core.DefaultLauncher.lambda$execute$6(DefaultLauncher.java:197)
	at org.junit.platform.launcher.core.DefaultLauncher.withInterceptedStreams(DefaultLauncher.java:211)
	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:191)
	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:128)
	at com.intellij.junit5.JUnit5IdeaTestRunner.startRunnerWithArgs(JUnit5IdeaTestRunner.java:69)
	at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
	at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230)
	at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)
Caused by: java.lang.ClassCastException: class java.math.BigDecimal cannot be cast to class java.nio.ByteBuffer (java.math.BigDecimal and java.nio.ByteBuffer are in module java.base of loader 'bootstrap')
	at org.apache.avro.generic.GenericDatumWriter.writeBytes(GenericDatumWriter.java:273)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:129)
	at com.fasterxml.jackson.dataformat.avro.ser.NonBSGenericDatumWriter.write(NonBSGenericDatumWriter.java:101)
	at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
	at com.fasterxml.jackson.dataformat.avro.ser.NonBSGenericDatumWriter.write(NonBSGenericDatumWriter.java:101)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
	at com.fasterxml.jackson.dataformat.avro.ser.RootContext.complete(RootContext.java:122)
	at com.fasterxml.jackson.dataformat.avro.AvroGenerator._complete(AvroGenerator.java:628)
	at com.fasterxml.jackson.dataformat.avro.AvroGenerator.writeEndObject(AvroGenerator.java:417)
	at com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.java:168)
	at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider._serialize(DefaultSerializerProvider.java:480)
	... 68 more

After reading issues :

  • https://github.com/FasterXML/jackson-dataformats-binary/issues/132
  • https://github.com/FasterXML/jackson-dataformats-binary/issues/25

I don't understand if you support this kind of schemas or not ? Am I doing something wrong ?

idkw avatar Apr 22 '20 14:04 idkw

Is this with version 2.10.3? There is also one change to behavior of schema generation in 2.11, so 2.11.0.rc1 (2.11.0 will be released very soon) might have differing handling.

cowtowncoder avatar Apr 22 '20 17:04 cowtowncoder

2.10.3 yes

idkw avatar Apr 23 '20 11:04 idkw

Hi there :)!

I there still an interest to add logicalType support? Maybe not full-fledged solution but at least for easy to solve types?

What about extending IntegerVisitor to something similar like this:

public class IntegerVisitor extends JsonIntegerFormatVisitor.Base
            implements SchemaBuilder {

    ......

    @Override
    public Schema builtAvroSchema() {
        if (_type == null) {
            throw new IllegalStateException("No number type indicated");
        }

        Schema schema = AvroSchemaHelper.numericAvroSchema(_type);
        if (_hint != null) {
            String logicalType = logicalType(_hint);
            if (logicalType != null) {
                schema.addProp(LOGICAL_TYPE, logicalType);
            } else {
                schema.addProp(AvroSchemaHelper.AVRO_SCHEMA_PROP_CLASS, AvroSchemaHelper.getTypeId(_hint));
            }
        }
        return schema;
    }

    private String logicalType(JavaType hint) {
        Class<?> clazz = hint.getRawClass();

        if (Date.class.isAssignableFrom(clazz)) {
            return "timestamp-millis";
        }
        if (OffsetDateTime.class.isAssignableFrom(clazz)) {
            return "timestamp-millis";
        }
        if (ZonedDateTime.class.isAssignableFrom(clazz)) {
            return "timestamp-millis";
        }
        if (LocalDate.class.isAssignableFrom(clazz)) {
            return "date";
        }
        ...... for all other types
		return null;
    }

I use this solution with JavaTimeModule and it works quite well for date and time related types.

MichalFoksa avatar Apr 27 '21 12:04 MichalFoksa

@MichalFoksa sounds like possible nice incremental improvement? Could you please file a separate issue for this specific aspect as there are multiple things that could be improved, and schema generation is one of areas. This issue can remain open for the general issue of improved logical type support.

cowtowncoder avatar Apr 28 '21 00:04 cowtowncoder

Ok, so, #283 added support for some of java.time types. Looks like support for BigDecimal could be added similarly, but with separate PR: perhaps generation of logicalType going in AvroMapper proper same as with date/time. But ser/deser might not even need separate module but also be added directly via mapper (assuming that the new representation is not ambiguous).

cowtowncoder avatar Jun 29 '21 21:06 cowtowncoder