Proposed roadmap for Vulcan 2.0

Open bplommer opened this issue 3 years ago • 2 comments

Summary

Here are some thoughts on how to decouple the Vulcan API from the Java Avro SDK (JAvro), opening the way to adding an alternative backend that implements Avro directly. Previously we discussed doing this by introducing our own representation of encoded Avro values, so that Vulcan would convert between these and user types, and backends would convert between these and Avro wire formats. Instead, I want to suggest that we convert the codecs into an algebraic datatype that can be traversed by a separate interpreter to convert directly between user types and an arbitrary backend representation.

This has a few advantages:

  • It avoids adding an extra layer of indirection at runtime.
  • Most of the work can be done incrementally as non-breaking changes in the 1.x series, as the implementation of Codec is invisible to users (whereas the representation of Avro values isn't).
  • It reduces API surface area - we can keep the details of the Codec ADT package-private, whereas it's not clear we'd be able to do the same for a model of Avro values.

Roadmap

Changes in 1.x

  • [x] Per https://github.com/fd4s/vulcan/pull/435, deprecate Codec.instance (which is coupled directly to the JAvro API) and replace most uses of it with a few primitives and combinators.
  • [ ] Convert codecs to a fully introspectable algebraic datatype. Following the example of UnionCodec in https://github.com/fd4s/vulcan/pull/435, convert all primitive codecs and combinators into named subtypes.
  • [ ] Refactor implementations of primitives and combinators into an interpreter of the newly introduced ADT. encode, decode and schema now delegate to the interpreter.
  • [ ] Deprecate encode, decode, and schema methods on Codec, in favour of explicit use of the interpreter, to prepare for fully decoupling Codec from JAvro.
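
To make the ADT-plus-interpreter idea concrete, here is a minimal sketch in plain Scala. It is not Vulcan's actual code: the names `IntCodec`, `StringCodec`, `Imap`, `Value` and `ToyBackend` are all hypothetical, and the toy `Value` type only stands in for whatever representation a real backend (JAvro or otherwise) would use. The point is that the codec is pure data, and all knowledge of the backend lives in the interpreter.

```scala
object CodecAdtSketch {
  // The codec is a plain description; it holds no backend-specific logic.
  // (Hypothetical node names, not Vulcan's real subtypes.)
  sealed trait Codec[A]
  object Codec {
    final case class IntCodec() extends Codec[Int]
    final case class StringCodec() extends Codec[String]
    // Combinator: reuse a codec for A to describe a codec for B.
    final case class Imap[A, B](inner: Codec[A], to: A => B, from: B => A)
        extends Codec[B]
  }

  // Stand-in for a backend representation (an assumption for illustration).
  sealed trait Value
  final case class IntValue(i: Int) extends Value
  final case class StrValue(s: String) extends Value

  // One possible interpreter: traverses the ADT and converts directly
  // between user types and the backend representation.
  object ToyBackend {
    def encode[A](codec: Codec[A], a: A): Value = codec match {
      case Codec.IntCodec()            => IntValue(a)
      case Codec.StringCodec()         => StrValue(a)
      case Codec.Imap(inner, _, from)  => encode(inner, from(a))
    }

    def decode[A](codec: Codec[A], v: Value): Either[String, A] = codec match {
      case Codec.IntCodec() =>
        v match {
          case IntValue(i) => Right(i)
          case other       => Left(s"expected int, got $other")
        }
      case Codec.StringCodec() =>
        v match {
          case StrValue(s) => Right(s)
          case other       => Left(s"expected string, got $other")
        }
      case Codec.Imap(inner, to, _) => decode(inner, v).map(to)
    }
  }

  // Usage: a user type derived purely from primitives and combinators.
  final case class UserId(value: Int)
  val userIdCodec: Codec[UserId] =
    Codec.Imap[Int, UserId](Codec.IntCodec(), UserId(_), _.value)
}
```

A second backend would be another object with the same two methods over a different representation type, with no change to `Codec` itself, which is why the encode, decode and schema methods can eventually move off `Codec`.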

Changes in 2.0

  • [ ] Remove Codec.instance - all codecs must be derived from primitives we provide.
  • [ ] Move methods for serialization and deserialization from Codec to live with JAvro-based interpreter.
  • [ ] Remove encode, decode and schema methods on Codec
  • [ ] Consider exposing an alternative representation of schemas directly on Codec, either as a raw JSON string or as our own structured representation of schemas
  • [ ] AvroTypes types are no longer aliases for JAvro representations - instead they are either phantom types or our own model of Avro schemas
  • [ ] Codec API is fully decoupled from JAvro
  • [ ] Separate the JAvro-based interpreter into a new module, remove JAvro dependency from core module
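
As a rough illustration of the "phantom types" option for AvroTypes, the sketch below tags a codec with the Avro type it maps to without tying that tag to any runtime class. Everything here is hypothetical (`AInt`, `AString`, `Tagged`, `mapKey`) and is only meant to show the technique, not a proposed API.

```scala
object PhantomAvroTypesSketch {
  // Phantom types: never instantiated, used only as compile-time tags.
  object AvroTypes {
    sealed trait AInt
    sealed trait AString
  }

  // A codec description tagged with its Avro-side type. The AvroT parameter
  // does not appear in any field, so it carries no backend dependency.
  final case class Tagged[AvroT, A](name: String)

  val int: Tagged[AvroTypes.AInt, Int] = Tagged("int")
  val string: Tagged[AvroTypes.AString, String] = Tagged("string")

  // The tag can still constrain combinators: this hypothetical helper only
  // accepts codecs whose Avro-side type is string.
  def mapKey[A](key: Tagged[AvroTypes.AString, A]): String =
    s"map<${key.name}>"
}
```

With this shape, `mapKey(string)` compiles while `mapKey(int)` is rejected by the type checker, even though no JAvro class is referenced anywhere.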

Hopefully these changes won't impact most users too much, given that the most common use case is via the integration with fs2-kafka.

Any feedback would be much appreciated!

bplommer, Apr 05 '22 13:04

@vlovgr any thoughts or concerns on this before I move forward with it?

bplommer, May 12 '22 07:05

@bplommer It sounds like a solid and exciting plan. 👍 🎉

vlovgr, May 12 '22 09:05