
Module for JSON codec derivation

Open adamw opened this issue 1 year ago • 7 comments

Currently, to create a json body input/output, both a Schema and json-library-specific encoders/decoders are needed. This means that generic derivation is typically done twice (once for the json encoders/decoders, once for the schemas). Moreover, any customisations, such as the naming strategy, need to be duplicated for both the json library and the schemas, often using different APIs.

It would be great to do the configuration and derivation once - but for that, we would need a module offering joint json encoder/decoder + tapir schema derivation. In other words, we would need to write code which derives a JsonCodec[T] (this includes the encode and decode functions, and the schema).
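For illustration, this is roughly what the duplication looks like today - a sketch using the existing uPickle integration (Book is a made-up example type):

import sttp.tapir.*
import sttp.tapir.json.upickle.*
import upickle.default.*

case class Book(title: String, year: Int)
object Book:
  given ReadWriter[Book] = macroRW      // derivation #1: uPickle's encoder/decoder
  given Schema[Book] = Schema.derived   // derivation #2: tapir's schema

val bookBody = jsonBody[Book] // requires both instances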

Doing this for all json libraries would be highly impractical, and a ton of work, for which we don't have resources. That's why I'd like to approach this using the json library that will be included in the Scala toolkit - that is, uPickle. uPickle can use a better derivation mechanism anyway (as our blogs have described), so it might be an additional win for our users.

Such a derivation would have to be written using a macro - and as we know, these are different in Scala 2/3. I think we should target Scala 3.

So summing up, the goal of the new module is to:

  • deliver a macro implementing generic derivation for a JsonCodec[T] for supported T types
  • the json implementation used should be uPickle
  • we are targeting Scala 3

While it might seem that the derivation could be implemented using Magnolia, I think writing a dedicated macro, which could utilize Scala 3's Mirrors, would actually be better. First, we would directly generate the code, instead of generating an intermediate representation, which is only converted to the final codec at run-time. That's a small performance win. But furthermore, we can provide better, contextual error reporting. And excellent errors are something I'd like to be a priority for this task. I've done some experiments with deriving Schema using a macro directly here, but the work there has unfortunately stalled.
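To illustrate the mechanism only (a toy sketch - the real macro would be far more involved): Mirrors give compile-time access to a type's structure, e.g. its field labels, without building a runtime intermediate representation. FieldNames and Person are made-up names here.

import scala.deriving.Mirror
import scala.compiletime.constValueTuple

// toy typeclass: lists a case class's field names, computed at compile time
trait FieldNames[T]:
  def names: List[String]

inline given fieldNames[T](using m: Mirror.ProductOf[T]): FieldNames[T] =
  new FieldNames[T]:
    def names: List[String] =
      constValueTuple[m.MirroredElemLabels].toList.map(_.toString)

case class Person(name: String, age: Int)

@main def demo(): Unit =
  println(summon[FieldNames[Person]].names) // List(name, age)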

As for configuring the derivation, we should take into account the following:

  • customisations specified using Schema.annotations on a per-field/per-type basis - e.g. @encodedName should influence both the schema and the generated json encoder/decoder (see the sketch after this list)
  • global customisations as specified in Configuration (global field name transformers etc.)
  • more options than are currently available through Configuration to configure inheritance hierarchy serialization. This should include:
    • deserialisation using a discriminator field (partially available now) - with a value given with an annotation, or defaulting to the type's name
    • deserialisation using a single-field product (see Schema.oneOfWrapped)
    • deserialisation using a "first-successful" strategy
    • overriding the inheritance configuration locally using an annotation
    • maybe some more - to research what's available in other libraries
  • various options to serialise enumerations: as a string representation, as a result of function application, as an ordinal
  • adding annotations externally, e.g. through a list of (class field, annotation value) pairs
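For reference, here's a sketch of how the existing schema-side customisations mentioned above look today - a per-field annotation plus a global Configuration (User is a made-up type):

import sttp.tapir.Schema
import sttp.tapir.Schema.annotations.encodedName
import sttp.tapir.generic.Configuration

case class User(@encodedName("user_name") name: String, createdAt: Long)

// global naming strategy, applied to fields without an explicit annotation
given Configuration = Configuration.default.withSnakeCaseMemberNames

given Schema[User] = Schema.derived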

In the end, the user should get an alternative to the current import sttp.tapir.json.upickle.* + optional imports for auto-deriving uPickle's Reader/Writer & tapir's Schema; the alternative would define jsonBody etc., as the integration does today, plus the macro to derive the JsonCodec.

Summing up, the top-level requirements for the macro are:

  • user-friendly error reports, clearly stating the derivation path that failed in case a codec for some nested type cannot be found
  • configurable derivation of inheritance strategies, naming strategies and enumeration handling
  • compile-time generation of the codec
  • drop-in replacement for the current uPickle integration
  • support for all Scala 3 types (enums, opaque types, sum and intersection types, etc.) - illustrated below
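For concreteness, the kinds of Scala 3 definitions meant in the last point (illustrative declarations only):

opaque type UserId = Long                   // opaque type

enum Color:                                 // enum
  case Red, Green, Blue

sealed trait Shape                          // sum type (sealed hierarchy)
case class Circle(r: Double) extends Shape
case class Square(a: Double) extends Shape

type Printable = Shape & Serializable       // intersection type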

adamw avatar May 30 '23 16:05 adamw

Here are some notes after my initial analysis:

General remarks

Some of our requirements can be addressed with the @upickle.implicits.key annotation. I don't know if we can add annotations using macros; here's a thread where I'm asking for advice to figure this out. In cases where that's the only viable possibility, I've put a 🔑 icon to emphasize this.

Features

  • encodedName for fields
    • Can be achieved with @upickle.implicits.key annotation set on a field.
    • We can also override objectAttributeKeyReadMap and objectAttributeKeyWriteMap in our custom pickler which extends AttributeTagged. These methods are the recommended way to customise field name transformations like snake_case, but they can also be leveraged for other kinds of transformations (see the sketch after this list)
  • transform field names with a custom function
    • Overriding methods from AttributeTagged should be enough to achieve this
  • transform enum values with a custom function
    • For simple enums and case objects in a sealed trait hierarchy, the @upickle.implicits.key annotation on the enum can be used to rename the value (yes, it's called "key", but in this case it's used by uPickle to transform values). 🔑
      • [minor] Limitation: we can only transform to string values; there's no way to get an ordinal integer. We can get { "customerStatus": "5" }, but not { "customerStatus": 5 }
    • For enums with extra fields, uPickle creates JSON objects with a discriminator field
      • The name of the discriminator field is $type, but it can be changed if tagName is overridden in a custom pickler
      • The value of the discriminator field can be set with @upickle.implicits.key on the enum 🔑
  • sealed trait hierarchy (inheritance)
    • Decoding with a discriminator field: similar to enums with fields. The field name can be set by overriding tagName, the value can be set by putting @upickle.implicits.key on the class 🔑
    • Decoding with a first-successful strategy: probably hard. It would require overriding AttributeTagged.taggedObjectContext to return a custom ObjectVisitor with only some of the logic changed. Sounds like tricky ground.
    • Decoding using a single-field product: TODO
  • default values
    • uPickle uses default values of case class fields
    • To override this behavior, it is possible to override CaseClassReadereader.storeDefaults, example here
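A sketch of the custom-pickler approach described above: snake_case field names via the key-map overrides, plus a renamed discriminator (SnakeCasePickler is a made-up name; the override signatures are to the best of my knowledge):

import upickle.AttributeTagged

object SnakeCasePickler extends AttributeTagged:
  private def camelToSnake(s: String): String =
    s.replaceAll("([a-z0-9])([A-Z])", "$1_$2").toLowerCase
  private def snakeToCamel(s: String): String =
    "_([a-z])".r.replaceAllIn(s, m => m.group(1).toUpperCase)

  // Scala field name -> JSON key (writing)
  override def objectAttributeKeyWriteMap(s: CharSequence): CharSequence =
    camelToSnake(s.toString)
  // JSON key -> Scala field name (reading)
  override def objectAttributeKeyReadMap(s: CharSequence): CharSequence =
    snakeToCamel(s.toString)
  // rename the default "$type" discriminator field
  override def tagName: String = "kind"

import SnakeCasePickler.*
case class UserProfile(firstName: String, shoeSize: Int)
given ReadWriter[UserProfile] = macroRW
// write(UserProfile("Jan", 42)) should yield {"first_name":"Jan","shoe_size":42}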

kciesielski avatar Jun 21 '23 14:06 kciesielski

First, a side note - if you're not lucky on the scala-users forum, you can also try dotty discussions in the metaprogramming section: https://github.com/lampepfl/dotty/discussions/categories/metaprogramming

Second side note: I think a good "terminology fix" might be to call "enumerations" only the "true" enumerations, that is, Scala 3 enums where all cases are parameterless. If the cases have parameters, that's only sugar for a sealed trait.

What is kind of worrying is that some cases can only be handled with 🔑. So either we find a way to add annotations to a type using macros, or ... ? I guess there's no alternative really.

Well, except rewriting the pickler derivation. After reading the upickle code, is that even feasible?

adamw avatar Jun 22 '23 19:06 adamw

I see, thanks for the explanation regarding enumerations - let's use the terminology you suggested. The discussion board you posted looks promising. I was able to find a fresh thread on refining types, which may be helpful for dealing with annotations. Working on this now.

kciesielski avatar Jun 23 '23 13:06 kciesielski

I was thinking about a possible implementation strategy, and here's what I came up with.

The first constraint is that we should honor existing ReadWriter instances when they exist - either for the built-in types, or some esoteric ones.

The second constraint is that the derivation should follow standard Scala practices, that is, be recursive - so that the derived typeclass for a product/coproduct is created using implicitly available typeclass instances for its children. This rules out Codec as the typeclass, as it's not recursive - only the top-level instance for a type is available.

Picklers

Still, we need to derive both the ReadWriter instance and the Schema instance. So maybe we should do just that: derive that pair, with an option to convert to a Codec. E.g.:

case class Pickler[T](rw: ReadWriter[T], schema: Schema[T]):
  def toCodec: JsonCodec[T] = ??? // body omitted in this sketch

implicit def picklerToCodec[T](implicit p: Pickler[T]): JsonCodec[T] = p.toCodec

The Pickler name is quite random, but it's the best I came up with so far ;)

Configuration

Another design decision is what means of configuration to provide for the derived schemas/picklers. We already have two ways of customising schemas: using annotations and by modifying the implicit values. Originally I suggested adding a third one (explicitly providing an override for annotations), but maybe that's not necessary and we can use what's already available.

That is, the implicitly available Schema for a type could be used to guide the derivation of the ReadWriter - if it's missing. The schema already has all that we need: user-defined field names and default values. Btw., #2943 would be most useful here, to be able to externally provide alternate field names.

This also means that the Pickler derivation would have to assume that the schema's structure follows the type's structure (when it's a product/coproduct), and report an error otherwise.
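A hypothetical usage sketch, assuming the Pickler sketched above and that annotation-driven schema customisations carry over to the generated ReadWriter (Person is a made-up type; Pickler.derived is the proposed entry point):

import sttp.tapir.Schema.annotations.{default, encodedName}

case class Person(@encodedName("full_name") name: String, @default(18) age: Int)

// one derivation produces both the schema and the ReadWriter
given Pickler[Person] = Pickler.derived

// picklerToCodec then makes jsonBody[Person] work as it does today
val personBody = jsonBody[Person]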

Derivation

Now the main complication is implementing Pickler.derived[T]. I think it should follow more or less these rules (a rough sketch of the decision logic follows the list):

  • if a Schema and ReadWriter are already implicitly available in the scope, use them to create a Pickler
  • if the schema is missing and we're dealing with a product/coproduct, use code similar to what's currently in SchemaMagnoliaDerivation to create the new typeclass instance. Side note: we could simply do Schema.derived[T], but that could have negative performance implications, as it would do the nested lookups once again. So it could be slow.
  • if the ReadWriter is missing (i.e., not implicitly available), create one for a product/coproduct, using what's available in the Schema
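The rough shape of that decision logic as a sketch - deriveSchema and picklerFromSchema are hypothetical helpers standing in for the actual derivation code:

import scala.compiletime.summonFrom

inline def derived[T]: Pickler[T] =
  summonFrom {
    case rw: ReadWriter[T] =>
      summonFrom {
        case s: Schema[T] => Pickler(rw, s)               // rule 1: both found, pair them
        case _            => Pickler(rw, deriveSchema[T]) // rule 2: derive the missing schema
      }
    case _ =>
      picklerFromSchema[T] // rule 3: build the ReadWriter guided by the schema
  }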

Enums, inheritance

To support special cases, such as various enumerations or inheritance strategies, we can use a similar approach as we do currently, that is provide methods on Pickler to create the instances: Pickler.derivedEnumeration (similar to the method on Schema and Codec), Pickler.oneOfUsingField, Pickler.oneOfWrapped (similar to those on Schema) - usage sketched below.

That way we would use the "standard" Scala way of configuring generic derivation - specifying the non-standard instances by hand - instead of inventing our own.
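Hypothetical usage, assuming these constructors mirror their Schema counterparts (Status and Entity are made-up types):

enum Status:
  case Active, Blocked

sealed trait Entity
case class Organization(name: String) extends Entity
case class Individual(name: String, age: Int) extends Entity

// enumeration encoded via its string representation
given Pickler[Status] = Pickler.derivedEnumeration[Status].defaultStringBased

// sealed hierarchy serialised using the single-field wrapper strategy
given Pickler[Entity] = Pickler.oneOfWrapped[Entity]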

Runtime/compiletime

Using the schema to create the ReadWriter instance means that it would be created at run-time - as only then do we have access to the specific Schema instance (which might be user-provided and computed arbitrarily). So at compile-time, we would only generate code which would do the necessary lookups / create the computation.

Of course, there might be a hole in the scheme above and it might soon turn out that it's unimplementable ;) WDYT @kciesielski ?

adamw avatar Jul 08 '23 11:07 adamw

Leaving some notes after our recent discussion with @adamw:

  1. The main API entrypoint is Pickler, and we want to allow deriving picklers without users providing schemas.
  2. If we allowed creating Pickler[T] with a user-provided Schema[T], we would break the mechanism of the Pickler creating its own schema out of child schemas from summoned child picklers. That's why we emit a compilation error when a Schema is in scope, but no Reader/Writer. Either both the Schema and the ReadWriter are provided, or the Pickler takes care of deriving them.
  3. Therefore, to allow schema customization outside of case class annotations, we need some API in the Pickler, something like:
Pickler.derivedCustomise[Person](
  _.age -> List(@EncodedName("x")),
  _.name -> List(@EncodedName("y"), @Default("adam")),
  _.address.street -> ...
)
  4. This customization DSL is then processed in the pickler in order to enrich the derived schemas, before creating the Readers/Writers, which use the schemas for encoded names and default values.

kciesielski avatar Sep 06 '23 09:09 kciesielski

Yes, looks correct :) In the future we might also want to add Schema.derivedCustomise for consistency, and maybe deprecate the .modify variant of schema customisation then?

adamw avatar Sep 06 '23 10:09 adamw

Reopening for possible jsoniter work

adamw avatar Dec 14 '23 09:12 adamw