Module for JSON codec derivation
Currently, to create a json body input/output, both a `Schema` and json-library-specific encoders/decoders are needed. This means that generic derivation is typically done twice (once for json encoders/decoders, once for schemas). Moreover, any customisations as to the naming strategy etc. need to be duplicated, often using different APIs, both for the json library and for the schemas.
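For example, with the current uPickle integration, both derivations have to be spelled out separately (a sketch of the status quo; `Person` is just an example type):

```scala
import sttp.tapir.*
import sttp.tapir.json.upickle.*
import upickle.default.*

case class Person(firstName: String, age: Int)
object Person:
  given ReadWriter[Person] = macroRW    // derivation no. 1: uPickle's json encoder/decoder
  given Schema[Person] = Schema.derived // derivation no. 2: tapir's schema

val body = jsonBody[Person] // requires both of the givens above
```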
It would be great to do the configuration and derivation once - but to do that, we would need to provide a module offering joint json encoder/decoder + tapir schema derivation. In other words, we would need to write code which derives a `JsonCodec[T]` (this includes the `encode`, `decode` and `schema`).
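For reference, `JsonCodec` is an existing alias in tapir which ties these together:

```scala
// from the sttp.tapir.Codec companion: a string-based codec with the JSON
// codec format; every Codec additionally carries a Schema
type JsonCodec[T] = Codec[String, T, CodecFormat.Json]
```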
Doing this for all json libraries would be highly impractical and a ton of work, for which we don't have the resources. That's why I'd like to approach this using the json library that will be included in the Scala toolkit - that is, uPickle. uPickle can use a better derivation mechanism anyway (as our blogs have described), so it might be an additional win for our users.
Such a derivation would have to be written using a macro - and as we know, these are different in Scala 2/3. I think we should target Scala 3.
So summing up, the goal of the new module is to:
- deliver a macro implementing generic derivation for a `JsonCodec[T]` for supported `T` types - the json implementation used should be uPickle
- we are targeting Scala 3
While it might seem that the derivation could be implemented using Magnolia, I think writing a dedicated macro, which could utilize Scala 3's `Mirror`s, would actually be better. First, we would directly generate the code, instead of generating an intermediate representation, which is only converted to the final codec at run-time. That's a small performance win. But furthermore, we can provide better, contextual error reporting. And excellent errors are something I'd like to be a priority for this task. I've done some experiments with deriving `Schema` using a macro directly here, but the work there has unfortunately stalled.
As for configuring the derivation, we should take into account the following:
- customisations specified using `Schema.annotations` on a per-field/per-type basis - e.g. `@encodedName` should influence both the schema and the generated json encoder/decoder
- global customisations as specified in `Configuration` (global field name transformers etc.)
- more options than are currently available through `Configuration`, to configure inheritance hierarchy serialization. This should include:
  - deserialisation using a discriminator field (partially available now) - with a value given with an annotation, or defaulting to the type's name
  - deserialisation using a single-field product (see `Schema.oneOfWrapped`)
  - deserialisation using a "first-successful" strategy
  - overriding the inheritance configuration locally using an annotation
  - maybe some more - to research what's available in other libraries
- various options to serialise enumerations: as a string representation, as a result of function application, as an ordinal
- adding annotations externally, e.g. through a list of (class field, annotation value) pairs
In the end, the user should get an alternative to the current `import sttp.tapir.json.upickle.*` + optional imports for auto-deriving uPickle's `Reader`/`Writer` & tapir's `Schema`; the alternative would define `jsonBody` etc. as the integration does today, plus the macro to derive the `JsonCodec`.
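For illustration, usage of the new module might then look roughly like this (the module name and the `Pickler` API are hypothetical at this point):

```scala
import sttp.tapir.json.pickler.* // hypothetical new module

case class Person(firstName: String, age: Int)
object Person:
  given Pickler[Person] = Pickler.derived // one derivation: ReadWriter + Schema together

val body = jsonBody[Person] // backed by the JsonCodec obtained from the Pickler
```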
Summing up, the top-level requirements for the macro are:
- user-friendly error reports, clearly stating the derivation path that failed in case a codec for some nested type cannot be found
- configurable derivation of inheritance strategies, naming strategies and enumeration handling
- compile-time generation of the codec
- drop-in replacement for the current uPickle integration
- support for all Scala 3 types (enums, opaque types, sum and intersection types, etc.)
Here are some notes after my initial analysis:
General remarks
Some of our requirements can be addressed with the `@upickle.implicits.key` annotation. I don't know if we can add annotations using macros; here's a thread where I'm asking for advice to figure this out. In cases where that's the only viable possibility, I've put a 🔑 icon to emphasize this.
Features
- `encodedName` for fields
  - Can be achieved with the `@upickle.implicits.key` annotation set on a field.
  - We can also override `objectAttributeKeyReadMap` and `objectAttributeKeyWriteMap` in our custom pickler which extends `AttributeTagged`. This method is recommended for customizing field name transformations like snake_case, but it can also be leveraged for other kinds of transformations (a sketch of both approaches follows after this list).
- transform field names with a custom function
  - Overriding methods from `AttributeTagged` should be enough to achieve this.
- transform enum values with a custom function
  - For simple enums and case objects in a sealed trait hierarchy, the `@upickle.implicits.key` annotation on the enum can be used to rename the value (yes, it's called "key", but in this case it's used by uPickle to transform values). 🔑
    - [minor] Limitation: we can only transform to string values, there's no way to get an ordinal integer. We can get `{ "customerStatus": "5" }`, but not `{ "customerStatus": 5 }`.
  - For enums with extra fields, uPickle creates JSON objects with a discriminator field.
    - The name of the discriminator field is `$type`, but it can be changed if `tagName` is overridden in a custom pickler.
    - The value of the discriminator field can be set with `@upickle.implicits.key` on the enum. 🔑
- sealed trait hierarchy (inheritance)
  - Decoding with a discriminator field: similarly to enums with fields. The field name can be set by overriding `tagName`, the value can be set by putting `@upickle.implicits.key` on the class. 🔑
  - Decoding with a first-successful strategy: probably hard. It would require overriding `AttributeTagged.taggedObjectContext` to return a custom `ObjectVisitor` with only some of the logic changed. Sounds like tricky ground.
  - Decoding using a single-field product: TODO
- default values
  - uPickle uses default values of case class fields
  - To override this behavior, it is possible to override `CaseClassReader.storeDefaults`, example here
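A minimal sketch of the two field-renaming approaches mentioned in the list above (the snake_case conversion helpers are ad-hoc, not part of uPickle):

```scala
import upickle.default.*

// per-field renaming, using the key annotation
case class Person(@upickle.implicits.key("first_name") firstName: String, age: Int)
object Person:
  given ReadWriter[Person] = macroRW

// global renaming, by overriding the key-mapping methods in a custom pickler
object SnakeCasePickler extends upickle.AttributeTagged:
  private def camelToSnake(s: String): String =
    s.replaceAll("([a-z0-9])([A-Z])", "$1_$2").toLowerCase
  private def snakeToCamel(s: String): String =
    "_([a-z])".r.replaceAllIn(s, m => m.group(1).toUpperCase)

  // applied to every object attribute name when writing/reading
  override def objectAttributeKeyWriteMap(s: CharSequence): CharSequence = camelToSnake(s.toString)
  override def objectAttributeKeyReadMap(s: CharSequence): CharSequence = snakeToCamel(s.toString)
```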
First, a side note - if you're not lucky on the scala-users forum, you can also try the dotty discussions in the metaprogramming section: https://github.com/lampepfl/dotty/discussions/categories/metaprogramming

Second side note: I think a good "terminology fix" might be to call enumerations only "true" enumerations, that is Scala 3 `enum`s, where all cases are parameterless. If the cases have parameters, that's only sugar for a sealed trait.
What is kind of worrying is that some cases can only be handled with 🔑. So either we find a way to add annotations to a type using macros, or ... ? I guess there's no alternative really.
Well, except rewriting the pickler derivation. After reading the upickle code, is that even feasible?
I see, thanks for the explanation regarding enumerations - let's use the terminology as you suggested. The discussion board you posted looks promising. I was able to find a fresh thread on refining types, which may be helpful to deal with annotations. Working on this now.
I was thinking about a possible implementation strategy, and here's what I came up with.
The first constraint is that we should honor existing `ReadWriter` instances when they exist - either for the built-in types, or some esoteric ones.
The second constraint is that derivation should follow standard Scala practices, that is, be recursive - so that the derived typeclass for a product/coproduct is created using implicitly available typeclass instances for children. This rules out `Codec` as the typeclass, as it's not recursive - only the top-level instance for a type is available.
Picklers
Still, we need to derive both the `ReadWriter` instance and the `Schema` instance. So maybe we should do just that: derive that pair, with an option to convert to a `Codec`. E.g.:
```scala
case class Pickler[T](rw: ReadWriter[T], schema: Schema[T]):
  def toCodec: JsonCodec[T] = ??? // sketched below

implicit def picklerToCodec[T](implicit p: Pickler[T]): JsonCodec[T] = p.toCodec
```
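A possible implementation of `toCodec`, just to check that the pieces fit - a minimal sketch using tapir's `Codec.json`, with simplified error handling:

```scala
import scala.util.{Failure, Success, Try}
import sttp.tapir.{Codec, DecodeResult, Schema}
import sttp.tapir.Codec.JsonCodec
import upickle.default.{read, write, ReadWriter}

case class Pickler[T](rw: ReadWriter[T], schema: Schema[T]):
  def toCodec: JsonCodec[T] =
    given Schema[T] = schema // Codec.json requires an implicit Schema
    given ReadWriter[T] = rw // used by read/write below
    Codec.json[T] { s =>
      Try(read[T](s)) match // uPickle throws on invalid input
        case Success(v) => DecodeResult.Value(v)
        case Failure(e) => DecodeResult.Error(s, e)
    } { t => write(t) }

implicit def picklerToCodec[T](implicit p: Pickler[T]): JsonCodec[T] = p.toCodec
```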
The `Pickler` name is quite random, but it's the best I came up with so far ;)
Configuration
Another design decision is what means of configuration to provide for the derived schemas/picklers. We already have two ways of customising schemas: using annotations and by modifying the implicit values. Originally I suggested adding a third one (explicitly providing an override for annotations), but maybe that's not necessary and we can use what's already available.

That is, the implicitly available `Schema` for a type could be used to guide the derivation of the `ReadWriter` - if it's missing. The schema already has all that we need: user-defined field names and default values. Btw., here #2943 would be most useful, to be able to externally provide alternate field names.

This also means that the `Pickler` derivation would have to assume that the schema's structure follows the type's structure (when it's a product/coproduct), and report an error otherwise.
Derivation
Now the main complication is implementing `Pickler.derived[T]`. I think it should follow more or less these rules (see the sketch after this list):
- if a `Schema` and a `ReadWriter` are already implicitly available in scope, use them to create a `Pickler`
- if the schema is missing and we're dealing with a product/coproduct, use code similar to what's currently in `SchemaMagnoliaDerivation` to create the new typeclass instance. Side note: we could simply do `Schema.derived[T]`, but that could have negative performance implications, as it would do the nested lookups once again. So it could be slow.
- if the `ReadWriter` is missing (i.e., not implicitly available), create one for a product/coproduct, using what's available in the `Schema`
Enums, inheritance
To support special cases, such as various enumerations or inheritance strategies, we can use a similar approach as currently, that is provide methods on `Pickler` to create the instances: `Pickler.derivedEnumeration` (similar to the method on `Schema` and `Codec`), `Pickler.oneOfUsingField`, `Pickler.oneOfWrapped` (similar as on `Schema`).
That way we would use the "standard" Scala way of configuring generic derivation - specifying the non-standard instances by hand - instead of inventing our own.
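For illustration, a special-case instance could then be provided like this (the `Pickler.derivedEnumeration` API is hypothetical, mirroring its `Schema` counterpart):

```scala
enum CustomerStatus:
  case Active, Blocked

// a hand-written instance for the special case...
given Pickler[CustomerStatus] =
  Pickler.derivedEnumeration[CustomerStatus](encode = Some(_.toString.toLowerCase))

// ...which the recursive derivation of enclosing types picks up implicitly
case class Customer(name: String, status: CustomerStatus)
given Pickler[Customer] = Pickler.derived
```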
Runtime/compiletime
Using the schema to create the `ReadWriter` instance means that it would be created at run-time - as only then do we have access to the specific `Schema` instance (which might be user-provided and computed arbitrarily). So at compile-time, we would only generate code which would do the necessary lookups / create the computation.
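In other words, for some `Person` case class the macro expansion might have roughly this shape (hypothetical; `readWriterFromSchema` as in the earlier sketch):

```scala
// generated at compile time; the ReadWriter is only constructed when this
// expression is evaluated, at run time
val personPickler: Pickler[Person] =
  val schema = summon[Schema[Person]] // or a generated schema expression, if none is in scope
  Pickler(readWriterFromSchema(schema), schema)
```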
Of course, there might be a hole in the scheme above and it might soon turn out that it's unimplementable ;) WDYT @kciesielski ?
Leaving some notes after our recent discussion with @adamw:
- The main API entrypoint is `Pickler`, and we want to allow deriving picklers without users providing schemas.
- If we allowed creating a `Pickler[T]` with a user-provided `Schema[T]`, we would break the mechanism of the Pickler creating its own schema out of the child schemas from summoned child picklers. That's why we emit a compilation error when a `Schema` is in scope, but no `Reader`/`Writer`. Either both the `Schema` and the `ReadWriter` are provided, or the Pickler takes care of deriving them.
- Therefore, to allow schema customization outside of case class annotations, we need some API in the Pickler, something like:
```scala
Pickler.derivedCustomise[Person](
  _.age -> List(@EncodedName("x")),
  _.name -> List(@EncodedName("y"), @Default("adam")),
  _.address.street -> ...
)
```
- This customization DSL is then processed by the pickler derivation in order to enrich the derived schemas, before creating the Readers/Writers, which use the schemas for encoded names and default values.
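Since `@Annotation(...)` isn't valid Scala in a value position, presumably the DSL would take annotation instances as values; e.g. with tapir's existing annotation classes (the `derivedCustomise` API itself being hypothetical):

```scala
import sttp.tapir.Schema.annotations.{default, encodedName}

given Pickler[Person] = Pickler.derivedCustomise[Person](
  _.age  -> List(new encodedName("x")),
  _.name -> List(new encodedName("y"), new default("adam"))
)
```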
Yes, looks correct :) In the future we might also want to add `Schema.derivedCustomise` for consistency, and maybe deprecate the `.modify` variant of schema customisation then?
Reopening for possible jsoniter work