
Fields name policy converters

Open gildor opened this issue 6 years ago • 26 comments

A very common serialization problem arises when the API/JSON naming policy conflicts with your code's naming. For example, snake case is a popular naming style for JSON fields, whereas Kotlin uses camel case. A format's naming style can also be incompatible with the JVM (for example, field names with dashes). Gson provides a feature to solve this: https://google.github.io/gson/apidocs/com/google/gson/FieldNamingPolicy.html (an implementation of FieldNamingStrategy)

In general, it's a questionable feature; Moshi doesn't provide it and suggests using the same case for your model classes.

But it can be a big blocker if you want to migrate to kotlinx.serialization, because you have 3 choices:

  1. Use @SerialName for each field. This is tedious, error-prone, and requires a lot of refactoring.
  2. Rename all your data models to match the API naming convention. This is tedious, requires even more refactoring, violates Kotlin code style guides, and is backward incompatible.
  3. Write your own implementation of JSON (or any other format) that provides such a feature. This looks like a reasonable solution, but now you have one more copy of a JSON implementation in your project (because the JSON implementation is now part of kotlinx.serialization), with a different name but the same API.

Possible solutions:

  1. Add a naming policy feature to the JSON implementation. Drawbacks: it doesn't work for other formats, the feature is debatable, and not all use cases need it.
  2. Provide a universal API that allows defining name converters. A user can provide their own implementation, so it does not have to be included in the core of kotlinx.serialization, and it works for all formats without additional effort. By default it uses field names as is, which is the current behavior. Possible problems: if you use more than one serialization format, different formats probably require different strategies, so the naming policy still has to be integrated into the format implementation. Also, you sometimes want a different naming policy for particular requests (from a different API), so it looks like the serialization format should still provide an API to register and use naming policies.
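
To make the second solution more concrete, here is a minimal sketch of a pluggable name converter, written against the standard library only. `NameConverter`, `AsIs`, `CamelToSnake`, `FormatConfig`, and `serializedNames` are hypothetical names invented for illustration; none of them is part of kotlinx.serialization.

```kotlin
// Hypothetical converter API: maps a Kotlin property name to its serialized name.
fun interface NameConverter {
    fun convert(propertyName: String): String
}

// Default: keep property names as they are (the current behavior).
val AsIs = NameConverter { it }

// Insert '_' at every lowercase/digit -> uppercase boundary, then lowercase everything.
val CamelToSnake = NameConverter { name ->
    name.replace(Regex("(?<=[a-z0-9])(?=[A-Z])"), "_").lowercase()
}

// Each format instance carries its own converter, so two APIs can use different policies.
class FormatConfig(val nameConverter: NameConverter = AsIs)

fun serializedNames(properties: List<String>, config: FormatConfig): List<String> =
    properties.map(config.nameConverter::convert)

fun main() {
    val properties = listOf("id", "firstName", "externalId")
    println(serializedNames(properties, FormatConfig()))             // names unchanged
    println(serializedNames(properties, FormatConfig(CamelToSnake))) // snake_case names
}
```

Keeping the converter on the format configuration rather than in a global registry is one way to address the "different policy per API" concern raised above.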

gildor avatar Nov 17 '17 06:11 gildor

This feature is really needed!

BoxResin avatar Sep 11 '18 05:09 BoxResin

@Serializable
data class Data(val a: Int, @Optional val b: String = "42")

What about adding new class JsonSerializationStrategy?

This class can be something like

class JsonSerializationStrategy<T>(
    val serializer: SerializationStrategy<T>,
    val config: JsonSerializationConfig
) : SerializationStrategy<T> by serializer

Json.stringify will only accept JsonSerializationStrategy.

So, we can set "Fields name policy" in that config class.

Or add a new parameter to the Json class.

Why prefer the first option: JsonSerializationStrategy can be cached/reused.

raderio avatar Feb 15 '19 12:02 raderio

Has there been any movement on this feature?

rscottcarson avatar Aug 27 '19 19:08 rscottcarson

As a maintainer of Gson, which was used as justification for this feature: we regret field naming policies and do not recommend their use. Moshi, the Gson successor, does not include field naming policies because we view them as a mistake.

JakeWharton avatar Aug 28 '19 00:08 JakeWharton

Jackson library also has field naming policies.

raderio avatar Aug 28 '19 10:08 raderio

I would like to propose a 3rd "solution", which is to use intercepting encoders/decoders. Basically, such an encoder/decoder would delegate to the actual encoder, but use a delegating SerialDescriptor implementation that does the renaming when/where needed. Note that the generated code cannot know whether a name was provided automatically or manually (using @SerialName); perhaps that is something to add a flag for.
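
A rough sketch of the delegating-descriptor idea follows. To stay self-contained it uses a hypothetical, cut-down `ElementDescriptor` interface whose three members only mirror the names of members on the real SerialDescriptor; `PlainDescriptor` and `RenamingDescriptor` are likewise invented for illustration and are not library classes.

```kotlin
// Hypothetical stand-in for the descriptor a format consults for element names.
interface ElementDescriptor {
    val elementsCount: Int
    fun getElementName(index: Int): String
    fun getElementIndex(name: String): Int
}

// Descriptor as generated code would produce it: names equal the Kotlin property names.
class PlainDescriptor(private val names: List<String>) : ElementDescriptor {
    override val elementsCount: Int get() = names.size
    override fun getElementName(index: Int): String = names[index]
    override fun getElementIndex(name: String): Int = names.indexOf(name)
}

// Delegating descriptor: forwards everything except name lookups, which see renamed names.
class RenamingDescriptor(
    private val delegate: ElementDescriptor,
    rename: (String) -> String
) : ElementDescriptor by delegate {
    // Computed once per descriptor, not once per encoded value.
    private val renamed = List(delegate.elementsCount) { rename(delegate.getElementName(it)) }

    override fun getElementName(index: Int): String = renamed[index]
    override fun getElementIndex(name: String): Int = renamed.indexOf(name)
}

fun main() {
    val plain = PlainDescriptor(listOf("id", "externalId"))
    val snake = RenamingDescriptor(plain) { name ->
        name.replace(Regex("(?<=[a-z0-9])(?=[A-Z])"), "_").lowercase()
    }
    println(snake.getElementName(1))
    println(snake.getElementIndex("external_id"))
}
```

An intercepting encoder or decoder would hand such a wrapper to the real one, so the underlying format never sees the original names.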

Having said that, related to the Gson issue, I would say that there is a reason not to do this, as serialization can be used for two purposes:

  • temporary serialization to an opaque format (for deserialization only). In this case names don't really matter, so why bother with having names that are not identical to the Kotlin fields
  • permanent serialization to a transparent (defined) format. In this case there is a case for applying @SerialName even in the case that the names are equal, because type member renames should not lead to serial format changes (that would be incompatible).

Intercepting encoders/decoders could be useful for other cases as well (I saw a bug about encrypted data). Perhaps it would be worthwhile to provide a base class for these, but it is also an issue that would be better implemented outside the main runtime library.

pdvrieze avatar Aug 29 '19 14:08 pdvrieze

+1 for the global setting and annotation per class

Lewik avatar Apr 21 '20 08:04 Lewik

Any plans for this?

dragneelfps avatar Sep 19 '20 14:09 dragneelfps

> We regret field naming policies and do not recommend their use.

@JakeWharton what was the reasoning behind this? I am not sure how would you have any problems if this is used as an optional feature. My guess would be performance concerns?

dragneelfps avatar Sep 19 '20 14:09 dragneelfps

Performance is not a concern. The serialization model is only built once.

It's more that the magical behavior is needless and you should embrace the naming conventions of the layer you are modeling, even if they break the otherwise normal conventions of the language in which you are doing the modeling. And if you need an alternate name, it can be specified explicitly so as to not break tools like grep.

More at https://publicobject.com/2016/01/20/strict-naming-conventions-are-a-liability/

JakeWharton avatar Sep 19 '20 15:09 JakeWharton

> And if you need an alternate name, it can be specified explicitly so as to not break tools like grep

I guess the search tools we use should support name policy converters 😁 Because now it seems that we have to do tedious actions in Kotlin only because of some external factors...

And now if you even write the whole full-stack app in Kotlin, you still don't have a way to make the code more concise, for example, to minify names automatically (as in #908).

SerVB avatar Sep 19 '20 16:09 SerVB

Minifying names automatically is horrifying. It's impossible to make deterministic such that you retain compatibility across compilations and deployments. Persistence or blue/green deployment is thus impossible.

I'm happy to fight against that in addition to field name policies, though.

JakeWharton avatar Sep 19 '20 17:09 JakeWharton

It would be nice to give people the option to use casing strategies if their use case demands it. We have a situation where existing systems were built differently and so use different casing strategies. We'd like to use kotlinx.serialization to generate contracts (avro IDLs generated using avro4k) in a global format (snake-cased) to enable the systems to speak with each other, but sometimes those IDLs are generated from systems that already use camel-casing everywhere. With this restriction, we'd be forced to annotate every field with @SerialName, which, as pointed out in the OP, is tedious and error-prone. Since this seems like a pretty trivial change, it would be awesome if we could get this option, as it could save us and others in a similar position a lot of pain.

williamboxhall avatar Oct 13 '20 01:10 williamboxhall

What is the plan here? Is there any alternative way to avoid writing hundreds of @SerialName entries just because our backend team has a different casing convention (snake_case) than we have in our mobile app (lowerCamelCase)? It feels like lots of duplicate code ... just check out this one type: [image: Screenshot 2020-12-09 at 16.44.13]

I absolutely disagree with what is said in the blog post linked to by @JakeWharton by the way. A naming change in an API is a breaking API change and is simply forbidden, but even if it's needed, our backend team isn't supposed to check for it. Instead, they report it to our Mobile team and they will do the check. Everyone there will know that we are converting snake_case to lowerCamelCase – so there is no such problem at all. So, while it might make sense for some teams, I think the official Kotlin serializer shouldn't be opinionated about the use cases here as there is clearly the need for such a thing. At least 55 people seem to agree with me (see the upvotes).

Considering Android apps, there's typically an iOS counterpart, and Swift's official JSONDecoder absolutely has a policy converter, so your backend team might not find all use cases without converting to lowerCamelCase anyway. Let's be realistic and just accept that in no world will a backend developer be able to actually trust such a search without converting cases, as there will always be the potential for clients to do the conversions.

Jeehut avatar Dec 09 '20 15:12 Jeehut

The only duplicate code is where you've specified a @SerialName where one isn't needed. Otherwise you're clearly mapping across domains in which case you are required to specify the identifier association on both sides. This is pretty standard fare for layer crossing such as serialization and persistence.

Any kind of automatic field naming conversion policy introduces additional edge cases that have to be considered. What if the JSON contains keys for both external_id and externalId? This is valid JSON, as the keys do not collide. JSON is unordered, so do you take the first key? The last key? That means non-determinism. Do you throw at runtime? What if someone needs both: do they have to drop out of a naming policy for the entire model tree? Or can you control it per-subgraph? You might have the luxury of ignoring these cases as never happening, but the library does not.
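
The collision case can be illustrated with a short standard-library sketch. `snakeToCamel` and `findCollisions` are hypothetical helpers, and the conversion rule is an assumption chosen for the example rather than any library's behavior.

```kotlin
// Hypothetical decode-side policy: fold snake_case keys into camelCase property names.
fun snakeToCamel(key: String): String =
    key.split('_')
        .filter { it.isNotEmpty() }
        .mapIndexed { i, part -> if (i == 0) part else part.replaceFirstChar { it.uppercase() } }
        .joinToString("")

// Groups incoming JSON keys by the property they would map to and keeps only ambiguous groups.
fun findCollisions(jsonKeys: Collection<String>): Map<String, List<String>> =
    jsonKeys.groupBy(::snakeToCamel).filterValues { it.size > 1 }

fun main() {
    val keys = listOf("id", "external_id", "externalId")
    val collisions = findCollisions(keys)
    if (collisions.isNotEmpty()) {
        // One possible answer to "do you throw at runtime?": report instead of silently picking a key.
        println("Ambiguous keys: $collisions")
    }
}
```

Both `external_id` and `externalId` land on the same property name here, so a library applying the policy has to pick one of the behaviors listed above.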

An easy way to apply a field naming policy today is to specify your serialization format in an IDL and code gen the model objects on all platforms with appropriate policies applied at generation time through these annotations. It also means you're never out-of-sync on any platform and never have to write these model specifications more than once and never for a specific platform.
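
As a toy illustration of applying the policy at generation time, the sketch below emits Kotlin model source with explicit @SerialName annotations from a tiny in-memory schema. `IdlField`, `IdlRecord`, `toCamelCase`, and `generateModel` are hypothetical; a real setup would use an existing IDL and code generator rather than this hand-rolled one.

```kotlin
// Hypothetical, minimal "IDL": a record name plus wire-format (snake_case) field names and Kotlin types.
data class IdlField(val wireName: String, val kotlinType: String)
data class IdlRecord(val name: String, val fields: List<IdlField>)

// Policy applied once, at generation time: snake_case wire name -> camelCase property name.
fun toCamelCase(wireName: String): String =
    wireName.split('_')
        .filter { it.isNotEmpty() }
        .mapIndexed { i, part -> if (i == 0) part else part.replaceFirstChar { it.uppercase() } }
        .joinToString("")

// Emits Kotlin source in which every property carries an explicit @SerialName.
fun generateModel(record: IdlRecord): String = buildString {
    appendLine("@Serializable")
    appendLine("data class ${record.name}(")
    for (field in record.fields) {
        appendLine("    @SerialName(\"${field.wireName}\") val ${toCamelCase(field.wireName)}: ${field.kotlinType},")
    }
    append(")")
}

fun main() {
    val user = IdlRecord("User", listOf(IdlField("id", "Int"), IdlField("external_id", "String")))
    println(generateModel(user))
}
```

The generated source stays greppable for the wire names, while nobody has to type the annotations by hand.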

Finally, citing "[other library] has this feature" is not an argument for or against the feature in this library. Gson has a ton of features I can point you to, and the majority of them were bad decisions in retrospect. We include field naming policies as one of them, and they're thankfully absent from ~Gson 3.0~ Moshi.

JakeWharton avatar Dec 09 '20 16:12 JakeWharton

@JakeWharton I appreciate your answer, but you're just repeating the same arguments without addressing our question: Why can't we have the choice to say "we don't expect any problems like 'external_id' and 'externalId' and we don't care about the library's behavior in that case as it won't happen - feel free to crash in that case if needed!"?

I didn't cite other libraries which have this feature to say this is a good feature because others have it, too. I just said that their existence and wide adoption renders the argument of the blog post irrelevant.

Your suggestion to use an IDL just proves again that your view is considering a very restricted set of use cases. I can understand that in some random serializer, but I don't think an official serializer for such a widely applicable language such as Kotlin should be so restricted. I don't know your company, but we don't have the resources to solve this problem with an intermediary layer - our code is working fine, and we don't expect any changes as this is forbidden within an API version as per our requirements. I suspect many others are in a similar situation and using an IDL is just overkill.

Jeehut avatar Dec 09 '20 17:12 Jeehut

> not addressing our question

I'm not the library author so I cannot answer the question. I can merely argue against this feature's inclusion as strongly as I can having been one of the maintainers of both Gson and Moshi which take different views on the subject matter.

If I were the library author, my answer would be that you can already do this with @SerialName. This is our answer in Moshi.

Moreover, introducing the policy brings not only edge cases that require careful thought, but also severe indirection in the generated code, such that I'd be concerned about the performance impact. Not to mention it's unclear whether supporting this is even possible in a binary-compatible way in a post-1.0 world.

> I didn't cite other libraries which have this feature to say this is a good feature because others have it, too. I just said that their existence and wide adoption renders the argument of the blog post irrelevant.

It doesn't. The argument is for not building it into anything new.

> your view is considering a very restricted set of use cases

Yeah, not really. @SerialName solves this problem without question. Period. Your issue, and this feature request, is about the associated verbosity of being explicit everywhere.

We (the maintainers of Gson and Moshi) have thought about the implications of supporting this at the library-level at length.

> I don't know your company, but we don't have the resources to solve this problem with an intermediary layer

Consider using one of the many existing open-source tools for it, which should require less work than defining models on multiple clients.

> we don't expect any changes as this is forbidden within an API version as per our requirements.

The stated goal was field mapping and reduction of "boilerplate". An IDL solves both of these, completely eliminating the need to write models on any client. If you write models by hand, it's not unreasonable for the models to need to be explicit "by hand" as well.

The layering of responsibility there is actually quite nice. When you move up a level of abstraction you get to define global transforms such as generated field name conversions.

> I don't think an official serializer for such a widely applicable language such as Kotlin should be so restricted

Your opinion is noted. Mine is the opposite. We don't have to agree, and we won't.

JakeWharton avatar Dec 09 '20 17:12 JakeWharton

@JakeWharton

> @SerialName solves this problem without question. Period. Your issue, and this feature request, is about the associated verbosity of being explicit everywhere.

I don't know what your "this problem" is referring to if not "this feature request". These two sentences contradict each other. I thought this is a discussion about "this feature request" or did I misunderstand how GitHub issues work? 55 people have confirmed they have "this problem" and the OP explains very well how it is still unsolved. Stating that it is already solved is nothing else but downplaying this problem and ignoring our voices.

But don't get me wrong, I do understand that you're trying to drive forward a specific way of thinking to prevent a given set of problems by design. I just disagree on the question if this is the right choice for the audience of this library. Or to put it in your own words:

> Your opinion is noted. Mine is the opposite. We don't have to agree, and we won't.

😇

Jeehut avatar Dec 09 '20 17:12 Jeehut

Interestingly, in the XML format I wrote, I've introduced the concept of a policy. When creating the format you can set an instance of the interface (there is a default). This is used to determine various aspects, including what tag or attribute name to use. It does make it much easier to be compatible with other serialization frameworks (without automatic renaming, although that is possible). What is important here is that this is something that the user of the library provides. The library uses it to build the tree structure representing the serialization metadata, and uses that structure for the actual serialization (it adds complexity, but is much more manageable than doing the same in various places of the serialization/deserialization itself).

pdvrieze avatar Jan 12 '21 12:01 pdvrieze

> @SerialName solves this problem without question. Period.

It does not, @JakeWharton, because you yourself suggested:

> you should embrace the naming conventions of the layer you are modeling, even if they break the otherwise normal conventions of the language in which you are doing the modeling.

While that works for libraries that target a single serialization format (like Gson and Moshi), it does not necessarily work for libraries that target multiple serialization formats (like kotlinx.serialization). If "the naming conventions of the layer you are modeling" conflict across formats, the question becomes which convention to align with. A configurable property naming strategy is an easy way to answer it. A more complicated way is to have output-format-specific IDLs.

sschuberth avatar Apr 18 '21 11:04 sschuberth

Hey @sandwwraith, this issue has been open for quite some time. Is this feature on the roadmap?

While I sympathize with the arguments against its addition, the reality is that many of us work with disparate systems that require differing naming conventions. As @gildor outlined in their original post, and others have discussed, the absence of this feature presents a major blocker for users who require the functionality.

rgmz avatar Oct 05 '21 14:10 rgmz

@rgmz how is it a blocker? Did you mean "inconvenience of verbosity"?

If you use non-standard policies then it's only fair if you are inconvenienced for the sake of efficiency and compatibility of everyone else. Yes, we have to work with legacy systems. I'm myself here because I looked for that solution. But a more experienced person says "we tried it and it sucks". IMO, that suffices.

@sschuberth do you remember what was the world before standard formats (like json) came? Everyone invented his own. You had to write a serializer/deserializer for pretty much every single format. Json is now more-or-less a common ground for everyone, but json authors failed to enforce naming policies. It's not late to fix that.

snakeru avatar Jan 19 '22 10:01 snakeru

> It's not late to fix that.

I believe it is too late to enforce JSON naming policies. There's too much JSON in the wild that uses camel-case, snake-case, or whatever (even mixed in the same file) for field names. If you want to at least keep your data model naming clean, you need some sort of field name conversion.

sschuberth avatar Jan 19 '22 11:01 sschuberth

@sschuberth The question is more about where/how to do this. I can explain a little bit about how/why it is supported in the XML format. The first point is that the XML format creates shadow descriptors. It needs to do this for a number of purposes, including a more complex naming policy that differs for attributes and tags, as well as namespace support. The shadow descriptors add overhead to the format but allow XML serialization to be quite flexible and follow a hierarchy that could have been hand-designed (they also make the implementation more consistent between encoding and decoding). Key here is also that there needs to be a way to determine which tag/attribute names (with namespaces) to use, and that choice is somewhat arbitrary, i.e. we always need a policy that cannot be @SerialName (which doesn't support namespaces) unless we want to mandate annotations for all names.

For JSON the requirements are quite different. JSON maps much more directly onto programming language types and doesn't have to deal with things like random-order tags that are named after their type rather than the attribute they are stored under. As such there is no need to incur the overhead of creating shadow descriptors as part of serialization. The @SerialName annotation works at compile time to determine the name written in the descriptor, and as such has no runtime overhead. It is more than sufficient for JSON purposes (unlike XML, which needs to deal with namespaces and (natural) type names (that don't have a lot of dots in them)).

What you are advocating is for a runtime system to map attribute names to serialnames, which however has statically deterministic results. In other words, it can be done statically (and incur no execution overhead). What is suggested is that:

  1. Explicitly using @SerialName even when the name is equal allows for stability in the API even if the attributes change name.
  2. This is a problem that is quite easy to handle with a tool that takes an IDL and generates data model classes out of it. If you don't want to do that, you can probably get very very far with structural search & replace instead.

pdvrieze avatar Jan 19 '22 13:01 pdvrieze

> Performance is not a concern. The serialization model is only built once.
>
> It's more that the magical behavior is needless and you should embrace the naming conventions of the layer you are modeling, even if they break the otherwise normal conventions of the language in which you are doing the modeling. And if you need an alternate name, it can be specified explicitly so as to not break tools like grep.
>
> More at https://publicobject.com/2016/01/20/strict-naming-conventions-are-a-liability/

Basically, if anyone is using enums in a layer that goes through deserialization, they must embrace the pain of adding alternative names to all the enums?!

mecoFarid avatar Mar 17 '22 16:03 mecoFarid

Just to add my 2 cents.

Literally every json dialect I deal with is using snake casing, so addressing this would make using this framework a bit less of the PITA it is right now for me.

For better or for worse, snake casing is the dominant way to do identifiers in a wide range of languages. The JSON specification is not opinionated on what identifiers should and should not look like. Pretty much any valid string is valid to use as a key in a JSON object. Parsers / serializers should not be more opinionated than the standard.

I'd recommend implementing the same naming conventions that jackson has.

They implement the following strategies:

PropertyNamingStrategy.KebabCaseStrategy, PropertyNamingStrategy.LowerCaseStrategy, PropertyNamingStrategy.SnakeCaseStrategy, PropertyNamingStrategy.UpperCamelCaseStrategy

Along with a default of "as is". That would be a nice optional parameter to have on the encode/decode functions and as a parameter on @Serializable. It would largely remove the need for @SerialName, except in cases where the naming conventions are not consistent.

Having to spell this out on a per-field basis with @SerialName is tedious and error-prone. And breaking Kotlin coding conventions is also not ideal, given projects with coding standards.
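
For reference, the conversions behind those strategy names are small. The sketch below is a standard-library approximation that assumes lowerCamelCase input; the `NamingStrategy` enum and `words` helper are hypothetical and are neither Jackson's implementation nor a kotlinx.serialization API.

```kotlin
// Splits a lowerCamelCase identifier into lowercase words: "externalUserId" -> [external, user, id].
fun words(propertyName: String): List<String> =
    propertyName.split(Regex("(?<=[a-z0-9])(?=[A-Z])")).map { it.lowercase() }

// Hypothetical enum mirroring the strategy names listed above, plus the "as is" default.
enum class NamingStrategy(val convert: (String) -> String) {
    AS_IS({ it }),
    KEBAB_CASE({ words(it).joinToString("-") }),
    LOWER_CASE({ it.lowercase() }),
    SNAKE_CASE({ words(it).joinToString("_") }),
    UPPER_CAMEL_CASE({ it.replaceFirstChar { c -> c.uppercase() } })
}

fun main() {
    for (strategy in NamingStrategy.values()) {
        println("${strategy.name}: ${strategy.convert("externalUserId")}")
    }
}
```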

jillesvangurp avatar May 26 '22 11:05 jillesvangurp

While I have implemented such a policy in the XML format, there is actually a serious limitation in the case of JSON: it is not currently possible for a format to know whether a name was given using @SerialName or was the name used in code. In XML this is solved using an additional annotation that also deals with namespaces etc.; the default naming policy uses the presence of annotations to determine whether type or attribute names should be used. Any workable policy should allow for overrides (which could be policy-specific annotations).
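
The override requirement can be sketched as follows. `ElementInfo` and its `explicitName` field are hypothetical: they model the "was this name written by hand?" information that, as described above, a format cannot currently obtain.

```kotlin
// Hypothetical element metadata: `explicitName` is non-null only when the author wrote @SerialName.
data class ElementInfo(val propertyName: String, val explicitName: String? = null)

fun camelToSnake(name: String): String =
    name.replace(Regex("(?<=[a-z0-9])(?=[A-Z])"), "_").lowercase()

// An explicit name always wins; the policy only touches names taken from code.
fun wireName(element: ElementInfo, policy: (String) -> String): String =
    element.explicitName ?: policy(element.propertyName)

fun main() {
    val elements = listOf(
        ElementInfo("externalId"),                       // renamed by the policy
        ElementInfo("userName", explicitName = "login"), // override is kept as written
        ElementInfo("eTag", explicitName = "eTag")       // explicit, although equal to the property name
    )
    println(elements.map { wireName(it, ::camelToSnake) })
}
```

The third element shows why the flag matters: without it, a policy could not tell a deliberate `eTag` from an un-annotated one and would rename both.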

One thing to consider though is that there are fundamentally two perspectives on serialization:

  • arbitrary: in this approach serialization is important but names are not that important.
  • stable: in this approach serialization is used to map to a specific format that is likely externally defined. This is a much more complex case and somewhat the "ugly stepsister". To be truly stable this would require consistent annotation.

pdvrieze avatar Nov 27 '22 15:11 pdvrieze

Although I agree that global naming strategies are a somewhat malicious programming practice, there is significant demand for this feature (including use cases such as migrating from Jackson to kotlinx.serialization), so there's a prototype that'll likely get into the next RC: #2111
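
For readers arriving later: the sketch below assumes the feature is available in the form of the experimental `JsonNamingStrategy` API found in released kotlinx.serialization versions (1.5.0 and later), set through the `namingStrategy` property of the `Json` builder. It needs the serialization compiler plugin and the JSON artifact on the classpath; the `User` class and `snakeJson` name are made up for the example.

```kotlin
import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.Serializable
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.JsonNamingStrategy

// Hypothetical model: property names follow Kotlin conventions, no @SerialName needed.
@Serializable
data class User(val firstName: String, val externalId: Int)

// The naming strategy is experimental, so the opt-in is required.
@OptIn(ExperimentalSerializationApi::class)
val snakeJson = Json { namingStrategy = JsonNamingStrategy.SnakeCase }

fun main() {
    val text = snakeJson.encodeToString(User.serializer(), User("Ada", 7))
    println(text)
    println(snakeJson.decodeFromString(User.serializer(), text))
}
```

The strategy is configured per `Json` instance, so different APIs can use differently configured instances.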

sandwwraith avatar Nov 29 '22 15:11 sandwwraith