Newtonsoft.Json icon indicating copy to clipboard operation
Newtonsoft.Json copied to clipboard

Modification to F# discriminated unions implementation to support treating Fields as a Record #1662

Open CarlKinghorn opened this issue 6 years ago • 25 comments

Please see issue #1662 for reference and discussion. The text below combines parts of that discussion in a simplified description.

Modification to F# discriminated unions implementation to support treating Fields as a Record

Serialization of F# discriminated unions in the current Json.NET implementation and in the option presented in issue #1547 result in unfavorable output in my opinion. Below I will list problems and propose a solution.

Unfavorable result 1 - Current implemtation

Example of F# code and resultant Json from the current implementation

type DUSerializationTest =
    | CaseA of id:int * name:string * dateTimeStamp:DateTime
    | CaseB of string * int
    | CaseC of description:string * DUSerializationTest

let serializeTest =
    CaseA (1, "Carl", DateTime.Now)
    |> JsonConvert.SerializeObject

let serializeTest2 =
    CaseB ("CaseB Item1 Example", 2)
    |> JsonConvert.SerializeObject

let serializeTest3 =
    CaseC ("Recursive Type Definition Example", CaseB ("CaseB Item1", 2))
    |> JsonConvert.SerializeObject
{
    "Case": "CaseA",
    "Fields": [
        1,
        "Carl",
        "2018-03-28T00:54:08.0652638+00:00"
    ]
}

{
    "Case": "CaseB",
    "Fields": [
        "CaseB Item1 Example",
        2
    ]
}

{
    "Case": "CaseC",
    "Fields": [
        "Recursive Type Definition Example",
        {
            "Case": "CaseB",
            "Fields": [
                "CaseB Item1",
                2
            ]
        }
    ]
}

I find this to be functional but not usable in scenarios where interoperability or discovery is important. Having a simple array of "Fields" as a series of ordered values without labels does not give any kind of hint of intent to the consumer.

Furthermore, it is my experience that when looking at API documentation or working with 3rd parties (who sometimes just send sample Json payloads as documentation) it is common practice to use a tool like json2csharp to generate classes at least as a basis for implementing the interaction. Doing this on the output detailed above results in the following class definition.

public class RootObject
{
    public string Case { get; set; }
    public List<object> Fields { get; set; }
}

As you can see the List<object> Fields property leaves us with more responsibility being placed on the consuming developer to get the implementation correct based on documentation. It would be preferable to have Json that would result in generated code that is named and explictly typed.

Unfavorable result 2 - Issue #1547 proposed struct-based implementation

While issue #1547 presents a good potential solution to the 1st unfavorable result I believe there are three problems which should prevent this library from adopting it.

Issue 1 - Current usage

The current implementation of serialization of Discriminated Unions by Json.NET is already published and available for common use; this means utilizing the existing [Struct] common attribute of Discriminated Unions would mean that developers who have implemented and potentially persisted [Struct] attributed unions in data stores would unexpectedly and unexplainably (without researching) have to deal with their programs not behaving properly or data loss.

Issue 2 - Limited to [Struct]

Outside of the obvious desire one would have to minimize restrictions on functionality the [Struct] attribute limits the ability to have a recursive type definition.

Issue 3 - "Case" becomes a reserved word with no compiler warning

The following F# would result in two instances of "case" being used within the Json output.

[<Struct>]
type DUSerializationCaseTest =
    | CaseA of case:string * dateTimeStamp:DateTime
    | CaseB of Case:string * DateTimeStamp:DateTime

let serializeTest =
    CaseA ("Not ideal", DateTime.Now)
    |> JsonConvert.SerializeObject

let serializeTest2 =
    CaseB ("Problem", DateTime.Now)
    |> JsonConvert.SerializeObject

Example 1 - technically valid because of the case-sensitivity of Json which, because of the common F# convention to use camelCasing in union case fields, could have a low impact on the majority of users.

{
    "Case": "CaseA",
    "case": "Not ideal",
    "dateTimeStamp": "2018-03-28T00:54:08.0652638+00:00"
}

Example 2 - invalid, duplicate key

{
    "Case": "CaseA",
    "Case": "Problem",
    "DateTimeStamp": "2018-03-28T00:54:08.0652638+00:00"
}

Proposed Solution

Begin by solving the "current usage" problem listed above by leaving the default functionality and adding an attribute which can be used to decorate Discriminated Unions so that they may be treated differently when serializing and deserializing.

public class DiscriminatedUnionFieldsAsRecordAttribute : Attribute { }

Modify the output of serializing a Discriminated Union decorated in this fashion in the following ways:

  • Use the union case as a label for the top level identifier. This is beneficial to isolate and differentiate the label:value pairs of the union case from the wrapper identifier.
  • Treat the fields of the union case being serialized as a record. This is beneficial for interoperability, discovery, and readability purposes. Additionally, the implementation of the deserialization of the "Record" does not have to follow a strict ordering of values since we can match on the labels and create the union case more safely.

Example of proposed changes

[<DiscriminatedUnionFieldsAsRecord>]
type DUCaseTest =
  | CaseA of id:string * name:string
  | CaseB of int
  | CaseC of numberOfThings:int

let serializeTest =
    CaseA (1234, "Carl")
    |> JsonConvert.SerializeObject

Results in the following Json

{
    "CaseA": {
        "id": "1234",
        "name": "Carl"
    }
}

Example of json2csharp output when these changes have been applied

public class CaseA
{
    public string id { get; set; }
    public string name { get; set; }
}

public class RootObject
{
    public CaseA CaseA { get; set; }
}

As I hope you will agree, the results of the proposed modification yield better results.

CarlKinghorn avatar May 08 '18 05:05 CarlKinghorn

I like what you're trying to do, it is common feedback that JSON for DU isn't what they expected.

Two things:

  1. Rather than introduce a new attribute I'd like to make doing this an option on DiscriminatedUnionConverter. That would let people to generate this JSON without placing attributes. For people who do want to use an attribute they can add something like [JsonConverter(typeof(DiscriminatedUnionConverter), true)] to their F# DU type (using whatever the F# attribute syntax is).
  2. Do you know what JSON Chiron produces for DU? That appears to be the standard within the F# community. My feeling is that people want a flat list of fields, not fields nested in a object. Maybe Case could be written as $case to avoid collisions.
{
    "$case": "CaseA",
    "Case": "Problem",
    "DateTimeStamp": "2018-03-28T00:54:08.0652638+00:00"
}

JamesNK avatar Jun 19 '18 23:06 JamesNK

Also DU's can have unnamed values. An exception should be thrown when attempting to serialize and deserialize them with this named DU mode.

JamesNK avatar Jun 20 '18 00:06 JamesNK

@JamesNK You should also look at what FSharpLu does - see https://blogs.msdn.microsoft.com/dotnet/2016/12/13/project-springfield-a-cloud-service-built-entirely-in-f/

That's a fairly popular option, as well.

ReedCopsey avatar Jun 20 '18 00:06 ReedCopsey

Submitting for discussion a few converters I have/use for DU serialization at work: https://github.com/baronfel/Newtonsoft.Json.FSharp.Idiomatic

There are some common cases that work very well for DUs (especially enum-style DUs), but the hard nut to crack is mixed or tuple-field-style DUs. Its easy to do it with a case field, but if you add another constraint that field names must be disjoint then you get to something like my OutOfOrder converter. Then with a few more enhancement you can handle option types and mixed-case DUs.

baronfel avatar Jun 20 '18 00:06 baronfel

In the case of unnamed values you could use Item1, Item2, etc, as the structure is mapped to a tuple under the hood. The UnionCase API gives you the auto-generated names in this case.

baronfel avatar Jun 20 '18 00:06 baronfel

Thanks for the feedback

@JamesNK
Re: 1. - this makes sense and is more in-line with the project's style. I will make this change.
Re: 2. - I haven't looked into JSON Chiron though I will and consider this suggestion.

@JamesNK @baronfel
Re: DU cases with unnamed values - this implementation handles unnamed values via the Item1, Item2, etc... paradigm mentioned in @baronfel's last comment

@ReedCopsey
Re: FSharpLu - I reviewed this implementation and you can find my original comment about it here. I will also review the blog link you mentioned for more context.

@baronfel
Re: mixed or tuple-field-style DUs - I need to do some testing into this "hard nut to crack" and see how my implementation handles it. If it doesn't handle it well I'll look into your solution to see if it can be incorporated.

CarlKinghorn avatar Jun 20 '18 04:06 CarlKinghorn

It's probably not appropriate for this discussion, as it seems one of the goals is to have JSON which trivially "looks" like a C# class, however...

I've often treated JSON arrays like a tuple. Combining this with strings-as-labels (a poor approximation of atoms, as found in Erlang, Ruby, et cetera) in the initial element of the array-as-tuple, lets you have a fairly compact, generalized encoding that can be arbitrarily nested and allows both labelled and unlabelled data. This seems particularly apt for encoding discriminated unions.

As an example, given:

type DUSerializationTest =
    | CaseA of id:int * name:string * dateTimeStamp:DateTime
    | CaseB of string * int
    | CaseC of description:string * DUSerializationTest

you might see instances encoded as:

// CaseA (123, "some guy", DateTime.UtcNow)
[ "CaseA", [ ["id", 123], ["name", "some guy"], ["dateTimeStamp", "2018-06-20T07:45:00.000Z"] ] ]
// CaseB ("some guy", 123)
[ "CaseB", [ "some guy", 123 ] ]
// CaseC ("foo", CaseB ("some guy", 123))
[ "CaseC", [ [ "description", "foo" ],  [ "CaseB", [ "some guy", 123 ] ] ] ]

Of course, this approach isn't perfect. And it certainly isn't going to automagically convert into the expected C# class(es). But I thought I'd share it for general awareness purposes.

pblasucci avatar Jun 20 '18 05:06 pblasucci

Just a small comment: Probably the most common situation people complain about with F# DUs to JSON is Option<T>. For example, I would suggest that for idiomatic JSON generation, Some 5 should just render as 5 in the JSON object, and None should be treated as null / missing. I made a crude prototype for an alternative serializer for DUs many, many moons ago - I never had it working 100% but I felt it was a step forward (https://gist.github.com/isaacabraham/ba679f285bfd15d2f53e). Also consider how Fable.JSON currently supports them - @alfonsogarciacaro do you have any thoughts on this?

isaacabraham avatar Jun 20 '18 07:06 isaacabraham

For Fable tooling we also need to take into account how the JSON will be deserialized on client side, so I wouldn't dare to recommend our way as the standard for all F# devs. But just for reference, so far we've been using custom converters that look like this for options and like this for unions in general (for F# list we rely in standard Newtonsoft.Json serialization which turns them into arrays).

However, it's very likely that for Fable 2 we will be recommending using something like Thoth.Json function decoders, although this will be transparent for most users.

The generated JSON looks like @isaacabraham says for options, just a string for union cases without fields, and as an array for cases with fields: MyCase(1,2) becomes ["MyCase", 1, 2].

alfonsogarciacaro avatar Jun 20 '18 08:06 alfonsogarciacaro

I currently use the Fable.JSON for non-fable apps because I like the way it does unions and option types.

So far I haven't needed to use any Json attributes on my types/records so if this can be avoided that would be great.

davidtme avatar Jun 20 '18 08:06 davidtme

@alfonsogarciacaro @isaacabraham I really rather like that treatment.

pblasucci avatar Jun 20 '18 08:06 pblasucci

Just FYI, my look from a schema perspective: JSON Schema/Swagger describes a tuple type as an array with fixed length and defined type per element. The item names are not serialized because they are not needed for deserialization (the type is known). But I think there’s no way to define the discriminator in a schema, so you’d probably have to specify this as object schemas with discriminator...

RicoSuter avatar Jun 20 '18 17:06 RicoSuter

Could we take a step back and answer the bigger question: Why are you serialising data as JSON?

Is there any other reason than interoperability?

If you want to be interoperable, the data generated should make sense to consumers on various platforms, using various languages. Those consumers could have no idea what a discriminated union is. Examples could be C# or Java..

I think @isaacabraham is on the right track: cases should simply disappear, while the data should remain. As an example, given the OP DUSerializationTest, only the data should remain:

{
  "name": "Carl",
  "dateTimeStamp": "2018-03-28T00:54:08.0652638+00:00"
}

I don't have an opinion on how the other two examples should serialise, since the tuple is unnamed.

I know what you'll say next: This can't possibly round-trip!

You're right, but if you want round-tripping serialisation, then use a binary formatter of some sort. There's no reason to use JSON, then.

ploeh avatar Jun 20 '18 20:06 ploeh

People can use a custom non-round-tripping converter if they want but the one that comes with Newtonsoft.Json will round-trip.

I like this as a format:

{
  "$case": "User",
  "name": "Carl",
  "dateTimeStamp": "2018-03-28T00:54:08.0652638+00:00"
}

But that doesn't handle unnamed values. Are unnamed values common in DUs? Options are:

  1. Throwing an exception. I suspect that would be annoying if someone wants names everywhere and registers the converter in their settings but a few DUs have unnamed values.
  2. Item1, Item2, etc. That could conflict with other names though
  3. $item1, $item2?
  4. $values: [ array of unnamed values here ]

JamesNK avatar Jun 20 '18 22:06 JamesNK

We've been using a custom serialiser for unions which formats based on the shape.

It's worked really well for us and we use it to send data to the client and to persist events for event sourcing which need to be serialised and deserialised.

Enum Union If the union only has case labels then it will serialise exactly the same as an enum would

type UniverseType =
    | Infinite
    | Finite

Value Union If at least one case has a payload we fall back to an object with a property for the case. I believe this is the SharpLu.Json format. Keeping the case label is very important to us and using the property makes using in via javascript more streamlined.

type MyRecord = { Name: string }

type MyUnion = 
    | CaseA of int
    | CaseB of MyRecord

{
    "CaseA": 1
}
// or
{
    "CaseA": { "Name": "Rabbit" }
}

Fields Union If the union contains multiple fields then we use an array

type MyUnion = 
    | CaseA of int * int * int

{
    "CaseA": [1, 2, 3]
}

Anyone interested in a Pull Request? (in F#, or not)

daniellittledev avatar Jun 20 '18 22:06 daniellittledev

Is Options a DU? Could also have a second flag on for simple Options on DiscriminatedUnionConverter. It would be off by default for backwards compat but it is simple enough to turn on.

JamesNK avatar Jun 20 '18 22:06 JamesNK

We also have a converter for Options which works like this:

Some "thing" -> "thing"
None -> null

It's worked fairly well, but there is one problem, it's lossy.

Some (Some None) -> null

daniellittledev avatar Jun 20 '18 22:06 daniellittledev

not sure if I missed it but there is the obvious simplification of single-constructor DU's as well

type MyType = MyType of SomeValue

those can and should be serialized to just SomeValue (and deserializing should wrap the constructor again)

Also I'm strongly in the lawful camp - so no matter what serializers/deserializers should have some laws attached - I think the obvious one is

deserialize << serialize = Just

meaning: if something serialized should successful deserialiize to the same value again - so yes: I want my data to be able to roundtrip

I also like Option <-> Value + null - it's a reasonable standard and things like int option option are not common

for the general case I like those to be converted into a JSON object with an tag/case field given the name of the data-constructor, and the values attached to the case as flat fields - if the values have no names then item1, item2, ... is ok - if there is already a item1 then please throw an exception

btw: the only way there could be a item1 is if the user attaches a member to their DU with getter/setter - IMO you can safely ignore those - all the relevant date in a F# DU should be in it's data-constructors

CarstenKoenig avatar Jun 21 '18 06:06 CarstenKoenig

Also I'm strongly in the lawful camp - so no matter what serializers/deserializers should have some laws attached

I'd agree if the purpose of converting to JSON was, indeed, to enable F# code to communicate with (other) F# code. If you have that need, though, why use JSON? Couldn't you use, say, protocol buffers?

If you convert data to JSON, there must be a reason for choosing this format. I can't think of any other reason than interoperability. At that point, one should follow the paradigm of JSON, not of of F#.

Putting information about DU cases into the JSON is to leak implementation details.

I know that they're no longer cool, but there was much (forgotten) wisdom in Don Box's old tenets of SOA, particularly the one that says that services share schema and contract, not class. The point is that one can't assume that consuming software can make use of what's essentially an implementation detail.

ploeh avatar Jun 21 '18 06:06 ploeh

not sure if it's a good idea to discuss here but so be it

Also I'm strongly in the lawful camp - so no matter what serializers/deserializers should have some laws attached

I'd agree if the purpose of converting to JSON was, indeed, to enable F# code to communicate with (other) F# code. If you have that need, though, why use JSON? Couldn't you use, say, protocol buffers?

it's more about Backend <-> Frontend - I think JSON is quite a nice format to do that

If you convert data to JSON, there must be a reason for choosing this format. I can't think of any other reason than interoperability. At that point, one should follow the paradigm of JSON, not of of F#.

I've heard this before and yes it's a valid stance - still it's not to hard to JSON-encode DUs (which is usually the only problem) with something like a tag

And if you use languages with DUs both on the backend and on the front-end it seems a bit silly to write code to deal with a intermediate representation because JSON is not really a good fit for DUs - at least for me

I can see where you are coming from, but to be honest: you can opt out of this any time you want, so you lose nothing - on the other hand people who just want this would win a lot

CarstenKoenig avatar Jun 21 '18 09:06 CarstenKoenig

I'm very much in favour of the point @CarstenKoenig made which is object |> serialize |> deserialize should be lossless.

I think one of the most common examples is interop with a javascript client. There are also many datastores which may mean JSON storage is the best option (I'm using Greg Young's event store)

Here's one example:

type CloseDetail = { Reason: string }
type Close = 
    | Approve of CloseDetail
    | Reject of CloseDetail

Resulting in the following JSON

{
    "approve": { "reason": "This solves the issue." }
}

I would say this JSON is idiomatic and usage with javascript is as follows

if (model.approve) {
    // Maybe expect a property that's not there on reject
    // Or process the data differently based on the property present
}

daniellittledev avatar Jun 21 '18 09:06 daniellittledev

We use an identical custom converter in our codebase, so would definitely like to see it being officially supported :+1: I very much like the choice of $case as well.

eiriktsarpalis avatar Jun 21 '18 22:06 eiriktsarpalis

I know that they're no longer cool, but there was much (forgotten) wisdom in Don Box's old tenets of SOA, particularly the one that says that services share schema and contract, not class. The point is that one can't assume that consuming software can make use of what's essentially an implementation detail.

What if the consumer of the JSON is not F# but another functional programming language that also has discriminated unions? Would those details be actually implementation-details after all?

knocte avatar Jun 23 '18 10:06 knocte

What if the consumer of the JSON is not F# but another functional programming language that also has discriminated unions?

If you know for certain what's going to be on both sides of a serialised format, then why pick JSON?

ploeh avatar Jun 23 '18 10:06 ploeh

If you know for certain what's going to be on both sides of a serialised format, then why pick JSON?

For example, because I'm storing the serialised data in a database that has special support for JSON (PostgreSQL, or many NoSQL databases), and which allows me to do other stuff with it in addition to deserialization.

But I 100% agree with your broader point: serializing data for myself and for others are two completely different scenarios.

When I'm the only consumer of the serialized data:

  • Round tripping is absolutely essential. No round-tripping = no can use.

  • The second-most important thing is minimizing breaking changes as I expand and refactor my model. Sure, I can keep previous versions of my model in the code and convert them manually, but the less often I need to do that the better.

    Ideally, the deserializer would automatically handle unambiguous changes such as adding extra cases, naming previously unnamed fields, adding optional fields, or changing a field of type 'T to 'T option (maybe even T seq / T array / etc., but that's more questionable)

  • The serialisation format itself isn't very important. It might make my usage of the data for non-serialization purposes (such as JSON queries) more awkward, but I can probably deal with that.

On the other side, when I'm publishing my serialized data to third parties:

  • Round tripping matters very little. It can be nice to reuse the same DTO for GET and POST, but you often don't want to do that, and if you do it's not a big deal to have to write two DTOs.

  • Avoiding breaking changes is also, paradoxically, much less important, because when the semantics of my model grow I'm going to have to version my API anyway.

  • The serialisation format, now, becomes critical. If I expose an object that just looks like a pile of random arrays, that's awful UX for my consumers.

    As a rule, if I'm writing a web API (which I expect will be 99% of these use cases), then I want something that Swagger/OpenAPI can document, and that NSwag or Swashbuckle can understand and generate.

    This is where, I believe, tooling efforts should focus. Right now, I usually convert my F# models into DTOs consisting of plain records of primitive types, in order to minimize any runtime surprises on the wire. The more F# features that get reliably converted into plain, idiomatic, documentable JSON, then the fewer such DTOs I need to write.

So, with those goals in mind, what is the best thing that JSON.NET could emit for DUs? I believe it would be to emit a format compatible with OpenAPI's oneOf keyword.

oneOf accurately reflects the meaning of sum types in F#, and any consumer that can read an OpenAPI 3.0 file would get the same information as a native F# consumer.

The current format, unfortunately, isn't compatible because it's not possible to encode arrays of heterogeneous objects in OpenAPI; but many of the formats proposed in this thread are compatible, and I would ask you to pick one.

If it ships with that feature by default, then NSwag and Swashbuckle can start generating OpenAPI schemas to match, and then I can start using DUs in my public DTOs - which will often mean not having to write a DTO at all!

piaste avatar Jun 23 '18 17:06 piaste