dhall-haskell icon indicating copy to clipboard operation
dhall-haskell copied to clipboard

Type discriminator field in json from union

Open cfilipov opened this issue 6 years ago • 36 comments

I'm trying to generate some json that looks like this:

{
  "shapes": [
    {
      "type": "rectangle",
      "length": 1,
      "width": 5
    },
    {
      "type": "circle",
      "radius": 6
    }
  ],
  "title": "My Shapes"
}

It seems natural to model this with unions:

let Shape = 
    < Rectabgle: { length: Double, width: Double }
    | Circle: { radius: Double }
    >

let Geometry: Type =
    { title: Text
    , shapes: List Shape
    }

in {  title = "My Shapes"
    , shapes =
        [ Shape.Rectabgle { length = 1.0, width = 5.0 }
        , Shape.Circle { radius = 6.0 }
        ]
    } : Geometry

However, this does not include the type discriminator field:

{
  "shapes": [
    {
      "length": 1,
      "width": 5
    },
    {
      "radius": 6
    }
  ],
  "title": "My Shapes"
}

How would I achieve this?

cfilipov avatar Oct 04 '19 05:10 cfilipov

I've transferred this issue to this repo, since this one contains the dhall-to-json tool.

You can configure the encoding of unions with the Nesting type documented in Dhall.JSON.

Please ask further, if the documentation doesn't answer your questions!

sjakobi avatar Oct 04 '19 05:10 sjakobi

I had to try this for myself and ended up with this:

let Shape =
      < rectangle : { length : Double, width : Double }
      | circle : { radius : Double }
      >

let Geometry : Type = { title : Text, shapes : List Shape }

let geometry
    : Geometry
    = { title = "My Shapes"
      , shapes =
          [ Shape.rectangle { length = 1.0, width = 5.0 }
          , Shape.circle { radius = 6.0 }
          ]
      }

let geometryToJSON =
      let Nesting = https://prelude.dhall-lang.org/v10.0.0/JSON/Nesting
      
      let shapeToJSON =
              λ(s : Shape)
            → { nesting = Nesting.Inline, field = "type", contents = s }
      
      in    λ(g : Geometry)
          →   g
            ⫽ { shapes =
                  https://prelude.dhall-lang.org/v10.0.0/List/map
                    Shape
                    { nesting : Nesting, field : Text, contents : Shape }
                    shapeToJSON
                    g.shapes
              }

in  geometryToJSON geometry

sjakobi avatar Oct 04 '19 06:10 sjakobi

Thank you!

Wow, that's a lot of work! I'm imagining what this would look like with a data model that makes heavy use of this pattern. Wish there was a more ergonomic way to do this.

I'm thinking something like typescript's string literal type for discriminated unions would make this more explicit and less magical:

let Shape =
      < rectangle : { length : Double, width : Double, type: "rectangle" }
      | circle : { radius : Double, type: "circle" }
      >

cfilipov avatar Oct 04 '19 07:10 cfilipov

@cfilipov: What TypeScript calls string literal types Dhall calls enums (or unions with empty alternatives)

For example, this TypeScript type:

type Easing = "ease-in" | "ease-out" | "ease-in-out";

... corresponds to this Dhall type:

let  Easing = < ease-in | ease-out | ease-in-out >

... but in this case there is no need to do that because you already have a union with alternatives that have the right name.

I think the right solution here is perhaps to provide support for controlling nesting behavior globally (i.e. via a command line switch)

Gabriella439 avatar Oct 05 '19 17:10 Gabriella439

I think the right solution here is perhaps to provide support for controlling nesting behavior globally (i.e. via a command line switch)

That seems like a good idea. We could have alternative options --nesting-inline and --nesting-nested FIELDNAME to control the nesting and an option --tag-field FIELDNAME to control the fieldname for the tag. E.g.

$ dhall-to-json --nesting-nested=value --tag-field=name <<< "< Left : Natural | Right : Natural>.Left 2"
{
  "name": "Left",
  "value": {
    "foo": 2
  }
}

sjakobi avatar Oct 05 '19 17:10 sjakobi

Personally I'm not a fan of this being a command line flag. While I feel the current syntax is a bit heavy and "magical" I prefer over a flag.

The reason is, a command line flag is is out of band and it seems wrong to have two different results depending on how you run the program.

cfilipov avatar Oct 07 '19 18:10 cfilipov

@cfilipov I think we can implement this in a way where the current way of specifying the nesting in the code continues to work.

I think the flags could be useful in any case, if only so you can quickly see what the various nesting options look like.

sjakobi avatar Oct 07 '19 18:10 sjakobi

@Gabriel439 You mention that dhall union is essentially the same as what typescript calls string literal types and I see that from your example, but I think there is more to it than that.

(Aside: In fact, in typescript you can also have numerical literals. Essentially, it seems like you can specify a specific value of a field in a record's type (or it's just a type that only has one value), not sure what you call that formally.)

The important difference is that typescript allows me to to use type discriminator in a more ad-hoc way, (which feels more ergonomic for that that's worth).

For example, here's a definition in typescript:

type Circle = {
    type: "circle";
    radius: number;
}

To do this in dhall I would have to do it this way from what I can tell:

let CircleType = < Circle >
let Circle = {
    radius : Double,
    type: CircleType
}

So in the end, my preferred way to get the result I wanted from my original question is this:

let RectangleType = < Rectangle >
let CircleType = < Circle >

let Shape =
      < Rectangle :
          { length : Double, width : Double, type: RectangleType }
      | Circle :
          { radius : Double, type: CircleType }
      >

let Geometry = { title : Text, shapes : List Shape }

let geometry
    : Geometry
    = { title =
          "My Shapes"
      , shapes =
          [ Shape.Rectangle { length = 1.0, width = 5.0, type = RectangleType.Rectangle }
          , Shape.Circle { radius = 6.0, type = CircleType.Circle }
          ]
      }

in geometry

This might not be the idiomatic way to do this in dhall, but this feel right to me. This is more explicit than the command line flag, you can tell by reading the code what the output would look like. It's also a lot less code than the solution @sjakobi proposed. To me this is more understandable and doesn't invoke the magic incantation of enclosing some special type. Is there any reason I shouldn't do it this way?

cfilipov avatar Oct 07 '19 20:10 cfilipov

@cfilipov That seems totally reasonable if you don't mind duplicating the differentiation between rectangles and circles.

sjakobi avatar Oct 07 '19 21:10 sjakobi

Personally, I would vastly prefer the command-line switch, as the other options unnecessarily clutter one's actual configuration files. If you're purely using dhall to generate JSON and not "natively," then I suppose it's okay to to have to jump through hoops to produce the JSON one wants. But for the case of using dhall itself, with JSON only as a fallback for endpoints/clients that lack a dhall library, it's just messy to have to include two useless fields (and have the whole thing wrapped in an unnecessary record) just on the off-chance that one might need to access the files via a JSON intermediate at some point. Command-line switches represent an ideal fall-back for just such cases.

The reason is, a command line flag is is out of band and it seems wrong to have two different results depending on how you run the program.

As far as the above concern mentioned by @cfilipov of having two different results depending on how the program is run, dhall-to-json already has multiple flags that result in different output, e.g. --noMaps, --omitEmpty, etc., all of which alter the JSON output based on a command-line flag. Unless I'm missing something specific in this case, which is entirely possible?

SiriusStarr avatar Oct 12 '19 04:10 SiriusStarr

It’s usually a bad idea to make the semantics of source change if it’s invoked differently.

Profpatsch avatar Oct 12 '19 15:10 Profpatsch

@Profpatsch: It's not that bad, for two reasons:

  • It's actually not a change to the Dhall semantics. In other words, the Dhall interpretation passes (import-resolution/type-checking/evaluation) are unaffected. This only affects the final conversion from Dhall to JSON

  • We're slowly gravitating to making certain options standard instead of configurable as we figure out best practices. For example, --omitNull recently switched to becoming the default and may become standard later in the future

Gabriella439 avatar Oct 12 '19 15:10 Gabriella439

It's actually not a change to the Dhall semantics. In other words, the Dhall interpretation passes (import-resolution/type-checking/evaluation) are unaffected. This only affects the final conversion from Dhall to JSON

I think this is important. Dhall fundamentally supports things that JSON does not, so it is necessarily a translation from dhall to JSON, not a transcription. Like any translation, there are going to be choices made in how to represent "foreign" constructs in the destination language. It seems like these are best dealt with by instructing (via flags) the translator (dhall-to-json) how to go about it, rather than rewriting the source text itself, since that information is simply not relevant to the source.

Long story short, the fact that JSON lacks union types is a shortcoming of JSON, not a shortcoming of Dhall; the choice of how to overcome that shortcoming should be made only when one finds it necessary to switch to the deficient representation rather than baggage toted around in the language that supports it (which then clutters up dealing with the data in dhall itself).

SiriusStarr avatar Oct 12 '19 18:10 SiriusStarr

Do we agree then that something like the --nesting-{inline,nested} and --tag-field options I proposed in https://github.com/dhall-lang/dhall-haskell/issues/1383#issuecomment-538670891 is the way to go?

If we want to move away from defining union constructor translation via the Nesting type, we should maybe find better names for these options…

sjakobi avatar Oct 18 '19 19:10 sjakobi

One shortcoming of a command line switch is that it either enables or disables the tag field/nesting for all occurrences. I have definitely run across JSON where both styles are used. But of course the aforementioned alternative approaches can be used.

cfilipov avatar Oct 22 '19 00:10 cfilipov

@sjakobi: I think we should support the global command-line options and also preserve the existing functionality, too

The general rule of thumb I use for backwards compatibility is that it's cheap to support if it's not part of the standard (i.e. it's specific to the Haskell implementation) since other/new implementations don't have to support it

Gabriella439 avatar Oct 30 '19 00:10 Gabriella439

Would the option names --union-nesting-{inline,nested} and --union-tag-field instead of just --nesting-inline and --tag-field be better? They'd be slightly clearer IMHO.

sjakobi avatar Nov 07 '19 12:11 sjakobi

@sjakobi: Yeah, that seems reasonable to me

Gabriella439 avatar Nov 07 '19 16:11 Gabriella439

I'm working on implementing this as another Expr-to-Expr pass, along the lines of convertToHomogenousMaps.

Now I'm wondering what to do with unions that contain non-record alternatives when the --union-tag-inline option is enabled, for example

let Example = < A | B : { b : Text } | C : Natural >

dhall-to-json currently translates an inline-tagged Example.A to { "<field>": "A" } and an (Example.B { b = "foo" }) to { "<field>": "B", "b": "foo" }.

In the case of (Example.C 42) it throws an error, because it has no field name that it could use for the 42.

For the new pass, I see these options:

  1. Check the union types and error out if we find any non-record alternative.

  2. Check the union types, and refuse to tag any Example.

    [ Example.A, Example.B { b = "foo" }, Example.C 42 ]
    =>
    [ "A", { "b": "foo" }, 42 ]
    
  3. Tag Example.A and Example.B, but keep Example.C as a union value:

    [ Example.A, Example.B { b = "foo" }, Example.C 42 ]
    =>
    [ { "<field>": "A" }, { "<field>": "B", "b": "foo" }, 42 ]
    

    A downside to this approach that the result of the pass is ill-typed. This shouldn't cause any problems right now, but might make it harder to implement subsequent changes down the line. (I believe convertToHomogenousMaps can produce ill-typed terms too.)

  4. Just tag everything, and let any Example.C get caught later on when converting to JSON. This is very simple to implement, but the resulting errors might be a bit confusing.

  5. Tag any Example.A and Example.B, but error out in the case of Example.C. The error messages should be slightly better than with (4).

To sidestep the issue, we could alternatively make a new rule that creates a field name for Example.C. We could default to the constructor name ("C") and add an option to override this default…

Thoughts?

sjakobi avatar Nov 23 '19 20:11 sjakobi

@sjakobi: I would go with (1)

My reasoning is that:

  • It catches potential errors earlier
  • We can easily change the behavior later to be more lenient upon user request

Gabriella439 avatar Nov 23 '19 20:11 Gabriella439

@Gabriel439 That makes sense to me.

I wonder whether I should implement the union type check in dhallToJSON, so the check also applies when the new nesting options are not enabled. That would change the behaviour for code that uses Nesting.Inline directly.

sjakobi avatar Nov 23 '19 21:11 sjakobi

@sjakobi: Is it possible to apply the check at the point where the conversion takes place? (i.e. for the conversion function to return an Either)?

Gabriella439 avatar Nov 23 '19 21:11 Gabriella439

Do you mean the Dhall-to-JSON conversion or the union-to-"tagged"-record conversion?

dhallToJSON already returns an Either. The new pass will return an Either depending on whether it handles the union type check itself – alternatively it can defer the check to dhallToJSON.

sjakobi avatar Nov 23 '19 21:11 sjakobi

@sjakobi: I mean the union-to-tagged-record conversion

Gabriella439 avatar Nov 23 '19 21:11 Gabriella439

Ok. Then we will keep the current behaviour where a manually tagged Example.A or Example.B { b = "foo" } are converted to JSON for now. It seems a bit inconsistent but that's a separate issue.

sjakobi avatar Nov 23 '19 22:11 sjakobi

Isn't option 1 just going to force a return to pre-v7 unions, where everything just had to be tagged with empty records? Or am I not understanding something?

SiriusStarr avatar Nov 24 '19 08:11 SiriusStarr

Sorry, I should probably have been a bit more precise in my description of option (1): We'll error out if there are any non-empty non-record alternatives.

So < Foo >.Foo is fine (and encoded as {"<tagField>": "Foo"}, and < Foo : { bar : Bool } > is also fine. But < Foo : Bool >.Foo True is a problem because there's no "natural" field name for the Bool. It's unclear how to encode it: {"<tagField>": Foo", ???: true}

If users run into the error, I think they will just wrap a singleton record around the alternative.

sjakobi avatar Nov 24 '19 08:11 sjakobi

Gotcha, but it wouldn't error on a union like

let relationshipStatus = < Single 
                         | Married : Person 
                         | Divorced 
                         | Widowed 
                         | Engaged : Person >

(where Person is some record) you're saying. That seems fine to me; I was being dumb and misinterpreting, haha!

SiriusStarr avatar Nov 24 '19 19:11 SiriusStarr

Yep, you understood that correctly! I was also surprised by this little complexity.

sjakobi avatar Nov 25 '19 07:11 sjakobi

I just wanted to chip in my two cents.

The issue of encoding non-empty non-record alternatives could be side-stepped by changing how alternatives are encoded. Rather than having a tagging property on each alternative, you could encode each alternative as a record with a single property. So for instance, I imagine that values of type Example = < A | B : { b : Text } | C : Natural > would be encoded like:

  • Example.A to { "A": null }
  • Example.B { b = "foo" } to { "B": { "b": "foo" } }
  • Example.C 42 to { "C": 42 }

Incidentally this is similar to how Aeson generically encodes objects with the ObjectWithSingleField option.

One effect of this, obviously, is that the structure of the JSON you generate will be deeper. This may or may not be desirable.

My other idea was regarding the discussion of the command line flag affecting translation globally. It might be desirable to work around this by allowing pragmas in Dhall. I don't know if there's precedence for this in Dhall. I envisage that these pragmas could be annotated inline in the Dhall source file, or placed in separate files. Example:

{-# SumEncoding Example ObjectWithSingleField #-}
let Example = < A | B : { b : Text } | C : Natural >

It would also be possible to specify this per-type on the command line with e.g. --sum-encoding Example=ObjectWithSingleField. I'm not sure how nice of a user-experience that would be, though.

fredefox avatar Feb 21 '20 08:02 fredefox