rescript-compiler [Feature request] zero-cost binding to tagged JS objects

I explored the various ASTs on astexplorer.net and saw many possibilities. It would be fantastic to use pattern matching when dealing with trees produced by parser libraries.

But as far as I understand, that requires converting the JS objects into records, which means mapping them to ReScript's internal representation at runtime.

This feels like unnecessary overhead when writing bindings for a parser library, because each item in the tree is already a tagged object.
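To make the overhead concrete, here is a rough TypeScript sketch of the extra pass a binding has to run today. The `DomNode` shape, the sample tree, and the `toInternal` helper are hypothetical, and the numeric `TAG` layout is only an assumption about the compiler's internal representation, not a stable API.

```typescript
// Shape the parser library already produces (hypothetical example data).
type DomNode =
  | { nodeName: "#text"; value: string }
  | { nodeName: "h1"; childNodes: DomNode[] };

// Assumed internal layout of a variant with a numeric TAG.
type Internal =
  | { TAG: 0; value: string }
  | { TAG: 1; childNodes: Internal[] };

// The extra pass: every node is copied once into the internal layout.
function toInternal(node: DomNode): Internal {
  switch (node.nodeName) {
    case "#text":
      return { TAG: 0, value: node.value };
    case "h1":
      return { TAG: 1, childNodes: node.childNodes.map(toInternal) };
  }
}

const tree: DomNode = {
  nodeName: "h1",
  childNodes: [{ nodeName: "#text", value: "Hello" }],
};
const converted = toInternal(tree);
```

The copy walks the whole tree even though `nodeName` already carries everything needed to discriminate the cases.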

Imagine a function like this:

@deriving(tagged)
type rec node =
  | _Text({
      @tag("#text") nodeName: string,
      value: string,
    })
  | H1({
      @tag("h1") nodeName: string,
      childNodes: array<node>,
    })

@val external nodes: array<node> = "nodes"

let rec toText = nodes =>
  nodes->Belt.Array.reduce("", (text, node) => text ++ " " ++ switch node {
    | _Text({ value }) => value
    | H1({ childNodes }) => toText(childNodes)
  })

nodes->toText // works without additional parsing

And in its output, instead of matching on TAG, the generated code would match on the tag we specify:

var match = node;

var tmp;

switch(match.nodeName) {
  case '#text':
    // ...

I think this can be a more ergonomic approach when writing bindings for parsers.

Currently, I have to rely on a third-party PPX like decco for this kind of work. Please let me know if there is a better way that I am not aware of.

Jun 29 '21 18:06 cometkim

This would be useful in other situations too. I jump through a lot of hoops to bind to Slate's operation set using string matching and %identity externals.

Jul 01 '21 00:07 TheSpyder

@TheSpyder the link is broken

Jul 01 '21 08:07 bobzhang

I wasn’t linking to my bindings, those aren’t public (yet). The link is to the source types I’m binding to. The specific line that defines the type is here but it needs context of the rest of the file: https://github.com/ianstormtaylor/slate/blob/4945a1a27505f59805bbbb630d8e22e47b1f29e5/packages/slate/src/interfaces/operation.ts#L138

The operations are 9 objects that use a shared type field as a tag. Some have overlapping fields but that’s the only unique one across all operations and I use it to direct operations to one of 9 identity functions. This lets me wrap the value in a variant (thus adding a second layer of tagging at runtime).
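A rough TypeScript sketch of that double tagging, under stated assumptions: the two operations below are a simplified, hypothetical subset (not Slate's full definitions), and `Wrapped` with its `TAG`/`_0` fields stands in for a ReScript variant value rather than reproducing the exact compiler layout.

```typescript
// Two simplified operations sharing a `type` field (hypothetical subset).
type Operation =
  | { type: "insert_text"; offset: number; text: string }
  | { type: "remove_text"; offset: number; text: string };

// Wrapper with its own tag, standing in for a ReScript variant value.
type Wrapped =
  | { TAG: 0; _0: Operation } // InsertText(op)
  | { TAG: 1; _0: Operation }; // RemoveText(op)

// Dispatch on the string tag, then wrap. The operation is not copied,
// but a second tagged object is allocated around it.
function classify(op: Operation): Wrapped {
  switch (op.type) {
    case "insert_text":
      return { TAG: 0, _0: op };
    case "remove_text":
      return { TAG: 1, _0: op };
  }
}

const wrapped = classify({ type: "insert_text", offset: 0, text: "hi" });
```

After wrapping, the same information is stored twice: once as `wrapped.TAG` and once as `wrapped._0.type`.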

Jul 01 '21 08:07 TheSpyder

This type of functionality would be incredibly valuable indeed. The type of structure described is used a lot in JS-land, and hassle-free bindings to those types of structures would open up interesting possibilities with ReScript (like dealing with ASTs which ReScript in theory is very very good at, but that's painful/close to impossible to do in a sane way now that every single variant needs to be manually mapped at runtime).

I wonder though if it'd be better modelled as a polyvariant? Since Flow/TS models these types of things structurally, I think it'd be valuable to model at least the tag itself structurally with a polyvariant, rather than with a normal variant.

Jul 02 '21 06:07 zth

It looks like polymorphic variants right now are translated to almost what you'd expect:

let a = #hello({ "world": 1 })

var a = {
  NAME: "hello",
  VAL: { world: 1 }
};

It is close enough that providing a stable encoding (possibly via an @tagged annotation somewhere), where the contents are inlined in the object and the discriminating field name is specified upfront, could be enough to start exploring this:

let a = @tagged("type") #hello({ "world": 1 })

var a = {
  type: "hello",
  world: 1
}

Or for the long form:

@tagged("type")
type ast = [
  | #hello({ "hello": string })
]

Of course this means that @tagged polymorphic variants without arguments will still have the shape { type: "name" }, but people are doing that on the other side of the type-system already.

From similar work I did on Caramel (where #hello(1) becomes {hello, 1} in Erlang/Elixir), using polymorphic variants makes this rather natural. I'm also not a TypeScript user these days, but I could see how writing bindings with this could be easier.
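The difference between the two layouts can be sketched in TypeScript as a plain object transformation. `toTagged` is a hypothetical helper used only to illustrate the shapes; the proposal itself is about what the compiler would emit, not about a runtime conversion.

```typescript
// Current polymorphic-variant layout, as in the compiled output shown above.
type PolyVar = { NAME: string; VAL?: Record<string, unknown> };

// Hypothetical helper: inline the payload next to a custom tag field.
// Note that a payload field named like the tag would overwrite the tag.
function toTagged(tagField: string, v: PolyVar): Record<string, unknown> {
  return { [tagField]: v.NAME, ...(v.VAL ?? {}) };
}

const inlined = toTagged("type", { NAME: "hello", VAL: { world: 1 } });
const bare = toTagged("type", { NAME: "hello" });
```

Here `inlined` has the shape `{ type: "hello", world: 1 }`, and a variant without arguments becomes `{ type: "hello" }`.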

Jul 28 '21 11:07 leostera

Seems like a must-have! I also thought we should cover other JSON shapes with variants, in a more general way. Some rough ideas about it, just writing them down for the sake of it:

We could also control the NAME/VAL keys with another attribute, e.g.:

let a = @keys(["id","content"]) #hello({ "world": 1 })

var a = {
  id: "hello",
  content: { "world": 1}
}

where the @keys attribute takes the key names as its argument. That doesn't seem really helpful though; your approach seems better.

In the other case, where we have full control over the JSON shape (e.g. a request from an API we own), we can also use regular (non-polymorphic) variants with payloads, for a smaller payload and, I assume, more efficient pattern matching, while still defining keys for clarity. It gets a bit more complex if we need to handle different parameter shapes. The current behavior of a regular variant with payloads is:

type b = | Hello(string) | Foo(int, int) 
let b1 = Hello("world")
let b2 = Foo(1, 42)

var b1 = {
  TAG: /* Hello */0,
  _0: "world"
};

var b2 = {
  TAG: /* Foo */1,
  _0: 1,
  _1: 42
};

We could use the same attribute to change the keys when the constructors have different shapes:

type b = @tag("id") | @keys(["text"]) Hello(string) | @keys(["n1", "n2"]) Foo(int, int) 
let b1 = Hello("world")
let b2 = Foo(1, 42)

var b1 = {
 id: /* Hello */0,
 text: "world"
};

var b2 = {
 id: /* Foo */1,
 n1: 1,
 n2: 42
};
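As a rough illustration of how the two layouts relate, the TypeScript sketch below maps the positional `_0`, `_1` fields onto named keys. `renameKeys` and the `keys` table are hypothetical and exist only for this example; the idea above is that the compiler would emit the named layout directly, with no runtime step.

```typescript
// Assumed current layout: numeric TAG plus positional _0, _1, ... fields.
type Positional = { TAG: number; [field: string]: unknown };

// Hypothetical per-constructor key names, indexed by TAG (Hello = 0, Foo = 1).
const keys: string[][] = [["text"], ["n1", "n2"]];

// Copy each positional field to its named key and store TAG under tagField.
function renameKeys(tagField: string, v: Positional): Record<string, unknown> {
  const out: Record<string, unknown> = { [tagField]: v.TAG };
  keys[v.TAG].forEach((key, i) => {
    out[key] = v[`_${i}`];
  });
  return out;
}

const named1 = renameKeys("id", { TAG: 0, _0: "world" });
const named2 = renameKeys("id", { TAG: 1, _0: 1, _1: 42 });
```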

This gives a result similar to the polymorphic variant with a tag, but with a slight performance gain (which I am still only assuming) in exchange for a more complex syntax. We could add some sugar for the case where all constructors have the same shape:

type b = @keys(["id", "content"]) | Hello(string) | Foo(int) 
let b1 = Hello("Foo")

var b1 = {
 id: 0,
 content: "Foo"
}

Again, your idea feels better, even if I would prefer using regular variants.

For some extreme cases, we could also imagine the tagged/keys attribute taking a polymorphic variant to cover every use case, for example backward compatibility after a name change in an API. But things get more complex and probably not that useful, so I don't think it's worth it.

Aug 04 '21 08:08 kinooyume

We love the syntax proposed by @ostera, but since polymorphic variant types currently cannot contain inline record definitions, I assume it would require some semantic changes.

It feels natural to treat regular variants as always having an implicit tag = "TAG", and to let users tell the compiler to use a custom tag instead.

// Assume there is an implicit directive here:
// @tag("TAG")
type t =
   | Foo({ foo: string })
   | Bar({ bar: int })

Currently, we tell users "don't rely on the internal representation".

However, it may be better to make the internal representation of regular variants more predictable, rather than treating this as a special case.

Pros:

More readable JS output
Reduce the learning curve
Ergonomic bindings

Rust's serde_enum provides a good summary of the predictable representation.

Externally tagged

// TypeScript
type t = (
  | { Foo: { foo: string } }
  | { Bar: { bar: number } }
)

Internally tagged: It is closest to the current behavior. And it's what's known as "Brand" in the TypeScript world.
```
type t = (
  | { TAG: "Foo", foo: string }
  | { TAG: "Bar", bar: number }
)
```

Adjacently tagged: it looks the most flexible.

type t = (
  | { TAG: "Foo", CONTENT: { foo: string } }
  | { TAG: "Bar", CONTENT: { bar: number } }
)

Untagged: N/A
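To compare how each representation is matched on the JS side, here is a small TypeScript sketch. The type names and the `show*` functions are hypothetical; `TAG` and `CONTENT` follow the field names used above, and `bar` is typed as a number to match `Bar({ bar: int })`.

```typescript
type External = { Foo: { foo: string } } | { Bar: { bar: number } };
type Internal = { TAG: "Foo"; foo: string } | { TAG: "Bar"; bar: number };
type Adjacent =
  | { TAG: "Foo"; CONTENT: { foo: string } }
  | { TAG: "Bar"; CONTENT: { bar: number } };

// Externally tagged: the tag is the single key, so matching needs `in`.
function showExternal(v: External): string {
  return "Foo" in v ? `Foo ${v.Foo.foo}` : `Bar ${v.Bar.bar}`;
}

// Internally tagged: one property read, payload fields sit beside the tag.
function showInternal(v: Internal): string {
  return v.TAG === "Foo" ? `Foo ${v.foo}` : `Bar ${v.bar}`;
}

// Adjacently tagged: same tag read, payload kept in its own object.
function showAdjacent(v: Adjacent): string {
  return v.TAG === "Foo" ? `Foo ${v.CONTENT.foo}` : `Bar ${v.CONTENT.bar}`;
}
```

The internally tagged form is the one most JS libraries already emit, which is why it matters most for bindings.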

Aug 25 '21 14:08 cometkim

I'm creating a tool to generate ReScript bindings from .d.ts files, and this would be nice to have for consuming TS discriminated unions.

Mar 18 '22 08:03 cannorin

I'm thinking of a PPX syntax that can be introduced without breaking changes.

For example:

@tagged(nodeName)
type rec node =
  | Text({
      nodeName: [#"#text"],
      value: string,
    })
  | H1({
      nodeName: [#"h1"],
      childNodes: array<node>,
    })

Nov 10 '22 23:11 cometkim

I just confirmed this works in ReScript v11.

input code:

@tag("nodeName")
type rec node =
  | @as("#text") Text({value: string})
  | @as("h1") H1({childNodes: array<node>})

@val external nodes: array<node> = "nodes"

let rec toText = nodes =>
  nodes->Belt.Array.reduce("", (text, node) =>
    text ++
    " " ++
    switch node {
    | Text({value}) => value
    | H1({childNodes}) => toText(childNodes)
    }
  )

nodes->toText->Js.log

And the output:

import * as Belt_Array from "rescript/lib/es6/belt_Array.js";

function toText(nodes) {
  return Belt_Array.reduce(nodes, "", (function (text, node) {
                var tmp;
                tmp = node.nodeName === "#text" ? node.value : toText(node.childNodes);
                return text + " " + tmp;
              }));
}

console.log(toText(nodes));

export {
  toText ,
}

The result is exactly what I want! Thanks, @cristianoc

I will write some bindings for popular parser libraries and report back if I run into any problems.
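For readers who want to try the generated logic against real data, here is a TypeScript sketch that mirrors the compiled `toText` above. The `DomNode` type and the sample input are hypothetical; in practice the nodes would come straight from a parser library.

```typescript
type DomNode =
  | { nodeName: "#text"; value: string }
  | { nodeName: "h1"; childNodes: DomNode[] };

// Same logic as the generated toText: branch directly on nodeName,
// with no conversion step before matching.
function toText(nodes: DomNode[]): string {
  return nodes.reduce(
    (text, node) =>
      text + " " + (node.nodeName === "#text" ? node.value : toText(node.childNodes)),
    ""
  );
}

const sample: DomNode[] = [
  { nodeName: "h1", childNodes: [{ nodeName: "#text", value: "Hello" }] },
  { nodeName: "#text", value: "world" },
];
const rendered = toText(sample);
```

Each level of recursion prepends one separator space, so the result carries leading spaces that a caller may want to trim.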

Apr 11 '23 17:04 cometkim

The only problem I've noticed so far is that when I want to reuse a tag name elsewhere in the code, I have to hardcode it.

Apr 11 '23 18:04 cometkim

I think this can be considered solved via these recently merged features:

Configurable runtime representation of variants
Variant coercion
Variant type spreads

Please feel free to open new issues for anything missing for this workflow to work well.

Jul 16 '23 06:07 zth

Also, if you need a parser library, it'll be well supported in rescript-struct@5 https://github.com/DZakh/rescript-struct/blob/f6dfc93/CHANGELOG_NEXT.md#opt-in-ppx-support

Jul 16 '23 08:07 DZakh
