
The Syntax Bikeshedding Dojo, round 6: enum types

Open yannham opened this issue 3 years ago • 12 comments
Prior to the implementation of RFC002 (see #595), Nickel used the following syntax for enum types: <foo, bar, baz>. Once we merged the term and type syntax, this became ambiguous with the comparison operators. We are currently using a temporary, undocumented syntax, just for the stdlib, but users can't rely on a stable way to write enum types, which is too bad.

What other languages do

I am not aware of that many languages with structural enum types (in Nickel, we don't have to give a name to our enums prior to using them: it's all based on the structure of the enum type, i.e. which alternatives it permits). C/C++/Rust's enums are nominal, and algebraic data types (OCaml, Haskell, Scala, etc.) are nominal as well.

OCaml's polymorphic variants

OCaml uses the [/] delimiters and | as a separator for its structural enum types (polymorphic variants):

type 'a vlist = [`Nil | `Cons of 'a * 'a vlist]

At first sight, this wouldn't play very well in Nickel (including variants):

  • [`foo | `bar] is already valid syntax: it's a one-element list with tag `foo and attached contract `bar
  • [`foo, `bar] is a two-element list
  • [`foo ; `bar] would clash with the row tail syntax forall a. <foo, bar | a>.

There is a twist though, that could make using [`foo, `bar] possible and consistent. See the proposals below.

Unions: TypeScript, CUE

TypeScript (and CUE, for that matter) doesn't have specific enum types, but can emulate them with union types and strings. That is, the old <foo, bar, baz> in Nickel would be written "foo" | "bar" | "baz". This doesn't provide much inspiration, as:

  1. We don't have union types, for a good reason
  2. | and its variations are already taken in Nickel (contracts application, metadata, reverse application)

Dhall's tagged unions

Dhall uses the following syntax for a flavor of union/enum types, which is close to the old Nickel one:

< Number : Natural | Boolean : Bool >

It would have the same ambiguity issue.

Those are the ones I know off the top of my head; feel free to add other examples you know.

Proposals

The following is a non-exhaustive list of random ideas; don't hesitate to add your own suggestions.

The [| foo, bar |] syntax

This is the temporary syntax currently in use. It is reminiscent of a list, but the | disambiguates the syntax. We could even do [| `foo, `bar |] to make it immediately clear that this is about enums. Note that <|/|> is possible, even if |> is already the reverse app operator (because we would allow only tag labels inside an enum type, so the first |> must be the closing delimiter), but it may be visually confusing to reuse |>.

The list syntax

In Racket, you can use some values, such as numbers or regular expressions, directly as contracts. With that mindset, we could imagine giving a semantics to using a list [exp1, exp2, .., expn] as a contract: foo passes [exp1, .., expn] iff foo == exp1 || ... || foo == expn. A list contract is thus a simple flavor of union. Doing so, [`foo, `bar] would then just be a particular case recognized by the typechecker. Handling the row tail [`foo, `bar ; a] would be done exactly the same way it is done for records, that is, a tail is syntactically only permitted on a list that is in fact an enum type (all elements are enum tags).
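To make the proposed semantics concrete, here is a minimal sketch in Python rather than Nickel. Everything in it is an assumption made for illustration: the names `list_contract` and `is_enum_type` are hypothetical, and enum tags are modelled as plain strings starting with a backtick. It only shows the "equal to one of the elements" rule and the all-tags special case a typechecker could recognize.

```python
# Hypothetical model of the "list as contract" idea; not Nickel's implementation.
# Assumption: an enum tag is represented as a string starting with a backtick.

def list_contract(alternatives):
    """Build a checker: a value passes iff it equals one of the alternatives."""
    def check(value):
        return any(value == alt for alt in alternatives)
    return check


def is_enum_type(alternatives):
    """The special case for the typechecker: every element is an enum tag."""
    return all(isinstance(alt, str) and alt.startswith("`") for alt in alternatives)


abc = list_contract(["`a", "`b", "`c"])
print(abc("`a"))                        # True
print(abc("`d"))                        # False
print(is_enum_type(["`a", "`b"]))       # True
print(is_enum_type(["`a", 1]))          # False
```

Under this reading, a list such as [`a, 1] is still a valid contract, but only lists made entirely of tags would be treated as enum types (and allowed a tail).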

No special syntax

A variation on the last proposal would be to just use a contract combinator, either a specific one for enums or a general one for alternatives, that would be special-cased by the typechecker:

let foo : contract.enum [`a, `b, `c] = `a in ...

or

let foo : contract.one_of [`a, `b, `c]

This is roughly the same idea as the previous list syntax, but requiring an explicit contract combinator. We may need an additional combinator for an enum with a tail.

yannham avatar Mar 23 '22 10:03 yannham

I think I like the [| `foo, `bar |] best right now.

aspiwack avatar Apr 04 '22 17:04 aspiwack

The idea of minting a novel multi-character delimiter is kind of a bummer. I get that [`a, `b, .., `z] is the syntax for lists, but can you just parse the enum declaration as a list of enum tags in the appropriate position (or { `a, `b, .., `z } or whatever)?

ammkrn avatar Apr 06 '22 23:04 ammkrn

but can you just parse the enum declaration as a list of enum tags in the appropriate position

Not really. The point of this discussion is that we have a single grammar for both terms and types. So we can only ever parse a syntax a single way.

Now, as @yannham points out in his list syntax section, this is not unsolvable. We parse a list a single way, but when we need to interpret a list as a type, then we say it must be an enum type. But this approach doesn't really bring me joy.

aspiwack avatar Apr 07 '22 06:04 aspiwack

Not really. The point of this discussion is that we have a single grammar for both terms and types. So we can only ever parse a syntax a single way.

I should have been more clear, I'm curious whether there's any context to an enum declaration that will allow lists to be parsed differently. E.g. in this Rust example, the first list literal is parsed as an ArrayExpression, the second one as a SlicePattern even though they consist of the same tokens, because the nonterminals Expression and MatchArm parse those tokens differently:

fn example() {
    let x = ['a', 'b', 'c'];
    let y = match x {
        ['a', 'b', 'c'] => 0,
        _ => 1
    };
}

If not, then I guess my preference would be for the explicit contract.enum and either commas or pipes as separators if pipes in this position don't interfere with the contract stuff: contract.enum [`A, `B, `C] or contract.enum [`A | `B | `C].

ammkrn avatar Apr 07 '22 15:04 ammkrn

I should have been more clear, I'm curious whether there's any context to an enum declaration that will allow lists to be parsed differently

In general, no. Basically, both lists and enum types should be allowed to appear in similar positions inside a type/contract annotation:

# accept arrays of size 2 whose elements are taken among set
let PairAmong = fun set =>
  contract.from_predicate (fun value => builtin.is_array value &&
    array.length value == 2 &&
    array.all (fun x => array.elem x set) value) in

# accept null or something satisfying some_contract
let Nullable = fun base_contract label value =>
  if value == null then value
  else contract.apply base_contract label value in

# here, we would like to interpret it as an enum type
`a | Nullable [`a, `b, `c]
# here, we would like to interpret it as a bare list
[`a, `c] | PairAmong [`a, `b, `c]

yannham avatar Apr 07 '22 16:04 yannham

Thanks for clearing that up. In that case, my preference would be the explicit contract combinator.

ammkrn avatar Apr 08 '22 15:04 ammkrn

e.g. <None : "foo" without space between the < and the symbol would not be a possibility? afaiu if you are somewhat whitespace-sensitive (e.g. for comparisons you expect whitespace between the expressions and the operator), you still have the no-whitespace-in-between version of a character.

Of course that doesn’t help with the syntax to use for the sum type description…

Maybe <Foo | <Bar | <Baz. Or backticks instead of < if you really like them.

Profpatsch avatar Apr 09 '22 12:04 Profpatsch

Maybe I’m understanding this wrong and you need a wrapping [| and |] for it to be unambiguous. But then why not something like <<.

Profpatsch avatar Apr 09 '22 12:04 Profpatsch

What I want to say is that a two-character separator is harder to type unless it uses the same symbol multiple times. [| means I have to type two different special symbols in the right order (and then again in a different order!) vs << where the mental and physical overhead is obviously a lot lower.

Profpatsch avatar Apr 09 '22 12:04 Profpatsch

Whatever other syntax you decide on, please, don't use backticks as part of the language. It makes inline quotes in markdown so much more annoying (just have a look at the source of doc/manual/syntax.md). Maybe using ' like Rust does for labels and lifetimes is a valid alternative

piegamesde avatar Jun 04 '22 08:06 piegamesde

Since we don't have character literals, and don't plan to have them, I think I quite like the suggestion of replacing ` with '. It's a breaking change but is cheaper to do at this point than later in time. I also think keeping it in the type is a nice and immediate visual clue that we are talking about enum tags (that is, if we choose e.g. [|/|], then we should have [| 'foo, 'bar, 'baz |] or [| 'foo | 'bar | 'baz |] instead of [| foo, bar, baz |]).

e.g. <None : "foo" without space between the < and the symbol would not be a possibility? afaiu if you are somewhat whitespace-sensitive (e.g. for comparisons you expect whitespace between the expressions and the operator), you still have the no-whitespace-in-between version of a character.

This makes sense syntactically, but it may be surprising that enum type delimiters would be the only whitespace-sensitive delimiters: that is, you can write either {foo = 1} or { foo = 1 }, {foo: Num} or { foo : Num }, and [1,2] or [ 1, 2 ], but <None ... works while < None ... breaks, in possibly strange ways, depending on the parsing of the latter.
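As a rough illustration of that concern, here is a sketch in Python of what a whitespace-sensitive rule for < could look like. The function `classify_angle` and the rule itself (a < immediately followed by a letter opens an enum) are assumptions made up for this example, not anything Nickel's lexer does.

```python
# Hypothetical whitespace-sensitive rule, for illustration only:
# '<' directly followed by a letter opens an enum; otherwise it is a comparison.

def classify_angle(source: str, i: int) -> str:
    """Classify the '<' found at index i of source."""
    if source[i] != "<":
        raise ValueError("expected '<' at index %d" % i)
    next_char = source[i + 1] if i + 1 < len(source) else ""
    return "enum-open" if next_char.isalpha() else "less-than"


print(classify_angle("<None", 0))    # enum-open
print(classify_angle("< None", 0))   # less-than
print(classify_angle("a <b", 2))     # enum-open
print(classify_angle("a < b", 2))    # less-than
```

With such a rule, a comparison written as a <b would be read as the start of an enum, which is the kind of silent change of meaning on whitespace that no other delimiter in the language has.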

What I want to say is that a two-character separator is harder to type unless it uses the same symbol multiple times. [| means I have to type two different special symbols in the right order (and then again in a different order!) vs << where the mental and physical overhead is obviously a lot lower.

IMHO, it's honestly not a very important criterion. I think what's important is that the syntax is readable and you can quickly identify the nature of the expressions at a glance, that it is consistent with the rest of the language (e.g. don't use braces somewhere for records and somewhere else for lists), and if possible, consistent with what people are used to in general (e.g. don't use { for lists and [ for records). Then, of course, if all those constraints are fulfilled, we can choose something that is nicer to type among the candidates. I guess <</>> works too, indeed.

Right now, I think [| 'foo, 'bar, 'baz |] or [| 'foo | 'bar | 'baz |] fits the bill (the last one overloads the pipe | once more, but on the other hand the pipe is also almost universally used elsewhere for disjunction). Note that the closing delimiter need not be duplicated, if we like it asymmetric: [| 'foo | 'bar | 'baz ], but it might be easier to confuse with a list.

yannham avatar Jul 25 '22 13:07 yannham

For 0.2.0, let's go with [|/|], commas (more consistent with other delimiters, won't clash if we add, say, ADTs, which could potentially have metadata or default values inside, etc.), and keep the backtick ` for enum tags. This is backward compatible for enum tags and doesn't prevent us from switching to a single quote (or something else that doesn't clash with markdown) in place of the backtick. But let's postpone this discussion to 0.3.0.

yannham avatar Jul 28 '22 16:07 yannham

We are now at version 0.3.1 and those horrendous backticks are still there :/

If you care about backwards compatibility, waiting and delaying the change won't make things any better. (Also: would it be possible to allow both for some transition period?)

piegamesde avatar Apr 22 '23 20:04 piegamesde

We are now at version 0.3.1 and those horrendous backticks are still there :/

This is not very nice to backticks :cry:

On a more serious note, thanks for bringing this up again. It's in fact good timing, as we are preparing for the 1.0 release. The original idea was just that we were missing more feedback/input on that, and we didn't want to block on that to release 0.2 (blocking on bikeshedding, unless you can't avoid it, is often a bad idea).

We are already breaking so much syntax between 0.3.2 and 1.0 that I don't think a transition period makes sense. We try to fix and avoid any kind of syntactic debt, even if it means changing a lot of stuff. Because backticks are used only for enum tags, the fix should be a pretty blind regexp search and replace (besides Markdown documents). I would also like it better if users were somehow forced to use the 1.0 syntax directly, rather than being able to keep the old one and just getting more annoyed later when it actually becomes deprecated. The single quote proposal seems to be consensual, so there's a good chance we go with it real soon.
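For the record, such a blind search and replace could look roughly like the Python sketch below. The pattern is an assumption (a backtick immediately followed by a simple identifier), and `migrate_tags` is a hypothetical helper, not a tool shipped with Nickel.

```python
import re

# Assumption: an enum tag is a backtick immediately followed by an identifier
# made of letters, digits and underscores. Nickel's real identifier rules may differ.
TAG = re.compile(r"`([A-Za-z_][A-Za-z0-9_]*)")


def migrate_tags(source: str) -> str:
    """Blindly rewrite backtick tags into single-quote tags."""
    return TAG.sub(r"'\1", source)


print(migrate_tags("[| `foo, `bar |]"))     # [| 'foo, 'bar |]
print(migrate_tags("let x = `a in x"))      # let x = 'a in x
```

Being blind, it also rewrites backticks inside string literals and Markdown code spans, which is why Markdown documents need to be handled separately.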

yannham avatar Apr 24 '23 13:04 yannham

Slightly off-topic, but is there an overview of the planned syntax changes towards 1.0?

piegamesde avatar Apr 24 '23 13:04 piegamesde

There are some in RELEASES.md, but it's still incomplete. This document will be completed by the 1.0 release of course.

We've eliminated the need for a trailing m when closing a multiline string: m%"Foo Bar"% instead of m%"Foo Bar"%m. Since RFC005, metadata are more restricted than before (besides contract and type annotation, all the other metadata can't be attached to any expression as before but must be attached to a record field).

Otherwise, most of them are about the naming of builtin types or stdlib functions: we've switched to long names for builtin types (Number, String, etc.), same for stdlib modules, and the latter have been put under a std namespace (so, std.array instead of array). We've also renamed a lot of stdlib functions for more consistency, better discoverability and to get a bit away from functional programming jargon (such as array.first instead of array.head, etc.). I might be forgetting some.

yannham avatar Apr 24 '23 15:04 yannham

The enum tag single quote change has been implemented, and given the initial scope of this issue, it can be considered as completed. Feel free to re-open one for further bikeshedding.

yannham avatar Apr 27 '23 09:04 yannham