RFCs
RFCs copied to clipboard
Multiple usages of the case discriminator for case objects
This RFC is a summary of the discussion from https://github.com/nim-lang/RFCs/issues/19
The proposal that has won is to allow repeating the case discriminator field without the : type syntax. Like so:
type
NodeKind = enum
foo
bar
bif
baz
bad
Node = object
case kind: NodeKind
of foo, bar, baz:
children: seq[Node]
of bad, bif:
dad: Node
case kind
of foo:
additionalFieldForFoo: int
of bar:
onlyValidForBar: float
else: discard
case kind
of foo, bar:
validForFooAndBar: string
else: discard
It's by far the simplest and most elegant extension to the existing language. It's 100% backwards compatible since this code doesn't compile yet. The ABI is clear too, the re-used field is put only once into the object, at the same offset that it used to.
To do
Things left to specify: How the AST looks like for this construct, both before sem'checking and after sem'checking.
Apart from the fact I personally find this quite ugly and hard to read: I was hoping that Nim leaves the door open for having real algebraic data types eventually, like Haskell or Rust enums. For instance via Node = object {.algebraic.}, allowing a field x having different types in different branches, and field access fully type checked via type guards. I feel with this syntax choice instead of repeated fields this goes out the window.
I feel with this syntax choice instead of repeated fields this goes out the window.
Huh? Why? We can always require obj.field to be used inside a case guard, nothing changed. In fact, there is a warning for this via warning[ProveField]:on already.
Huh? Why? We can always require obj.field to be used inside a case guard, nothing changed.
On usage site yes, but the grouping based approach doesn't seem to extend nicely towards heterogeneously typed fields. The type declaration would have to support something like:
type
Node = object
case kind: NodeKind
of foo:
children: seq[Node]
of bar:
children: seq[SealedNode]
I.e., it would require extension towards the first proposal in https://github.com/nim-lang/RFCs/issues/19 anyway. If that would be supported as well there would be two ways of writing the case where a field has the same type, either grouped or repeated, which seems weird.
In Rust I don't mind the repetition of fields in branches at all. Not only is a rare issue to have significant repetition, but the explicitness also helps to see all fields of a branch with a single glance, whereas the proposal requires actual reading to mentally assemble a foo vs a bar etc.
I think heterogeneously typed fields are so rare that it is not worth musing about. Just use different names for different things.
They are not so rare in my opinion. When implementing e.g. a wrapper that is supposed to be bounded in types it makes sense to call its main data field just value or data, without having to creatively come up with a different unique name like valueArrayOfString in every branch.
I also find the feature helpful when implementing file formats that involve heterogeneous fields by the specs. Comparing implementing file formats in both Nim and Rust, I enjoyed Rust's enums a lot, because I could use the official spec field names despite incompatible types:
pub enum FileChunk {
FooChunk { id: u16, header: FooHeaderType },
BarChunk { id: u64, header: BarHeaderType },
RomVersion { major: u16, minor: u16 },
InstrumentVersion { major: u32, minor: u32 },
// ...
}
Just use different names for different things.
They aren't necessarily different things semantically. A wrapper holds a "value", an "id" is a id, and a version has a "major" and a "minor" field, no matter what exact type is used. The elegance of algebraic data types is that fields can express whatever they want, without having to worry about conflicts or ugly type suffixes like id16 and id64.
Another example: Instruction opcodes. Take this Rust implementation of GameBoy opcodes as an example. It is straightforward from looking at the code what the individual opcodes require. This kind of representation makes it even possible to guess what they do. With grouping by fields this would become an absolute mess in my opinion. Admittedly in this specific example it would still be acceptable, because there are many unary (and few binary) opcodes, and sharing a field name would be still be relatively straightforward for them. But imagine many opcodes requires like 5 operands or so. What a headache to search more meaningful field intersections / unions and picking distinct names...
because I could use the official spec field names despite incompatible types
But only if you're lucky and the spec uses names that are not alien to your programming environment. Usually it's better to stick with the programming language's conventions.
On top of that, usually a spec is concerned with packed data and programming languages have little support for variable sized packed memory structures (yes, this includes holy low level C) so you need a DSL or some custom code anyway.
So it won't be possible to define a shared field with another subset of tags?
E.g. to add to object in the example above:
case kind
of foo, bad:
commonForFooAndBad: int
else: discard
?
@a-mr Sure it would be possible, that's the point of this RFC... I don't understand your question.
@Araq , you wrote your example as if all shared fields are placed in first case and second case without : type contains only non-shared fields. Hence the confusion.
Thanks, I've updated the example to make it clearer, I hope.
@timotheecour why the downvote?
because I much prefer your original proposal in https://github.com/nim-lang/RFCs/issues/19#issuecomment-173529871
this new RFC prompted me to go ahead and implement your original proposal, I have a PR in the works for that
rationale:
- simpler implementation in compiler, and probably also for macro writers consuming case statement object AST's
- more intuitive (subjective) and more DRY (objective)
- easier to control the field ordering layout compared to if it were done with this RFC
- it opens possibility of also extend existing
casestatements in a similar way in future work:
case {.allowDups.} t.typ
of routineKinds: ... # includes skTemplate
of skTemplate: ... # extra code
...
else: discard
note
i don't understand this objection though https://github.com/nim-lang/RFCs/issues/368#issuecomment-820701871 (heterogenous type fields); i don't see how this can work (FieldDefect is checked at RT, not CT). So shared fields must have same type.
i don't understand this objection though #368 (comment) (heterogenous type fields); i don't see how this can work (FieldDefect is checked at RT, not CT).
My hope simply was that one day the check would become an (opt-in) CT check to make case object fully type safe, instead of relying on RT checks. I.e.:
type
WrapperKind = enum
Str
Num
# Perhaps also {.typesafe.}
Wrapper {.algebraic.} = object
case kind: WrapperKind
of Str:
value: string
of Num:
value: int
The "algebraic" opts in to enforce all access to this case object behind type guards only. This enables the reuse of value with heterogeneous types in different branches. Omitting the {.algebraic.} would result in a compiler error like:
Field "value" has different types in different case branches. Only {.algebraic.} case objects can have heterogeneous types.
Extension to the proposal: allow if (in addition to case) for object variants
Rationale:
- avoids
else: discardstatements - more consistent: if we have
casethen why not haveif? Alsoifbranches will look the same for both usage and definition of the fields
For point 1, the example in the description would be idiomatically written so:
Node = object
case kind: NodeKind
of foo, bar, baz:
children: seq[Node]
of bad, bif:
dad: Node
if kind == foo:
additionalFieldForFoo: int
if kind == bar:
onlyValidForBar: float
if kind in {foo, bar}:
validForFooAndBar: string
For point 2, if we e.g. have boolean kind (see an example here) it could be written more naturally as:
ItemFragment = object
isRst: bool
if isRst:
rst: PRstNode
else:
str: string
Of course, only simplest syntactic forms of ifs would be recognized:
if field == ...:
if field in {...}:
if field: # for boolean `field`
The example given by @bluenote10 is exactly what I tried first when I was experimenting with Nim. I think this is the least astonishing way and quite intuitive.
So I strongly vote for his suggestion:
type Node = object case kind: NodeKind of foo: children: seq[Node] of bar: children: seq[SealedNode]
I would really appreciate to see this in Nim, the sooner the better! :blush:
In terms of priority, is there plan to include this in Nim 2 this year (2023)?
No, the focus shifted since then to a related, but yet-to-be-written RFC.
@Araq What's the status on this? Any progress on the related RFC?
As I said, no and I'm not sure I like the idea anymore.