vocab-idl Enumerations

Open gregsdennis opened this issue 1 year ago • 14 comments

https://github.com/Crell/enum-comparison

C/C++ - named integers
C# - named integers; fields on static classes can hold more complex objects (more of a pattern than language support, e.g. System.Drawing.SystemColors); flag support also allows bitwise operations
Java - explicit values, can be complex objects with private data
Python - named strings or integers
TypeScript - named strings or integers
Haskell - string constants, kinda?
F# - named integers (but also supports unions)
Swift - string constants, kinda? also can contain simple or complex values, but not required
Rust - basically Swift
Kotlin - named strings or integers
Scala - explicit values (simple or complex)

The link above has a good summary, grouping these into three categories.

I think for JSON Schema, the primary takeaway is that they are all lists of values. Some languages allow more nuanced and powerful behaviors, but JSON Schema is more concerned with the data aspect than anything else. As such, I think the collection of names is the important part here, and it is the one thing every surveyed language supports.

The enum keyword could work, but it may not be sufficient if underlying values are desired. For example, a C# enum can support bitwise operations, but to enable that, the generator needs to emit a [Flags] attribute and set all of the underlying integer values to powers of 2. The enum can then also define named bitwise combinations. With just a list of names, there's no way to describe this intent for proper code generation.
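To make the flags requirement concrete, here is a rough sketch of the same idea using Python's IntFlag rather than C#. The Permissions type and its member names are made up for illustration. The point is only that the generator has to know the power-of-2 values and the named combinations, which a bare list of names can't convey.

```python
from enum import IntFlag


class Permissions(IntFlag):
    """Hypothetical flags enum; names and values are illustrative only."""

    NONE = 0
    READ = 1 << 0      # 1
    WRITE = 1 << 1     # 2
    EXECUTE = 1 << 2   # 4
    # Named bitwise combination of two single-bit members.
    READ_WRITE = READ | WRITE


# Combining members at runtime yields the same value as the named combination.
combined = Permissions.READ | Permissions.WRITE
```

Membership tests such as `Permissions.READ in combined` only work because each single-bit member has its own distinct power of 2.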

The "descriptive enum" approach using the anyOf keyword could work for this because we're just defining names and annotations for those names. However, the subschemas would be required to be uniform, and we'd probably still need another keyword to tell the codegen engine that we're defining an enum.

I recommend a new keyword (e.g. enumeration) to support this. It's still an array, but the items must all take one of two forms:

just a string, in which case a simple set of names is generated (which seems to be universally supported), or
an object with name and data properties, which give more explicit information for more complex support

{
  "enumeration": [
    "HEARTS",
    "DIAMONDS",
    "CLUBS",
    "SPADES"
  ]
}

{
  "enumeration": [
    { "name": "HEARTS", "data": 1 },
    { "name": "DIAMONDS", "data": 2 },
    { "name": "CLUBS", "data": 3 },
    { "name": "SPADES", "data": 4 }
  ]
}

The second case becomes more complicated because languages (even just the ones surveyed) differ in their support for underlying data. Most, but not all, support integer values, and some support only integers. Some support more complex underlying data, while others don't support any underlying data at all.

I think the only resolution to this is that the schema can provide support for more complex needs, and those languages that don't support it can do what they deem appropriate, most likely just creating a list of names.
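As a rough sketch of what a codegen engine might do with both item forms, assuming the proposed enumeration keyword and a target language (Python here) that accepts arbitrary underlying values. The build_enum helper is hypothetical and not part of any existing tool.

```python
from enum import Enum


def build_enum(type_name, items):
    """Build an Enum class from the items of the proposed `enumeration` keyword.

    Hypothetical helper: `items` is either a list of strings or a list of
    objects with `name` and `data` properties, as in the examples above.
    """
    members = []
    for position, item in enumerate(items, start=1):
        if isinstance(item, str):
            # Names-only form: no underlying data was given, so the
            # generator assigns its own values (here, 1-based positions).
            members.append((item, position))
        else:
            # Object form: use the declared data as the member value.
            members.append((item["name"], item["data"]))
    return Enum(type_name, members)


Suit = build_enum("Suit", ["HEARTS", "DIAMONDS", "CLUBS", "SPADES"])
RankedSuit = build_enum(
    "RankedSuit",
    [
        {"name": "HEARTS", "data": 1},
        {"name": "DIAMONDS", "data": 2},
        {"name": "CLUBS", "data": 3},
        {"name": "SPADES", "data": 4},
    ],
)
```

A generator for a language without underlying data would presumably ignore the data property and take only the names.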

I also recommend the best practice of generating an "unknown" or "unset" enum value as the default.
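For illustration, generated output following that practice might look something like this in Python. The UNKNOWN member name, its value of 0, and the Card type are all assumptions of mine.

```python
from dataclasses import dataclass
from enum import Enum


class Suit(Enum):
    UNKNOWN = 0  # hypothetical generated member meaning "no value set"
    HEARTS = 1
    DIAMONDS = 2
    CLUBS = 3
    SPADES = 4


@dataclass
class Card:
    """Hypothetical type with an enum field that defaults to UNKNOWN."""

    rank: int
    suit: Suit = Suit.UNKNOWN


card = Card(rank=5)
```

A card created without a suit then reads as UNKNOWN instead of silently looking like a real suit.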

In a validation context, the new enumeration keyword validates that the instance is a string equal to one of the items (when the items are plain strings) or to the name property of one of the items (when the items are objects).
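A minimal sketch of that validation rule, assuming the item shapes shown earlier. The allowed_names and is_valid functions are hypothetical names, not part of any existing validator.

```python
def allowed_names(items):
    """Collect the accepted strings from either form of `enumeration` item."""
    return [item if isinstance(item, str) else item["name"] for item in items]


def is_valid(instance, items):
    """Return True if the instance is a string naming one of the items."""
    return isinstance(instance, str) and instance in allowed_names(items)


simple = ["HEARTS", "DIAMONDS", "CLUBS", "SPADES"]
detailed = [
    {"name": "HEARTS", "data": 1},
    {"name": "DIAMONDS", "data": 2},
    {"name": "CLUBS", "data": 3},
    {"name": "SPADES", "data": 4},
]
```

Under this reading, the underlying data never validates: with the detailed form, the instance "CLUBS" passes but 3 does not.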

Serialization

Another aspect to consider for enumerations is how values are serialized.

In C# circles there is often a debate as to whether an enum should be serialized by name or by the underlying integer. Historically, serializing by integer has been the default, which inevitably leads to someone adding a value in the middle of an implicitly numbered enum, thereby changing the numbering for all the values that come after it and screwing up deserialization of previously serialized data.
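The failure mode is easy to reproduce. Here is a small sketch in Python rather than C#, with two made-up versions of the same enum where the second has a member inserted in the middle and both rely on implicit numbering.

```python
from enum import Enum

# Version 1 of a hypothetical enum; the functional API numbers members from 1.
SuitV1 = Enum("SuitV1", ["HEARTS", "DIAMONDS", "CLUBS", "SPADES"])

# Version 2 inserts STARS in the middle, shifting every later member by one.
SuitV2 = Enum("SuitV2", ["HEARTS", "STARS", "DIAMONDS", "CLUBS", "SPADES"])

# Data written by version 1, once by integer and once by name.
stored_value = SuitV1.DIAMONDS.value
stored_name = SuitV1.DIAMONDS.name

# Data read back by version 2.
read_by_value = SuitV2(stored_value)  # silently becomes a different member
read_by_name = SuitV2[stored_name]    # still resolves to DIAMONDS
```

Reading by value gives STARS instead of DIAMONDS with no error raised, which is what makes this kind of break hard to notice.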

The proposed solution to this is serializing by name, but that comes with its own risks, like name changes. Once a name is serialized somewhere, you pretty much need to support deserializing that name forever. As a result, spelling and other errors end up persisted indefinitely.

Do we want to provide guidance on this topic since we're effectively using schemas to define the serialized format?

Jun 07 '23 22:06 gregsdennis
