vocab-idl
Enumerations
https://github.com/Crell/enum-comparison
- C/C++ - named integers
- C# - named integers, but fields on static classes can support more complex objects (more of a pattern than language support), e.g. `System.Drawing.SystemColors`; flag support also allows bitwise operations
- Java - explicit values, can be complex objects with private data
- Python - named strings or integers
- TypeScript - named strings or integers
- Haskell - string constants, kinda?
- F# - named integers (but also supports unions)
- Swift - string constants, kinda? also can contain simple or complex values, but not required
- Rust - basically Swift
- Kotlin - named strings or integers
- Scala - explicit values (simple or complex)
The link above has a good summary, grouping these into three categories.
I think for JSON Schema, the primary takeaway is that they are all lists of values. Some languages allow more nuanced and powerful behaviors, but JSON Schema is more concerned with the data aspect than anything else. As such, I think the collection of names is the important part here, and every language surveyed supports that.
The `enum` keyword could work, but it may not be sufficient if underlying values are desired. For example, in C#, an enum can support bitwise operations, but to enable that, the code generator needs to emit a `[Flags]` attribute and set all of the underlying integer values to powers of 2. Then it can also create named bitwise combinations. If just using a list of names, there's no way to describe this intent for proper code generation.
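To make the flags case concrete, here is a rough Python analogue of what a C# `[Flags]` enum expresses, using the standard library's `enum.IntFlag`. The `FileAccess` type and its members are made up for illustration; the point is only that the power-of-2 values and the named combinations are information a plain list of names cannot carry.

```python
from enum import IntFlag


class FileAccess(IntFlag):  # hypothetical type, for illustration only
    NONE = 0
    # Each base member occupies its own bit, so values are powers of 2.
    READ = 1
    WRITE = 2
    EXECUTE = 4
    # Named bitwise combinations of the base members.
    READ_WRITE = READ | WRITE
    ALL = READ | WRITE | EXECUTE


combined = FileAccess.READ | FileAccess.WRITE
print(combined == FileAccess.READ_WRITE)    # True
print(FileAccess.WRITE in FileAccess.ALL)   # True
```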
The "descriptive enum" approach using the `anyOf` keyword could work for this because we're just defining names and annotations for those names. However, the subschemas would be required to be uniform, and we'd probably still need another keyword to tell the codegen engine that we're defining an enum.
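As a sketch of why the `anyOf` route needs uniform subschemas, here is the kind of heuristic a codegen engine would have to apply. I'm assuming the "descriptive enum" shape is `const` for the name plus `title`/`description` annotations; the function name and the sample schema are hypothetical.

```python
# Annotation keywords tolerated alongside `const` (an assumption of this sketch).
ANNOTATIONS = {"title", "description"}


def descriptive_enum_names(schema):
    """Return the enum names if `schema` looks like a descriptive enum, else None."""
    subschemas = schema.get("anyOf")
    if not isinstance(subschemas, list) or not subschemas:
        return None
    names = []
    for sub in subschemas:
        # Every subschema must be an object that pins a string with `const`.
        if not isinstance(sub, dict) or not isinstance(sub.get("const"), str):
            return None
        # Any other keyword breaks uniformity, so this is not treated as an enum.
        if set(sub) - ANNOTATIONS - {"const"}:
            return None
        names.append(sub["const"])
    return names


suit_schema = {
    "anyOf": [
        {"const": "HEARTS", "description": "Red suit"},
        {"const": "DIAMONDS", "description": "Red suit"},
        {"const": "CLUBS", "description": "Black suit"},
        {"const": "SPADES", "description": "Black suit"},
    ]
}
not_an_enum = {"anyOf": [{"const": "HEARTS"}, {"type": "integer"}]}
```

Even when the heuristic succeeds, it is only guessing at intent, which is the argument for an explicit keyword.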
I recommend a new keyword (e.g. `enumeration`) to support this. It's still an array, but the items must either all be
- just a string, in which case a simple set of names is generated (which seems to be universally supported)
- an object with `name` and `data` properties which give more explicit information for more complex support
{
  "enumeration": [
    "HEARTS",
    "DIAMONDS",
    "CLUBS",
    "SPADES"
  ]
}

{
  "enumeration": [
    { "name": "HEARTS", "data": 1 },
    { "name": "DIAMONDS", "data": 2 },
    { "name": "CLUBS", "data": 3 },
    { "name": "SPADES", "data": 4 }
  ]
}
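Here is a minimal sketch of how a generator might consume the two forms above, using Python's functional `Enum` API as the target. `build_enum` is a hypothetical helper, not part of any existing tool, and rejecting empty or mixed arrays is my reading of "the items must either all be" one form or the other.

```python
from enum import Enum


def build_enum(type_name, enumeration):
    """Build an Enum from the proposed `enumeration` array (hypothetical helper)."""
    if not enumeration:
        raise ValueError("enumeration must not be empty")
    if all(isinstance(item, str) for item in enumeration):
        # Names only: let Python assign the underlying values (1, 2, 3, ...).
        return Enum(type_name, list(enumeration))
    if all(isinstance(item, dict) and "name" in item and "data" in item
           for item in enumeration):
        # Name/data objects: carry the explicit underlying data through.
        return Enum(type_name, [(item["name"], item["data"]) for item in enumeration])
    raise ValueError("items must be all strings or all name/data objects")


SimpleSuit = build_enum("SimpleSuit", ["HEARTS", "DIAMONDS", "CLUBS", "SPADES"])
DataSuit = build_enum("DataSuit", [
    {"name": "HEARTS", "data": 1},
    {"name": "DIAMONDS", "data": 2},
    {"name": "CLUBS", "data": 3},
    {"name": "SPADES", "data": 4},
])
```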
The second case becomes more complicated because of the different support among languages (even just the ones surveyed) for underlying data. Most support integer values, but not all, while some only support integer values. Some support more complex underlying data, while others don't support any underlying data.
I think the only resolution to this is that the schema can provide support for more complex needs, and those languages that don't support it can do what they deem appropriate, most likely just creating a list of names.
I also recommend the best practice of generating an "unknown" or "unset" enum value as the default.
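A sketch of both points together: a target that can't represent `data` falls back to the names, and the generator prepends a default member. The helper name and the choice of `UNKNOWN` with value 0 are assumptions for illustration.

```python
from enum import Enum


def build_names_only_enum(type_name, enumeration, default_name="UNKNOWN"):
    """Ignore any `data` and generate names only, with a default member first."""
    names = [item if isinstance(item, str) else item["name"] for item in enumeration]
    if default_name in names:
        raise ValueError(f"{default_name!r} collides with a declared name")
    # start=0 gives the default member the value 0; declared names follow.
    return Enum(type_name, [default_name, *names], start=0)


Suit = build_names_only_enum("Suit", [
    {"name": "HEARTS", "data": {"color": "red"}},
    {"name": "SPADES", "data": {"color": "black"}},
])
print([m.name for m in Suit])  # ['UNKNOWN', 'HEARTS', 'SPADES']
```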
In a validation context, the new `enumeration` keyword validates that the instance is a string equal to either the item itself (when the item is a string) or the item's `name` (when the item is an object).
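The validation rule is small enough to sketch directly. `is_valid` is a hypothetical function name; it assumes the keyword value has already been checked to be an array of strings or `name`/`data` objects.

```python
def is_valid(instance, enumeration):
    """Check an instance against the proposed `enumeration` keyword."""
    # Collect the allowed names from either item form.
    allowed = {item if isinstance(item, str) else item["name"] for item in enumeration}
    # Only strings can match; the underlying `data` is never a valid instance.
    return isinstance(instance, str) and instance in allowed


suits = [{"name": "HEARTS", "data": 1}, {"name": "SPADES", "data": 4}]
print(is_valid("HEARTS", suits))  # True
print(is_valid(1, suits))         # False
```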
Serialization
Another aspect to consider for enumerations is how values are serialized.
In C# circles there is often a debate as to whether an enum should be serialized by name or by the underlying integer. Historically, by integer is the default, which inevitably leads to someone adding a value in the middle of an enum, thereby changing the numbering for all the values that come after and screwing up deserialization of previously-serialized data.
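The renumbering problem is easy to reproduce. This sketch uses Python's `IntEnum` with explicit values standing in for the numbering C# would assign implicitly; `SuitV1` and `SuitV2` are hypothetical versions of the same type.

```python
import json
from enum import IntEnum


class SuitV1(IntEnum):  # original release
    HEARTS = 0
    DIAMONDS = 1
    CLUBS = 2


class SuitV2(IntEnum):  # later release: SPADES added in the middle
    HEARTS = 0
    SPADES = 1
    DIAMONDS = 2
    CLUBS = 3


# Data written by the old version, serialized by underlying integer.
stored = json.dumps({"suit": SuitV1.DIAMONDS.value})

# The new version reads the same integer back as a different member.
restored = SuitV2(json.loads(stored)["suit"])
print(restored.name)  # SPADES
```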
The proposed solution to this is serializing by name, but that comes with its own risks, like name changes. Once a name is serialized somewhere, you pretty much need to support deserializing that name. As a result, spelling and other errors are forever persisted.
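To illustrate the cost of serializing by name, here is one possible way to keep old data loadable after fixing a misspelling: a Python enum alias. The misspelled `DIAMNODS` member and the helper functions are invented for the example, and this is just one mitigation, not a recommendation.

```python
from enum import Enum


class Suit(Enum):
    HEARTS = 1
    DIAMONDS = 2
    # Alias for a misspelling that shipped earlier (hypothetical); it has to
    # stay so that data serialized under the old name can still be read.
    DIAMNODS = 2


def serialize(member):
    """Serialize by name; aliases always write the canonical name."""
    return member.name


def deserialize(name):
    """Look up a member by name; aliases resolve to the canonical member."""
    return Suit[name]


old = deserialize("DIAMNODS")
print(old is Suit.DIAMONDS)  # True
print(serialize(old))        # DIAMONDS
```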
Do we want to provide guidance on this topic since we're effectively using schemas to define the serialized format?