FeatReq: better emitted C# for complex, heterogeneous unions
Suggested label: emitter:client:csharp
Problem
Emitted C# code for complex unions (exposed as BinaryData with partial backing type support) is extremely challenging to use; extensive customization is needed and information is lost relative to the spec. This is a major impediment for e.g. OpenAI (https://github.com/openai/openai-dotnet), a customer with an API that has a great many of these complex types.
Feature Request: given that design decisions will always be necessary about how to properly expose these types, it'd be enormously helpful for OpenAI and other customers that leverage these patterns if there were a way (via an opt-in option is fine) to emit a "mechanically complete, but some assembly required" type for complex unions.
Example
Consider the following .tsp:
model ExampleRequestOptionsModel {
  example_union_field: "auto"
    | int32
    | {
        type: "union_object_thing",
        union_object_thing: {
          data: string[];
        }
      }
}
example_union_field, as denoted, can be either the enumeration-bound string "auto", a number, or an (inline) model in its own right.
A library implementer may want to represent this as something like the following, among many other possible options:
public partial class ComplexExampleThing {
    public static ComplexExampleThing Auto { get; }
    public ComplexExampleThing(int number) { /* ... */ }
    public static ComplexExampleThing FromCustomStrings(IEnumerable<string> items) { /* ... */ }
}
...But it's very hard to get to something like the above (or anything like it) from what's generated.
Observed today:
- The emitted parent type (here, in ExampleRequestOptionsModel.cs) represents the complex field as BinaryData (public BinaryData ExampleUnionField { get; })
- Likewise, the emitted serialization for the parent type only has logic for BinaryData; no code is emitted that can handle any of the strongly typed possibilities
- With keepAll enabled for unreferenced-types-handling, we do get emitted types for inline JSON models -- here, there's an ExampleRequestOptionsModelExampleUnionField2 created that in turn has an ExampleRequestOptionsModelExampleUnionFieldUnionObjectThing instance that can ultimately serialize/deserialize and otherwise handle the string array data as intended
- But we do not have any representation anywhere of the non-object possibilities (positions "0" and "1") for "auto" and int32
- And further, even the object types we do have are not "connected" anywhere -- there's no parent type (where BinaryData goes) that hooks everything up, and it's actually quite hard to inject a custom type to accomplish it while preserving code generation behavior
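To make the pain point concrete, here is a minimal sketch of the kind of hand-written inspection a consumer ends up doing against the BinaryData property today. The helper name BinaryDataUnionInspection.Describe is hypothetical and not part of any generated output; it assumes a reference to the System.Memory.Data package (for BinaryData) and uses System.Text.Json to sniff the payload. Note how the "auto" literal and the property names are restated by hand rather than coming from generated code.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper, not emitted by any generator: reports which union
// variant a BinaryData payload currently holds.
public static class BinaryDataUnionInspection
{
    public static string Describe(BinaryData exampleUnionField)
    {
        using JsonDocument doc = JsonDocument.Parse(exampleUnionField.ToMemory());
        JsonElement root = doc.RootElement;

        switch (root.ValueKind)
        {
            case JsonValueKind.String:
                // The "auto" literal is duplicated from the spec by hand.
                return root.GetString() == "auto" ? "auto" : "unknown string";
            case JsonValueKind.Number:
                return $"number {root.GetInt32()}";
            case JsonValueKind.Object:
                // Property names are likewise duplicated from the spec by hand.
                JsonElement data = root
                    .GetProperty("union_object_thing")
                    .GetProperty("data");
                return $"object with {data.GetArrayLength()} item(s)";
            default:
                return "unsupported";
        }
    }
}
```

For example, Describe(BinaryData.FromString("\"auto\"")) returns "auto", and Describe(BinaryData.FromString("42")) returns "number 42". If the spec later gained another string literal or object shape, nothing in this code would flag it.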
The request
Goal: make it easier to accomplish this scenario and have all of the "source of truth" information (types, enum values, etc.) fully represented via code generation (vs. requiring parallel representation to the spec inside custom code; this is fragile and laborious).
Non-goal: make it possible to have this fully generated. Given the myriad, open-ended design options available, I don't think that's a reasonable expectation for any code generator. "Someday," having strategy options that provide a baseline might be nice, but that's far less critical than just getting the generator back into a "helps instead of hinders" position for these types.
An example of what I'd personally love to have:
- (as needed) gate this behind an emitter option:
  emit:
    - "@azure-tools/typespec-csharp"
  options:
    "@azure-tools/typespec-csharp":
      emit-ambiguous-union-components: true
- When allowed, emit types that fully encapsulate all possibilities for the complex union; in the example, that'd mean adding:
  - A parent type for all of the available paths (e.g. ExampleRequestOptionsModelExampleUnionFieldUnionContainer)
  - A representation of the "primitive" types inside the new parent, here an int (position "1" in the union)
  - An enum/EE for the "auto" option (e.g. ExampleRequestOptionsModelExampleUnionField0)
  - Serialization code in the parent/container that conditionally handles whichever of the available paths are configured
With the above, referring back to the hypothetical public API the example could use:
[CodeGenModel("ExampleRequestOptionsModelExampleUnionFieldUnionContainer")]
public partial class ComplexExampleThing {
    // These fields would be automatically generated; constituent types defaulting to internal visibility
    private ExampleRequestOptionsModelExampleUnionField0? _unionVariant0;
    private int? _unionVariant1;
    private ExampleRequestOptionsModelExampleUnionField2 _unionVariant2;

    internal ComplexExampleThing(
        ExampleRequestOptionsModelExampleUnionField0? unionVariant0,
        int? unionVariant1,
        ExampleRequestOptionsModelExampleUnionField2 unionVariant2,
        IDictionary<string, BinaryData> serializedAdditionalRawData)
    { /* sets fields */ }

    // The public surface would require hand-customization since many implementation choices exist
    public static ComplexExampleThing Auto { get; }
        = new ComplexExampleThing(ExampleRequestOptionsModelExampleUnionField0.Auto, null, null, null);
    public ComplexExampleThing(int number) { _unionVariant1 = number; }
    // A static factory cannot assign instance fields, so it constructs a new instance instead
    public static ComplexExampleThing FromCustomStrings(IEnumerable<string> items)
        => new ComplexExampleThing(null, null, new(items), null);
}
Serialization would then just select based on the variant selection that isn't null:
if (_unionVariant0.HasValue) { writer.WriteObjectValue(_unionVariant0, options); }
else if (_unionVariant1.HasValue) { writer.WriteNumberValue(_unionVariant1.Value); }
else if (_unionVariant2 is not null) { writer.WriteObjectValue(_unionVariant2); }
Deserialization would key off the corresponding element type:
if (property.Value.ValueKind == JsonValueKind.String)
{
    // Construct the variant-0 type (not the local variable's name)
    unionVariant0 = new ExampleRequestOptionsModelExampleUnionField0(property.Value.GetString());
    continue;
}
if (property.Value.ValueKind == JsonValueKind.Number)
{
    unionVariant1 = property.Value.GetInt32();
    continue;
}
if (property.Value.ValueKind == JsonValueKind.Object)
{
    unionVariant2 = ExampleRequestOptionsModelExampleUnionField2
        .DeserializeExampleRequestOptionsModelExampleUnionField2(property.Value, options);
    continue;
}
// ...
return new ComplexExampleThing(unionVariant0, unionVariant1, unionVariant2, serializedAdditionalRawData);
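As a rough, self-contained illustration of the write/read selection described above, the following sketch round-trips all three variants using only System.Text.Json. Everything in it is an assumption made for illustration: UnionContainerSketch and its members are hypothetical names, a plain string stands in for the enum/EE variant, a string list stands in for the generated inline model types, and the generated helpers referenced in the snippets above (such as WriteObjectValue) are replaced with direct Utf8JsonWriter calls. It is not what the emitter produces.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.Json;

// Hypothetical stand-in for the proposed generated container type.
public sealed class UnionContainerSketch
{
    private readonly string? _unionVariant0;                 // "auto" (plain string stands in for the enum/EE)
    private readonly int? _unionVariant1;                    // int32
    private readonly IReadOnlyList<string>? _unionVariant2;  // the inline object's "data" array

    private UnionContainerSketch(string? v0, int? v1, IReadOnlyList<string>? v2)
    {
        _unionVariant0 = v0;
        _unionVariant1 = v1;
        _unionVariant2 = v2;
    }

    public static UnionContainerSketch Auto { get; } =
        new UnionContainerSketch("auto", null, null);
    public static UnionContainerSketch FromNumber(int number) =>
        new UnionContainerSketch(null, number, null);
    public static UnionContainerSketch FromCustomStrings(IEnumerable<string> items) =>
        new UnionContainerSketch(null, null, items.ToList());

    // Serialization: write whichever variant is populated.
    public void Write(Utf8JsonWriter writer)
    {
        if (_unionVariant0 is not null) { writer.WriteStringValue(_unionVariant0); }
        else if (_unionVariant1.HasValue) { writer.WriteNumberValue(_unionVariant1.Value); }
        else if (_unionVariant2 is not null)
        {
            writer.WriteStartObject();
            writer.WriteString("type", "union_object_thing");
            writer.WriteStartObject("union_object_thing");
            writer.WriteStartArray("data");
            foreach (string item in _unionVariant2) { writer.WriteStringValue(item); }
            writer.WriteEndArray();
            writer.WriteEndObject();
            writer.WriteEndObject();
        }
        else { writer.WriteNullValue(); }
    }

    // Deserialization: key off the JSON value kind.
    public static UnionContainerSketch Read(JsonElement element) => element.ValueKind switch
    {
        JsonValueKind.String => new UnionContainerSketch(element.GetString(), null, null),
        JsonValueKind.Number => new UnionContainerSketch(null, element.GetInt32(), null),
        JsonValueKind.Object => new UnionContainerSketch(null, null, element
            .GetProperty("union_object_thing").GetProperty("data")
            .EnumerateArray().Select(e => e.GetString() ?? string.Empty).ToList()),
        _ => throw new JsonException($"Unexpected JSON kind: {element.ValueKind}"),
    };

    public string ToJson()
    {
        using var stream = new MemoryStream();
        using (var writer = new Utf8JsonWriter(stream)) { Write(writer); }
        return Encoding.UTF8.GetString(stream.ToArray());
    }
}
```

With this sketch, UnionContainerSketch.FromCustomStrings(new[] { "a", "b" }).ToJson() yields {"type":"union_object_thing","union_object_thing":{"data":["a","b"]}}, and passing JsonDocument.Parse(json).RootElement back through Read reproduces the same JSON. Keying off the value kind only works while each variant has a distinct JSON kind; a union with two object variants would need a discriminator check (for example on "type") instead.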
Impact
If we had something like the above, we'd be able to reduce the amount of custom code in the OpenAI library by a considerable amount -- potentially up to 30-40% overall. Further, we'd greatly reduce the amount of spec adulteration we're currently relying on (e.g. introducing "dummy" types that we force visibility of via @@access and @@usage TCGC decorators) and be able to more directly use a "pure and correct" spec representation.
Having all of the underlying pieces as part of code generation, we'd be much better-positioned to catch updates -- in the example above, if another JSON object variant were introduced, we'd currently very likely miss it without deep scrutiny; with the above proposal, it'd be automatically included in the generated type.
Checklist
- [X] Follow our Code of Conduct
- [X] Read the docs.
- [X] Check that there isn't already an issue that requests the same feature to avoid creating a duplicate.