
FeatReq: better emitted C# for complex, heterogeneous unions

Open trrwilson opened this issue 6 months ago • 0 comments

Suggested label: emitter:client:csharp

Problem

Emitted C# code for complex unions (exposed as BinaryData with partial backing type support) is extremely challenging to use: extensive customization is needed, and information is lost relative to the spec. This is a major impediment for, e.g., OpenAI (https://github.com/openai/openai-dotnet), a customer whose API has a great many of these complex types.

Feature Request: given that design decisions will always be necessary about how to properly expose these types, it'd be enormously helpful for OpenAI and other customers that use these patterns if there were a way (an opt-in option is fine) to emit a "mechanically complete, but some assembly required" type for complex unions.

Example

Consider the following .tsp:

```typespec
model ExampleRequestOptionsModel {
  example_union_field: "auto"
    | int32
    | {
      type: "union_object_thing",
      union_object_thing: {
        data: string[];
      }
    }
}
```

example_union_field, as denoted, can be either the enumeration-bound string "auto", a number, or an (inline) model in its own right.

A library implementer may want to represent this as something like the following, among many other possible options:

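```csharp
using System.Collections.Generic;

public partial class ComplexExampleThing {
    public static ComplexExampleThing Auto { get; }            // the "auto" variant
    public ComplexExampleThing(int number) { }                 // the int32 variant
    public static ComplexExampleThing FromCustomStrings(IEnumerable<string> items)
        => throw new System.NotImplementedException();         // the inline-object variant
}
```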

...But it's very hard to get to something like the above (or anything like it) from what's generated.

Observed today:

  • The emitted parent type (here, in ExampleRequestOptionsModel.cs) represents the complex field as BinaryData (public BinaryData ExampleUnionField { get; })
  • Likewise, the emitted serialization for the parent type only has logic for BinaryData; no code is emitted that can handle any of the strongly typed possibilities
  • With keepAll enabled for unreferenced-types-handling, we do get emitted types for inline JSON models -- here, there's an ExampleRequestOptionsModelExampleUnionField2 created that in turn has an ExampleRequestOptionsModelExampleUnionFieldUnionObjectThing instance that can ultimately serialize/deserialize and otherwise handle the string array data as intended
  • But we do not have any representation anywhere of the non-object possibilities (positions "0" and "1") for "auto" and int32
  • And further, even the object types we do have are not "connected" anywhere -- there's no parent type (where BinaryData goes) that hooks everything up, and it's actually quite hard to inject a custom type to accomplish this while preserving code generation behavior

The request

Goal: make it easier to accomplish this scenario and have all of the "source of truth" information (types, enum values, etc.) fully represented via code generation, rather than duplicating the spec inside custom code, which is fragile and laborious.

Non-goal: make it possible to have this fully generated. Given the myriad, open-ended design options available, I don't think that's a reasonable expectation for any code generator. "Someday," having strategy options that provide a baseline might be nice, but that's far less critical than just getting the generator back into a "helps instead of hinders" position for these types.

An example of what I'd personally love to have:

  1. (As needed) gate this behind an emitter option:

```yaml
emit:
  - "@azure-tools/typespec-csharp"
options:
  "@azure-tools/typespec-csharp":
    emit-ambiguous-union-components: true
```

  2. When allowed, emit types that fully encapsulate all possibilities for the complex union; in the example, that'd mean adding:
  • A parent type for all of the available paths (e.g. ExampleRequestOptionsModelExampleUnionFieldUnionContainer)
  • A representation of the "primitive" types inside the new parent, here an int (position "1" in the union)
  • An enum or extensible enum (EE) for the "auto" option (e.g. ExampleRequestOptionsModelExampleUnionField0)
  • Serialization code in the parent/container that conditionally handles whichever of the available paths are configured

With the above in place, the hypothetical public API from the example could be built like this:

```csharp
[CodeGenModel("ExampleRequestOptionsModelExampleUnionFieldUnionContainer")]
public partial class ComplexExampleThing {
    // These fields would be automatically generated; constituent types default to internal visibility
    private ExampleRequestOptionsModelExampleUnionField0? _unionVariant0;
    private int? _unionVariant1;
    private ExampleRequestOptionsModelExampleUnionField2 _unionVariant2;
    internal ComplexExampleThing(
        ExampleRequestOptionsModelExampleUnionField0? unionVariant0,
        int? unionVariant1,
        ExampleRequestOptionsModelExampleUnionField2 unionVariant2,
        IDictionary<string, BinaryData> serializedAdditionalRawData)
    { /* sets fields */ }

    // The public surface would require hand-customization since many implementation choices exist
    public static ComplexExampleThing Auto { get; }
        = new ComplexExampleThing(ExampleRequestOptionsModelExampleUnionField0.Auto, null, null, null);
    public ComplexExampleThing(int number) { _unionVariant1 = number; }

    // A static factory returns a new instance; it cannot assign an instance field directly
    public static ComplexExampleThing FromCustomStrings(IEnumerable<string> items)
        => new ComplexExampleThing(null, null, new ExampleRequestOptionsModelExampleUnionField2(items), null);
}
```

Serialization would then simply write whichever variant is non-null:

```csharp
// Extensible enums serialize as their string value
if (_unionVariant0.HasValue) { writer.WriteStringValue(_unionVariant0.Value.ToString()); }
else if (_unionVariant1.HasValue) { writer.WriteNumberValue(_unionVariant1.Value); }
else if (_unionVariant2 is not null) { writer.WriteObjectValue(_unionVariant2, options); }
```
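To make the selection logic concrete, here is a minimal, self-contained sketch using only `System.Text.Json`. The type names (`AutoOption`, `UnionObjectThing`, `ExampleUnionContainer`) are hypothetical stand-ins for the generated `ExampleRequestOptionsModelExampleUnionField0`, `ExampleRequestOptionsModelExampleUnionField2`, and container types, and the sketch writes values with plain `Utf8JsonWriter` calls rather than the generator's `WriteObjectValue` helper.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.Json;

// Hypothetical stand-in for the extensible enum generated for the "auto" variant.
public readonly struct AutoOption
{
    private readonly string _value;
    public AutoOption(string value) => _value = value ?? throw new ArgumentNullException(nameof(value));
    public static AutoOption Auto { get; } = new AutoOption("auto");
    public override string ToString() => _value;
}

// Hypothetical stand-in for the generated inline-model variant.
public sealed class UnionObjectThing
{
    public UnionObjectThing(IEnumerable<string> data) => Data = new List<string>(data);
    public IList<string> Data { get; }

    // Writes {"type":"union_object_thing","union_object_thing":{"data":[...]}}.
    public void Write(Utf8JsonWriter writer)
    {
        writer.WriteStartObject();
        writer.WriteString("type", "union_object_thing");
        writer.WritePropertyName("union_object_thing");
        writer.WriteStartObject();
        writer.WritePropertyName("data");
        writer.WriteStartArray();
        foreach (string item in Data) writer.WriteStringValue(item);
        writer.WriteEndArray();
        writer.WriteEndObject();
        writer.WriteEndObject();
    }
}

// Hypothetical container; the public factories populate exactly one variant field.
public sealed class ExampleUnionContainer
{
    private readonly AutoOption? _unionVariant0;
    private readonly int? _unionVariant1;
    private readonly UnionObjectThing? _unionVariant2;

    private ExampleUnionContainer(AutoOption? v0, int? v1, UnionObjectThing? v2)
    {
        _unionVariant0 = v0; _unionVariant1 = v1; _unionVariant2 = v2;
    }

    public static ExampleUnionContainer Auto { get; } = new(AutoOption.Auto, null, null);
    public static ExampleUnionContainer FromNumber(int number) => new(null, number, null);
    public static ExampleUnionContainer FromCustomStrings(IEnumerable<string> items)
        => new(null, null, new UnionObjectThing(items));

    // Writes whichever variant is populated, checked in union order.
    public void Write(Utf8JsonWriter writer)
    {
        if (_unionVariant0.HasValue) writer.WriteStringValue(_unionVariant0.Value.ToString());
        else if (_unionVariant1.HasValue) writer.WriteNumberValue(_unionVariant1.Value);
        else if (_unionVariant2 is not null) _unionVariant2.Write(writer);
        else throw new InvalidOperationException("No union variant is set.");
    }

    public string ToJson()
    {
        using var stream = new MemoryStream();
        using (var writer = new Utf8JsonWriter(stream))
        {
            Write(writer);
        } // disposing the writer flushes it into the stream
        return Encoding.UTF8.GetString(stream.ToArray());
    }
}
```

Throwing when no variant is set is an assumption of this sketch; a generated container might instead fall back to raw `BinaryData`.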

Deserialization would key off the JSON value kind of the element:

```csharp
// Inside the property loop; unionVariant0/1/2 are locals declared (as null) earlier
if (property.Value.ValueKind == JsonValueKind.String)
{
    unionVariant0 = new ExampleRequestOptionsModelExampleUnionField0(property.Value.GetString());
    continue;
}
if (property.Value.ValueKind == JsonValueKind.Number)
{
    unionVariant1 = property.Value.GetInt32();
    continue;
}
if (property.Value.ValueKind == JsonValueKind.Object)
{
    unionVariant2 = ExampleRequestOptionsModelExampleUnionField2
        .DeserializeExampleRequestOptionsModelExampleUnionField2(property.Value, options);
    continue;
}
// ...
return new ComplexExampleThing(unionVariant0, unionVariant1, unionVariant2, serializedAdditionalRawData);
```
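A matching read path can be sketched the same way. Rather than looping over a parent's properties, this hypothetical version assumes the container receives the union's own `JsonElement` and dispatches on its `ValueKind`. The names are illustrative stand-ins, not generator output.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical stand-in for the extensible enum generated for the "auto" variant.
public readonly struct AutoOption
{
    private readonly string _value;
    public AutoOption(string value) => _value = value ?? throw new ArgumentNullException(nameof(value));
    public override string ToString() => _value;
}

// Hypothetical stand-in for the generated inline-model variant.
public sealed class UnionObjectThing
{
    public UnionObjectThing(IList<string> data) => Data = data;
    public IList<string> Data { get; }

    // Reads {"type":"union_object_thing","union_object_thing":{"data":[...]}}.
    public static UnionObjectThing Deserialize(JsonElement element)
    {
        var data = new List<string>();
        JsonElement inner = element.GetProperty("union_object_thing");
        foreach (JsonElement item in inner.GetProperty("data").EnumerateArray())
        {
            data.Add(item.GetString()!);
        }
        return new UnionObjectThing(data);
    }
}

// Hypothetical container exposing which variant was read.
public sealed class ExampleUnionContainer
{
    public AutoOption? UnionVariant0 { get; }
    public int? UnionVariant1 { get; }
    public UnionObjectThing? UnionVariant2 { get; }

    private ExampleUnionContainer(AutoOption? v0, int? v1, UnionObjectThing? v2)
    {
        UnionVariant0 = v0; UnionVariant1 = v1; UnionVariant2 = v2;
    }

    // Picks the variant from the JSON value kind of the union element itself.
    public static ExampleUnionContainer Deserialize(JsonElement element) => element.ValueKind switch
    {
        JsonValueKind.String => new ExampleUnionContainer(new AutoOption(element.GetString()!), null, null),
        JsonValueKind.Number => new ExampleUnionContainer(null, element.GetInt32(), null),
        JsonValueKind.Object => new ExampleUnionContainer(null, null, UnionObjectThing.Deserialize(element)),
        _ => throw new JsonException($"Unexpected JSON value kind {element.ValueKind} for example_union_field."),
    };
}
```

Dispatching on value kind works here only because each variant has a distinct JSON kind; a union of two object models would need a discriminator such as `type`. Because an extensible enum accepts any string, a string other than "auto" still lands in variant 0, and `GetInt32` throws `FormatException` for non-integral numbers.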

Impact

If we had something like the above, we could considerably reduce the amount of custom code in the OpenAI library, potentially by up to 30-40% overall. Further, we'd greatly reduce the amount of spec adulteration we currently rely on (e.g. introducing "dummy" types whose visibility we force via the @@access and @@usage TCGC decorators) and be able to use a "pure and correct" spec representation more directly.

With all of the underlying pieces part of code generation, we'd be much better positioned to catch updates: in the example above, if another JSON object variant were introduced, we'd currently very likely miss it without deep scrutiny; with the above proposal, it'd be automatically included in the generated type.

Checklist

  • [X] Follow our Code of Conduct
  • [X] Read the docs.
  • [X] Check that there isn't already an issue that requests the same feature to avoid creating a duplicate.

trrwilson · Aug 16 '24 17:08