Add support for union shapes in the code generator
Newer AWS services like bedrock-runtime are using union shapes in their API. Contrary to older services (like S3 or MediaConvert) that have a type property containing an enum, and then additional properties used depending on the type, union shapes expect exactly 1 of their members to be filled.
In the SDK manifest, those union shapes are defined with union: true: https://github.com/aws/aws-sdk-php/blob/33f6d393834053b11a5fc10573a211d92e61c2db/src/data/bedrock-runtime/2023-09-30/api-2.json#L1013-L1021
Looking at official AWS SDKs for other languages (the official PHP SDK does not return typed responses and so cannot be used as comparison), here is how they represent it:
- JS defines a subtype for each member (naming it like
<shapeName>.<memberName>Member) with that member defined with its type and all other members defined as optional properties with typenever(which forbids having such properties in the object in typescript) and then uses a union type: https://github.com/aws/aws-sdk-js-v3/blob/b09a8f969ad6b2508e4d3de93089fe4498f57408/clients/client-bedrock-runtime/src/models/models_0.ts#L2351-L2382. They also expose aVisitorinterface with methods for each type, with avisitfunction calling the right method of the visitor: https://github.com/aws/aws-sdk-js-v3/blob/b09a8f969ad6b2508e4d3de93089fe4498f57408/clients/client-bedrock-runtime/src/models/models_0.ts#L2544-L2569 (note that this Visitor interface seems to imply that adding a new member in the union would be a BC break) - Go defines an interface for the union shape, with implementations for each member (named
<shapeName>Member<memberName>) that have aValuefield containing the value of the member: https://github.com/aws/aws-sdk-go-v2/blob/43445011355b4bb4226fa797c5fc141bbc1b793f/service/bedrockruntime/types/types.go#L127-L137 The serializer and deserializer take care of using the right structure for each kind of members. - the .NET SDK defines a class with a property for each member, which does not seem to enforce in the type system that only one member is set (maybe their generator does not have support for union shapes and generated it as a normal structure shape): https://github.com/aws/aws-sdk-net/blob/09201e894db71b5d536db6cb0975f361b85f1f52/sdk/src/Services/BedrockRuntime/Generated/Model/ContentBlock.cs
- the C++ SDK does something similar to the .NET SDK: https://github.com/aws/aws-sdk-cpp/blob/main/generated/src/aws-cpp-sdk-bedrock-runtime/source/model/ContentBlock.cpp
- the Ruby SDK does something weird. They have a ContentBlock class documenting that a single attribute should be set. And they also have subclasses that are used when deserializing (but that don't seem to limit the API in the type system): https://github.com/aws/aws-sdk-ruby/blob/934a6ad172c65754fa0975b48464c5499f368bd3/gems/aws-sdk-bedrockruntime/lib/aws-sdk-bedrockruntime/types.rb#L296-L298
- the Java SDK defines a class with getters for each members and a
type()getter returning an enum identifying which member is relevant (inferred from which member is populated). That class has static factory methods for each member. See https://github.com/aws/aws-sdk-java-v2/blob/master/docs/design/core/tagged-unions/README.md#union-type-definition for an example in their design documentation (the generated code is not committed for the Java SDK) - The Kotlin SDK defines a sealed class for the union shape, with child classes for each member with a
valuefield of the appropriate type. The base class hasas*andas*OrNullmethods for each member that return the associatedvalueif it is of the appropriate type and either throw or returnnullwhen called on another type. See https://sdk.amazonaws.com/kotlin/api/latest/bedrockruntime/aws.sdk.kotlin.services.bedrockruntime.model/-content-block/index.html for an example
@Nyholm @jderusse do you have any preference about the solution for that ?
It is needed to unblock #1861
Hm. Should we do something similar to go?
My investigation of the .NET SDK confirms that their code generator does not have any special code for union shapes, and so the code is generated like for a normal structure shape (which is why there is no enforcement). I haven't checked the C++ code generator, but I suspect it is the same case as the output is similar.
Let's take the RDSData Field for example.
"Field":{
"type":"structure",
"members":{
"isNull":{"shape":"BoxedBoolean"},
"booleanValue":{"shape":"BoxedBoolean"},
"longValue":{"shape":"BoxedLong"},
"doubleValue":{"shape":"BoxedDouble"},
"stringValue":{"shape":"String"},
"blobValue":{"shape":"Blob"},
"arrayValue":{"shape":"ArrayValue"}
},
"union":true
},
The current implementation generated:
/**
* Contains a value.
*/
final class Field
{
/**
* A NULL value.
*
* @var bool|null
*/
private $isNull;
/**
* A value of Boolean data type.
*
* @var bool|null
*/
private $booleanValue;
/**
* A value of long data type.
*
* @var int|null
*/
private $longValue;
...
From a DX point of view users need to check content of each property to get the type.
I agree that having if $field instanceOf FieldIsNull would have been cleaner.
I've few question:
- shared interface/class or union type ?
shared:
abstract class `Field` {}
class FieldMemberIsNull extends Field {}
private Field $field;
union type
class FieldMemberIsNull {}
private FieldMemberIsNull|FieldMemberBooleanValue $field;
I would go for an abstract class that simplifies the signature and does not require changing signatures everytime a new value is added.
- unique property name (like Golang) or keep the orignal property
unique
class FieldMemberBooleanValue
{
private $value;
}
preserve
class FieldMemberBooleanValue
{
private $booleanValue;
}
The second one sounds dumb (the class name is repeated in the property). But it is ~compatible with existing code. ie echo $response->field->stringValue
- nullable or not?
Current properties are all nullable (because of "one of". all other property are not defined and therefor null), with union, this would not be the case anymore, so we don't have to let them null.
BUT having null value might help users to write simple code.
ie, in RDS data, I know (for whatever reason) that field is either string or null.
So my code could have been echo $response->field->stringValue ?? 'default';. Simple and readable.
Of COURSE, I could/should have used something more robust like echo $response->field->isNull ? 'default' : $response->field->stringValue; but previous code works too.
With our new implementation, userland code have to be echo $response->field instance of FieldMemberIsNull ? 'default' : $response->field->stringValue;.
So from a DX PoF, I wonder if we should not expose all properties with null value as fallback
abstract class Field {
bool|null $isNull;
bool|null $booleanValue;
int|null $longValue;
}
class FieldMemberIsNull extends Field {
bool $isNull;
}
class FieldMemberBooleanValue extends Field {
bool $booleanValue;
}