A note on polymorphism and discriminators.

Open tomas789 opened this issue 4 months ago • 2 comments

This is not really an issue. I'm writing it down for those who come after us. I've spent an unreasonable amount of time on this, but I believe it was worth it. Rust is great for backends, Axum is likely the best option for the HTTP stack, and aide is the only tool for generating OpenAPI automatically that gave me usable results. This was my only long-standing problem, as I rely heavily on code generation from the OpenAPI spec. I'd suggest that the final solution at least be mentioned in the official documentation, as others will likely run into the same issue.

I have a simulated polymorphism where the structs TableItem, ChartItem, ... can be stored inside a container called AnyItem. My implementation is similar to this:

#[derive(Deserialize, Serialize, JsonSchema)]
pub enum AnyItem {
    Table(TableItem),
    Chart(ChartItem)
}

#[derive(Deserialize, Serialize, JsonSchema)]
pub struct Item {
    pub title: Option<String>,
}

#[derive(Deserialize, Serialize, JsonSchema)]
pub struct TableItem {
    #[serde(flatten)]
    pub item: Item,
    pub colnames: todo!(),
    pub rows: todo!(),
}

#[derive(Deserialize, Serialize, JsonSchema)]
pub struct ChartItem {
    #[serde(flatten)]
    pub item: Item,
    pub image_path: todo!(),
}

The OpenAPI JSON file is consumed by datamodel-code-generator to generate Python Pydantic classes, and by openapi-generator to generate TypeScript models using the typescript-fetch generator. Both had a problem with AnyItem.

I'm doing serialization in Python and deserialization in TypeScript, so that is the only combination I'll cover here.

Existing behavior

Case 1: Basic enum

Rust definition

#[derive(Deserialize, Serialize, Debug, JsonSchema)]
#[serde(rename_all = "lowercase")]
pub enum AnyItem {
    Table(TableItem),
    Chart(ChartItem),
}

OpenAPI

      "AnyItem": {
        "oneOf": [
          {
            "type": "object",
            "required": [
              "table"
            ],
            "properties": {
              "table": {
                "$ref": "#/components/schemas/TableItem"
              }
            },
            "additionalProperties": false
          },
          {
            "type": "object",
            "required": [
              "chart"
            ],
            "properties": {
              "chart": {
                "$ref": "#/components/schemas/ChartItem"
              }
            },
            "additionalProperties": false
          },

The problem is that both generators emit a separate numbered type (AnyItem1, AnyItem2) for each branch of the oneOf. This is a major PITA when constructing the data structures, as one has to guess which number corresponds to which type. Deserialization is fine.

Case 2: Tagged enum

Rust definition

#[derive(Deserialize, Serialize, Debug, JsonSchema)]
#[serde(tag="item_type", rename_all = "lowercase")]
pub enum AnyItem {
    Table(TableItem),
    Chart(ChartItem),
}

OpenAPI

      "AnyItem": {
        "oneOf": [
          {
            "type": "object",
            "required": [
              "colnames",
              "item_type",
              "rows"
            ],
            "properties": {
              "card": {
                "type": "boolean",
                "nullable": true
              },
              "colnames": {
                "type": "array",
                "items": {
                  "type": "string"
                }
              },
              "item_type": {
                "type": "string",
                "enum": [
                  "table"
                ]
              },
              "rows": {
                "type": "array",
                "items": {
                  "type": "array",
                  "items": {
                    "$ref": "#/components/schemas/TableCell"
                  }
                }
              },
              "title": {
                "type": "string",
                "nullable": true
              }
            }
          },

The problem here is that the actual structures ChartItem, ... don't have the item_type field, so the schema cannot use references to them. It correctly inlines temporary objects instead. This is bad for both serialization and deserialization, as the types generated from those inlined objects are completely separate from the real ones.

Case 3: Tag and content

Rust code

#[derive(Deserialize, Serialize, Debug, JsonSchema)]
#[serde(tag="item_type", content="c", rename_all = "lowercase")]
pub enum AnyItem {
    Table(TableItem),
    Chart(ChartItem),
}

OpenAPI

      "AnyItem": {
        "oneOf": [
          {
            "type": "object",
            "required": [
              "c",
              "item_type"
            ],
            "properties": {
              "c": {
                "$ref": "#/components/schemas/TableItem"
              },
              "item_type": {
                "type": "string",
                "enum": [
                  "table"
                ]
              }
            }
          },
          {
            "type": "object",
            "required": [
              "c",
              "item_type"
            ],
            "properties": {
              "c": {
                "$ref": "#/components/schemas/ChartItem"
              },
              "item_type": {
                "type": "string",
                "enum": [
                  "chart"
                ]
              }
            }
          },

This fixes the problem of inlined objects from case 2, but it again creates temporary wrapper objects, as in case 1.
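For reference, the first three cases differ only in where Serde puts the variant name on the wire. Here is a minimal Python sketch of the three shapes. The helper names and the chart payload are made up for illustration (image_path is assumed to be a string); none of this is part of Serde or aide.

```python
import json


def externally_tagged(tag, payload):
    # Case 1: the variant name is the single key wrapping the payload.
    return {tag: payload}


def internally_tagged(tag, payload, tag_field="item_type"):
    # Case 2: the tag is merged into the payload's own fields.
    return {tag_field: tag, **payload}


def adjacently_tagged(tag, payload, tag_field="item_type", content_field="c"):
    # Case 3: tag and payload sit side by side under two fixed keys.
    return {tag_field: tag, content_field: payload}


# Hypothetical chart payload; the values are invented for this example.
chart = {"title": "Revenue", "image_path": "revenue.png"}

for encode in (externally_tagged, internally_tagged, adjacently_tagged):
    print(encode.__name__, json.dumps(encode("chart", chart)))
```

Only the internally tagged shape keeps the item's fields at the top level, which is why it is the one that can line up with a `$ref` to the real struct once the tag field is accounted for.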

Case 4: Untagged enum

Rust code

#[derive(Deserialize, Serialize, Debug, JsonSchema)]
#[serde(untagged, rename_all = "lowercase")]
pub enum AnyItem {
    Table(TableItem),
    Chart(ChartItem),
}

OpenAPI

      "AnyItem": {
        "anyOf": [
          {
            "$ref": "#/components/schemas/TableItem"
          },
          {
            "$ref": "#/components/schemas/ChartItem"
          }
        ]
      },

This one is so close. The only problem is that there is no field marking which object we are actually holding, so deserialization can get a bit tricky. In my case the TypeScript codegen created a superset object for AnyItem containing all fields from all items, which made it impossible to distinguish the objects just by looking at their fields.
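To illustrate why the untagged form is fragile, here is a small Python sketch that guesses the variant from the fields present. The required-field sets are my reading of the simplified structs above and the function name is hypothetical.

```python
# Required fields per variant, read off the simplified structs above.
REQUIRED = {
    "table": {"colnames", "rows"},
    "chart": {"image_path"},
}


def guess_variant(obj):
    """Return every variant whose required fields are all present in obj."""
    return [name for name, fields in REQUIRED.items() if fields.issubset(obj)]


# A plain chart payload is unambiguous.
print(guess_variant({"title": "t", "image_path": "a.png"}))

# A superset object that carries every field matches both variants.
print(guess_variant({"colnames": [], "rows": [], "image_path": None}))
```

Guessing works only while every variant has a distinct set of required fields. As soon as the generated type is a superset with all fields present or optional, the guess returns several candidates or none.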

Final solution

The solution I arrived at was to create a oneOf with a discriminator and mapping, as described in this blog post. I tried that manually and both generators produced usable code without any temporary objects. We also need to use a tagged enum: Serde adds a field with the type information to the serialized data. I hide this field from the schema of AnyItem and describe it with a discriminator instead. Here is my code.

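use std::collections::BTreeMap;

use schemars::schema::{Schema, SchemaObject, SubschemaValidation};
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use serde_json::json;

#[derive(Deserialize, Serialize, Debug)]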
#[serde(tag = "item_type", rename_all = "lowercase")]
pub enum AnyItem {
    Table(TableItem),
    Chart(ChartItem),
}

impl JsonSchema for AnyItem {
    fn schema_name() -> String {
        "AnyItem".to_string()
    }

    fn json_schema(gen: &mut schemars::gen::SchemaGenerator) -> Schema {
        let discriminator = json!({
            "propertyName": "item_type",
            "mapping": {
                "table": "#/components/schemas/TableItem",
                "chart": "#/components/schemas/ChartItem",
            }
        });

        let subschemas = SubschemaValidation {
            one_of: Some(vec![
                gen.subschema_for::<TableItem>(),
                gen.subschema_for::<ChartItem>(),
            ]),
            ..Default::default()
        };
        
        let schema_object = SchemaObject {
            subschemas: Some(Box::new(subschemas)),
            extensions: BTreeMap::from_iter(vec![("discriminator".to_owned(), discriminator)]),
            ..Default::default()
        };
        Schema::Object(schema_object)
    }
}

This generates the following OpenAPI specification

      "AnyItem": {
        "oneOf": [
          {
            "$ref": "#/components/schemas/TableItem"
          },
          {
            "$ref": "#/components/schemas/ChartItem"
          }
        ],
        "discriminator": {
          "mapping": {
            "chart": "#/components/schemas/ChartItem",
            "table": "#/components/schemas/TableItem"
          },
          "propertyName": "item_type"
        }
      },
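As a rough illustration of what a consumer does with this discriminator, here is a Python sketch that resolves a payload to a schema reference. The function name is hypothetical and the mapping is the one from the Rust implementation above. The fallback for unmapped values follows my understanding of the OpenAPI 3.0 implicit mapping rule, so treat it as an assumption.

```python
def resolve_ref(discriminator, payload):
    """Pick the schema reference for a payload from an OpenAPI discriminator."""
    tag = payload[discriminator["propertyName"]]
    mapping = discriminator.get("mapping", {})
    # Assumption: without an explicit mapping entry, the tag value is
    # treated as the name of a schema under components/schemas.
    return mapping.get(tag, f"#/components/schemas/{tag}")


discriminator = {
    "propertyName": "item_type",
    "mapping": {
        "table": "#/components/schemas/TableItem",
        "chart": "#/components/schemas/ChartItem",
    },
}

print(resolve_ref(discriminator, {"item_type": "chart", "image_path": "a.png"}))
```

The explicit mapping is needed here because the tag values ("table", "chart") are lowercase and do not match the schema names (TableItem, ChartItem).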

In TypeScript, the type of AnyItem is

export type AnyItem = { itemType: 'chart' } & ChartItem | { itemType: 'table' } & TableItem;

The good thing is that the itemType field is added on top, and ChartItem and TableItem are exactly my objects. The generated code also uses itemType in a switch to decide how to deserialize each object.

In Python, the type of AnyItem is

class AnyItem(RootModel[Union[TableItem, ChartItem]]):
    root: Annotated[Union[TableItem, ChartItem], Field(discriminator='item_type')]

Which is also good. The only slight inconvenience is that it also adds the item_type field to TableItem and ChartItem, so I have to provide it when creating the objects.
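One way to avoid typing the tag by hand is a small helper that derives it from the class. The sketch below uses standard-library dataclasses as stand-ins for the generated Pydantic models. The field types, the ITEM_TYPES registry, and to_wire are all hypothetical names for illustration, not something the generators produce.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ChartItem:
    image_path: str  # assumed to be a string; the original leaves the type open
    title: Optional[str] = None


@dataclass
class TableItem:
    colnames: list
    rows: list
    title: Optional[str] = None


# Hypothetical registry: class -> value of the item_type discriminator.
ITEM_TYPES = {TableItem: "table", ChartItem: "chart"}


def to_wire(item):
    """Turn an item into a dict and add the item_type tag Serde expects."""
    return {"item_type": ITEM_TYPES[type(item)], **asdict(item)}


print(to_wire(ChartItem(image_path="a.png")))
```

With the real Pydantic models the same idea would likely be a default value on the item_type field, but I have not checked what the generator emits for that.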

tomas789, Feb 15 '24 07:02