airbyte-api-python-sdk

Resolve issue where config from API is cast to incorrect model type (faulty discriminator logic)

Status: Open · opened by aaronsteers 1 year ago · 0 comments

With our library of 300+ sources and 60+ destinations, certain API endpoints should return a "configuration" typed as the correct source or destination class, but the response is not deserialized into that class. Instead, it is deserialized into the first matching class alphabetically (e.g. "Airtable" instead of "Snowflake" or "MySQL").
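
Here is a minimal Python sketch of this failure mode. It assumes the generated models are lenient (fields optional, unknown keys ignored), so the first candidate in the union accepts nearly any payload. AirtableConfig, SnowflakeConfig, accepts, and first_match are hypothetical simplifications, not the SDK's generated code.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Any, Optional


# Hypothetical, heavily simplified stand-ins for generated config models.
# AirtableConfig has no required fields, so it "accepts" almost any payload.
@dataclass
class AirtableConfig:
    credentials: Optional[Any] = None


@dataclass
class SnowflakeConfig:
    host: str
    credentials: Optional[Any] = None


def accepts(model: type, payload: dict) -> bool:
    """True if every field without a default is present in the payload."""
    required = {
        f.name
        for f in fields(model)
        if f.default is MISSING and f.default_factory is MISSING
    }
    return required.issubset(payload)


def first_match(payload: dict, candidates: list) -> Any:
    """Naive oneOf handling: build the first candidate that accepts the payload."""
    for model in candidates:
        if accepts(model, payload):
            known = {f.name for f in fields(model)}
            # Unknown keys (including sourceType) are silently dropped.
            return model(**{k: v for k, v in payload.items() if k in known})
    raise ValueError("no candidate model accepts this payload")


payload = {
    "sourceType": "snowflake",
    "host": "example.snowflakecomputing.com",
    "credentials": {},
}
result = first_match(payload, [AirtableConfig, SnowflakeConfig])
print(type(result).__name__)  # AirtableConfig: sourceType was never consulted
```

Because first_match never looks at sourceType, only the order of the candidates decides the result.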

I've worked around this with a hack that retrieves the original raw dict object, but this has been a stumbling block for specific use cases.

The workaround logic is here:

https://github.com/airbytehq/PyAirbyte/blob/f7b88eba400d7aa768c8c370cfbac6f18dfc61c6/airbyte/_util/api_util.py#L577-L597
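
The linked PyAirbyte code is the reference for the actual workaround. The sketch below illustrates the general idea only and may not match what that code does: it picks a model for a raw dict by reading the dict's sourceType field directly. SourceAha, SourceAirtable, SOURCE_MODELS, and parse_source_config are hypothetical names, not SDK or PyAirbyte APIs.

```python
from dataclasses import dataclass
from typing import Any


# Hypothetical stand-ins for two generated source models.
@dataclass
class SourceAha:
    api_key: str
    url: str


@dataclass
class SourceAirtable:
    credentials: dict


# Hypothetical registry keyed by the sourceType value found in the raw dict.
SOURCE_MODELS: dict = {"aha": SourceAha, "airtable": SourceAirtable}


def parse_source_config(raw: dict) -> Any:
    """Choose the model from raw['sourceType'] instead of trusting the union."""
    source_type = raw.get("sourceType")
    model = SOURCE_MODELS.get(source_type)
    if model is None:
        raise ValueError(f"unknown or missing sourceType: {source_type!r}")
    # sourceType is not a field on these stand-in models, so strip it first.
    body = {k: v for k, v in raw.items() if k != "sourceType"}
    return model(**body)


config = parse_source_config(
    {"sourceType": "aha", "api_key": "secret", "url": "https://example.aha.io"}
)
print(config)
```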


Speakeasy documents how to set up discriminator logic here:

  • https://www.speakeasy.com/openapi/schemas/objects/polymorphism#discriminator-object-in-openapi

Example code from the docs:

    components:
      responses:
        OrderResponse:
          oneOf:
            - $ref: "#/components/schemas/DrinkOrder"
            - $ref: "#/components/schemas/IngredientOrder"
          discriminator:
            propertyName: orderType
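
A discriminator tells the deserializer which oneOf member to use by reading one property, instead of trying the members in order. The Python sketch below is a simplified model of the lookup described in the OpenAPI 3 specification. An explicit mapping is consulted first. Otherwise the value is compared with each referenced schema's name. select_schema and the "drink" value are hypothetical, and Speakeasy's generated code may resolve the lookup differently.

```python
from typing import Optional


def select_schema(
    payload: dict,
    property_name: str,
    one_of: list,
    mapping: Optional[dict] = None,
) -> str:
    """Simplified discriminator lookup: explicit mapping first, then implicit
    matching of the value against each $ref's schema name.

    This sketch assumes mapping values are full $ref strings."""
    value = payload[property_name]
    if mapping and value in mapping:
        return mapping[value]
    for ref in one_of:
        if ref.rsplit("/", 1)[-1] == value:
            return ref
    raise ValueError(f"{property_name}={value!r} matches no oneOf member")


ORDER_RESPONSE = [
    "#/components/schemas/DrinkOrder",
    "#/components/schemas/IngredientOrder",
]

# Implicit mapping: the payload value must equal the schema name.
print(select_schema({"orderType": "DrinkOrder"}, "orderType", ORDER_RESPONSE))

# Explicit mapping: a short (hypothetical) value like "drink" needs an entry.
drink_mapping = {"drink": "#/components/schemas/DrinkOrder"}
print(select_schema({"orderType": "drink"}, "orderType", ORDER_RESPONSE, drink_mapping))
```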

In our case, the discriminating property does exist in the data, as sourceType and destinationType, but it is not declared with the above syntax.

Here is an example declaration showing that sourceType is ready to leverage once we reference it in the discriminator declaration:

    source-aha:
      type: "object"
      required:
      - "api_key"
      - "url"
      - "sourceType"
      properties:
        api_key:
          type: "string"
          title: "API Bearer Token"
          airbyte_secret: true
          description: "API Key"
          order: 0
          x-speakeasy-param-sensitive: true
        url:
          type: "string"
          description: "URL"
          title: "Aha Url Instance"
          order: 1
        sourceType:
          title: "aha"
          const: "aha"
          enum:
          - "aha"
          order: 0
          type: "string"
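
One way to confirm that sourceType can serve as a discriminator is to run a hypothetical Python helper over an abbreviated copy of the declaration above. The helper treats a property as usable when it is required and pinned to a single value through const or a one-element enum. Those criteria belong to this sketch and are not a rule quoted from OpenAPI or Speakeasy.

```python
from typing import Any, Optional

# Abbreviated copy of the source-aha schema above (descriptive keys omitted).
SOURCE_AHA = {
    "type": "object",
    "required": ["api_key", "url", "sourceType"],
    "properties": {
        "api_key": {"type": "string"},
        "url": {"type": "string"},
        "sourceType": {"const": "aha", "enum": ["aha"], "type": "string"},
    },
}


def discriminator_value(schema: dict, prop: str) -> Optional[Any]:
    """Return the single fixed value of `prop`, or None if it cannot discriminate.

    Assumed criteria: the property is required and pinned to one value via
    `const` or a one-element `enum`.
    """
    if prop not in schema.get("required", []):
        return None
    prop_schema = schema.get("properties", {}).get(prop, {})
    if "const" in prop_schema:
        return prop_schema["const"]
    enum = prop_schema.get("enum", [])
    return enum[0] if len(enum) == 1 else None


print(discriminator_value(SOURCE_AHA, "sourceType"))  # aha
print(discriminator_value(SOURCE_AHA, "url"))  # None
```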

Currently, it does not appear that we define any discriminator logic for DestinationConfiguration or SourceConfiguration.

Below is the DestinationConfiguration schema. Note that it has oneOf logic but no discriminator. The same is true for SourceConfiguration, which I'm not showing because it is much larger.


https://raw.githubusercontent.com/airbytehq/airbyte-platform/refs/heads/main/airbyte-api/server-api/src/main/openapi/api_sdk.yaml

    DestinationConfiguration:
      description: The values required to configure the destination.
      example: { user: "charles" }
      oneOf:
        - title: destination-google-sheets
          $ref: "#/components/schemas/destination-google-sheets"
        - title: destination-astra
          $ref: "#/components/schemas/destination-astra"
        - title: destination-aws-datalake
          $ref: "#/components/schemas/destination-aws-datalake"
        - title: destination-azure-blob-storage
          $ref: "#/components/schemas/destination-azure-blob-storage"
        - title: destination-bigquery
          $ref: "#/components/schemas/destination-bigquery"
        - title: destination-clickhouse
          $ref: "#/components/schemas/destination-clickhouse"
        - title: destination-convex
          $ref: "#/components/schemas/destination-convex"
        - title: destination-databricks
          $ref: "#/components/schemas/destination-databricks"
        - title: destination-dev-null
          $ref: "#/components/schemas/destination-dev-null"
        - title: destination-duckdb
          $ref: "#/components/schemas/destination-duckdb"
        - title: destination-dynamodb
          $ref: "#/components/schemas/destination-dynamodb"
        - title: destination-elasticsearch
          $ref: "#/components/schemas/destination-elasticsearch"
        - title: destination-firebolt
          $ref: "#/components/schemas/destination-firebolt"
        - title: destination-firestore
          $ref: "#/components/schemas/destination-firestore"
        - title: destination-gcs
          $ref: "#/components/schemas/destination-gcs"
        - title: destination-iceberg
          $ref: "#/components/schemas/destination-iceberg"
        - title: destination-milvus
          $ref: "#/components/schemas/destination-milvus"
        - title: destination-mongodb
          $ref: "#/components/schemas/destination-mongodb"
        - title: destination-motherduck
          $ref: "#/components/schemas/destination-motherduck"
        - title: destination-mssql
          $ref: "#/components/schemas/destination-mssql"
        - title: destination-mysql
          $ref: "#/components/schemas/destination-mysql"
        - title: destination-oracle
          $ref: "#/components/schemas/destination-oracle"
        - title: destination-pgvector
          $ref: "#/components/schemas/destination-pgvector"
        - title: destination-pinecone
          $ref: "#/components/schemas/destination-pinecone"
        - title: destination-postgres
          $ref: "#/components/schemas/destination-postgres"
        - title: destination-pubsub
          $ref: "#/components/schemas/destination-pubsub"
        - title: destination-qdrant
          $ref: "#/components/schemas/destination-qdrant"
        - title: destination-redis
          $ref: "#/components/schemas/destination-redis"
        - title: destination-redshift
          $ref: "#/components/schemas/destination-redshift"
        - title: destination-s3
          $ref: "#/components/schemas/destination-s3"
        - title: destination-s3-glue
          $ref: "#/components/schemas/destination-s3-glue"
        - title: destination-sftp-json
          $ref: "#/components/schemas/destination-sftp-json"
        - title: destination-snowflake
          $ref: "#/components/schemas/destination-snowflake"
        - title: destination-snowflake-cortex
          $ref: "#/components/schemas/destination-snowflake-cortex"
        - title: destination-teradata
          $ref: "#/components/schemas/destination-teradata"
        - title: destination-timeplus
          $ref: "#/components/schemas/destination-timeplus"
        - title: destination-typesense
          $ref: "#/components/schemas/destination-typesense"
        - title: destination-vectara
          $ref: "#/components/schemas/destination-vectara"
        - title: destination-weaviate
          $ref: "#/components/schemas/destination-weaviate"
        - title: destination-yellowbrick
          $ref: "#/components/schemas/destination-yellowbrick"
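
Unions like this can be found automatically across a large spec. The Python sketch below flags component schemas that use oneOf without a discriminator. find_undiscriminated_unions is a hypothetical helper, and the input is a small hand-written excerpt whose members are illustrative. With the real api_sdk.yaml, one would presumably parse the file first (for example with PyYAML's yaml.safe_load) and pass in spec["components"]["schemas"].

```python
def find_undiscriminated_unions(schemas: dict) -> list:
    """Names of component schemas that declare oneOf but no discriminator."""
    return sorted(
        name
        for name, schema in schemas.items()
        if "oneOf" in schema and "discriminator" not in schema
    )


# Hand-written excerpt in the shape of the spec's components.schemas section.
schemas = {
    "DestinationConfiguration": {
        "oneOf": [
            {"title": "destination-google-sheets",
             "$ref": "#/components/schemas/destination-google-sheets"},
            {"title": "destination-snowflake",
             "$ref": "#/components/schemas/destination-snowflake"},
        ]
    },
    "SourceConfiguration": {
        "oneOf": [{"title": "source-aha", "$ref": "#/components/schemas/source-aha"}]
    },
    "source-aha": {"type": "object"},
}

print(find_undiscriminated_unions(schemas))
```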

Proposed fix

To resolve this, we should add a discriminator to the DestinationConfiguration declaration in the OpenAPI spec:

    DestinationConfiguration:
      # ...
      discriminator:
        propertyName: destinationType
      oneOf:
        - title: destination-google-sheets
          $ref: "#/components/schemas/destination-google-sheets"
      # ...

and similarly for sources:

    SourceConfiguration:
      # ...
      discriminator:
        propertyName: sourceType
      oneOf:
        - title: ...
          $ref: ...
      # ...
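
One detail is worth checking before applying this. Under the OpenAPI 3 rules, a discriminator without a mapping has its value matched against schema names (source-aha), while the payload carries the shorter const value (aha). Whether Speakeasy needs an explicit mapping in this case is an assumption to verify. If it does, the mapping could be generated rather than written by hand, as in this Python sketch. build_mapping and the source-airtable entry, including its const value, are hypothetical.

```python
def build_mapping(one_of: list, schemas: dict, prop: str) -> dict:
    """Map each member's const discriminator value to its $ref."""
    mapping = {}
    for member in one_of:
        ref = member["$ref"]
        name = ref.rsplit("/", 1)[-1]
        value = schemas[name]["properties"][prop]["const"]
        if value in mapping:
            raise ValueError(
                f"duplicate {prop} value {value!r} in {ref} and {mapping[value]}"
            )
        mapping[value] = ref
    return mapping


schemas = {
    "source-aha": {"properties": {"sourceType": {"const": "aha"}}},
    # Hypothetical second member; its const value is assumed, not taken from the spec.
    "source-airtable": {"properties": {"sourceType": {"const": "airtable"}}},
}
one_of = [
    {"title": "source-aha", "$ref": "#/components/schemas/source-aha"},
    {"title": "source-airtable", "$ref": "#/components/schemas/source-airtable"},
]

print(build_mapping(one_of, schemas, "sourceType"))
```

The resulting dict would sit under discriminator.mapping next to propertyName. The duplicate check guards against two connectors declaring the same value.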

aaronsteers, Jul 30 '24 23:07