
Re-applying terraform config results in invalid connection configuration


I will start by saying that I am very new to Terraform as well as Airbyte. I am working on building a proof-of-concept connection from MySQL to Snowflake.

  • Airbyte 0.50.29 running on Docker Desktop on macOS
  • MySQL running locally on the Mac
  • Snowflake running in GCP

  • Airbyte source connector: MySQL v3.0.4
  • Airbyte destination connector: Snowflake v3.1.2

When I do the initial terraform init/plan/apply, everything works correctly and the connection is set up properly. The example I want to demo is adding a new stream to the Terraform configuration and having the connection's streams update accordingly. I have one incremental stream and want to add a full refresh stream. However, when I update the .tf file and re-apply, I get a 400 error from the API.

Here are the steps I took:

  • create .tf file with provider, source, destination, and connection
  • terraform init
  • terraform plan --out plan.cache
  • terraform apply plan.cache
  • update the connection resource in the .tf file to include another stream
  • terraform plan --out plan.cache
  • terraform apply plan.cache

After applying I get the 400 error:

│ Error: unexpected response from API. Got an unexpected response code 400
│ 
│   with airbyte_connection.mysql_snowflake_connection,
│   on main.tf line 76, in resource "airbyte_connection" "mysql_snowflake_connection":
│   76: resource "airbyte_connection" "mysql_snowflake_connection" {
│ 
│ **Request**:
│ PATCH /v1/connections/5210b869-537e-468a-9106-fff782860090 HTTP/1.1
│ Host: localhost:8006
│ Accept: application/json
│ Authorization: Basic YWlyYnl0ZTpwYXNzd29yZA==
│ Content-Type: application/json
│ User-Agent: speakeasy-sdk/terraform 0.3.3 2.86.10 1.0.0
│ 
│ {"configurations":{"streams":[{"cursorField":["_ab_cdc_cursor"],"name":"members","primaryKey":[["id"]],"syncMode":"incremental_deduped_history"},{"name":"movies"}]},"dataResidency":"auto","name":"local mysql \u003c\u003e snowflake
│ dev","namespaceDefinition":"source","nonBreakingSchemaUpdatesBehavior":"propagate_fully","schedule":{"scheduleType":"manual"},"status":"active"}
│ 
│ **Response**:
│ HTTP/1.1 400 Bad Request
│ Content-Length: 296
│ Connection: keep-alive
│ Content-Type: application/problem+json
│ Date: Tue, 12 Sep 2023 19:01:22 GMT
│ Server: nginx/1.25.2
│ 
│ {"type":"https://reference.airbyte.com/reference/errors","title":"bad-request","status":400,"detail":"The body of the request contains an invalid connection configuration. Primary key for stream: members is already pre-defined. Please do NOT
│ include a primary key configuration for this stream."}

Original main.tf config:

# provider.tf
terraform {
  required_providers {
    airbyte = {
      source = "airbytehq/airbyte"
      version = "0.3.3"
    }
  }
}

# provider.tf
provider "airbyte" {
  username = "airbyte"
  password = "password"
  server_url = "http://localhost:8006/v1"
}

resource "airbyte_source_mysql" "aaron_test_database" {
  configuration = {
    database           = "aaron_test"
    host               = "host.docker.internal"
    username           = "root"
    password           = ""
    port               = 3306
    replication_method = {
      source_mysql_update_method_read_changes_using_binary_log_cdc = {
        method                  = "CDC"
        server_time_zone        = "UTC"
      }
    }
    source_type        = "mysql"
    ssl_mode = {
      source_mysql_ssl_modes_preferred = {
        mode = "preferred"
      }
    }
    tunnel_method = {
      source_mysql_ssh_tunnel_method_no_tunnel = {
        tunnel_method = "NO_TUNNEL"
      }
    }
  }
  name         = "Aaron Test database"
  workspace_id = "fd460825-10a6-414b-a91e-3a3a2f444179"
}

resource "airbyte_destination_snowflake" "aaron_destination_snowflake" {
  configuration = {
    loading_method = {
        destination_snowflake_data_staging_method_recommended_internal_staging = {
            method = "Internal Staging"
        }
    }
    credentials = {
      destination_snowflake_authorization_method_username_and_password = {
        auth_type = "Username and Password"
        password            = "..."
      }
    }
    database         = "AIRBYTE_DATABASE"
    destination_type = "snowflake"
    host             = "..." 
    # jdbc_url_params  = "...my_jdbc_url_params..."
    # raw_data_schema  = "...my_raw_data_schema..."
    role             = "AIRBYTE_ROLE"
    schema           = "AIRBYTE_SCHEMA"
    use_1s1t_format  = true
    username         = "AIRBYTE_USER"
    warehouse        = "AIRBYTE_WAREHOUSE"
    
  }
  name         = "Aaron Snowflake Test"
  workspace_id = "fd460825-10a6-414b-a91e-3a3a2f444179"
}

resource "airbyte_connection" "mysql_snowflake_connection" {
  name                 = "local mysql <> snowflake dev"
  source_id            = airbyte_source_mysql.aaron_test_database.source_id
  destination_id       = airbyte_destination_snowflake.aaron_destination_snowflake.destination_id
  namespace_definition = "source"
  status               = "active"
  non_breaking_schema_updates_behavior = "propagate_fully"
  configurations       = {
    streams = [
      {
        name = "members"
        sync_mode = "incremental_deduped_history"
      }
    ]
  }
}

And the updated airbyte_connection resource including the second stream. Everything else in main.tf is identical to the original config above, so only the connection resource is shown:

resource "airbyte_connection" "mysql_snowflake_connection" {
  name                 = "local mysql <> snowflake dev"
  source_id            = airbyte_source_mysql.aaron_test_database.source_id
  destination_id       = airbyte_destination_snowflake.aaron_destination_snowflake.destination_id
  namespace_definition = "source"
  status               = "active"
  non_breaking_schema_updates_behavior = "propagate_fully"
  configurations       = {
    streams = [
      {
        name = "members"
        sync_mode = "incremental_deduped_history"
      },
      {
        name = "movies"
      }
    ]
  }
}

If I manually remove all the source/destination/connection entries from Airbyte and completely redo the terraform init/plan/apply steps with both streams from the start, it works fine. The problem arises whenever I try to update main.tf and plan/apply again.
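
A possible stopgap, given that creating from scratch works while the in-place update fails: force Terraform to replace the connection rather than update it (untested, and note it destroys and recreates the connection):

terraform plan -replace="airbyte_connection.mysql_snowflake_connection" --out plan.cache
terraform apply plan.cache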

It is very possible this is user error, but any help is greatly appreciated! Whenever it works, the connections are awesome! I'm just trying to get a proof-of-concept built that shows everything can be managed without the UI. Thank you.

exactlyaaron avatar Sep 12 '23 19:09 exactlyaaron

The error seems to stem from the terraform.tfstate file that is written by the apply command. It results in this configuration:

"configurations": {
              "streams": [
                {
                  "cursor_field": [
                    "_ab_cdc_cursor"
                  ],
                  "name": "members",
                  "primary_key": [
                    [
                      "id"
                    ]
                  ],
                  "sync_mode": "incremental_deduped_history"
                }
              ]
            },

The presence of that primary key configuration appears to be the problem when a second apply is run after changes.
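
For reference, the .tf never declares those fields; the stream entry only sets name and sync_mode, so the cursor_field and primary_key values in state presumably come from the provider reading the connection back from the API after the create:

streams = [
  {
    name      = "members"
    sync_mode = "incremental_deduped_history"
    # no primary_key or cursor_field declared here; both appear only in
    # terraform.tfstate, apparently read back from the API
  }
]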

exactlyaaron avatar Sep 12 '23 22:09 exactlyaaron

Another follow-up detail: I attempted to test with two full refresh streams.

  • terraform init
  • terraform plan
  • terraform apply with a single full refresh stream
  • update main.tf connection resource to include a second full refresh stream
  • terraform plan
  • terraform apply

Upon applying I get another error from the provider:

airbyte_connection.mysql_snowflake_connection: Modifying... [name=local mysql <> snowflake dev]
╷
│ Error: Provider produced inconsistent result after apply
│ 
│ When applying changes to airbyte_connection.mysql_snowflake_connection, provider "provider[\"registry.terraform.io/airbytehq/airbyte\"]" produced an unexpected new value: .configurations.streams[1].sync_mode: was null, but now
│ cty.StringVal("full_refresh_overwrite").
│ 
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
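
A possible way to avoid this particular inconsistency (a sketch using the stream names from earlier, assuming the API fills in a default sync mode when none is declared): set sync_mode explicitly on every stream, so the planned value matches what the provider reads back after apply:

configurations = {
  streams = [
    {
      name      = "members"
      sync_mode = "full_refresh_overwrite"
    },
    {
      # declare the sync mode instead of leaving it null, so the value
      # read back after apply matches the plan
      name      = "movies"
      sync_mode = "full_refresh_overwrite"
    }
  ]
}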

exactlyaaron avatar Sep 13 '23 16:09 exactlyaaron

I am experiencing the same behavior with an MSSQL <> Redshift resource, for what it's worth.

thatkellenguy avatar Sep 14 '23 20:09 thatkellenguy

Ran into the same problem with the Google Analytics 4 source, and maybe it happens with any source that has pre-defined primary keys.

It looks like the provider makes a GET request to retrieve the connection's current configuration (including each stream's primaryKey) and then a PATCH request containing this retrieved state plus the changes, which runs into the error.

Maybe there should be a check for whether each stream has a source-defined primary key, so that it isn't included in the PATCH request. This could be determined from the sourceDefinedPrimaryKey field in the response of the GET streams route.

thiagoazcampos avatar Oct 02 '23 16:10 thiagoazcampos

I have the same issue with GoogleSearchConsole, and indeed those streams have pre-defined primary keys.

yamhirotoCS avatar Nov 22 '23 08:11 yamhirotoCS

Same for JIRA

yamhirotoCS avatar Dec 05 '23 14:12 yamhirotoCS

Same for Postgres <> Snowflake

Santhin avatar Dec 12 '23 08:12 Santhin

Same for us when using Hubspot/Chargebee/Stripe connections with incremental append sync mode to BigQuery.

sowla avatar May 24 '24 06:05 sowla

Same for Stripe -> Snowflake (and MySQL -> Snowflake but that's already mentioned above)

AlaaInflyter avatar Jun 07 '24 07:06 AlaaInflyter

@szemek helped fix this in https://github.com/airbytehq/terraform-provider-airbyte/pull/111. Hope the Airbyte team can merge it soon! ☺️

hongbo-miao avatar Jul 03 '24 01:07 hongbo-miao

Tested on Airbyte provider version 0.6.5 and confirmed this issue has been fixed, thanks! ☺️

hongbo-miao avatar Oct 22 '24 03:10 hongbo-miao