airbyte icon indicating copy to clipboard operation
airbyte copied to clipboard

[source-postgres] Array fields in PostgreSQL table are not populating in Snowflake target table

Open asimy opened this issue 2 years ago • 6 comments

Connector Name

source-postgres

Connector Version

3.1.7

What step the error happened?

During the sync

Relevant information

Note: the connection is from source-postgres 3.1.7 to destination-snowflake 3.1.2

We have a PostgreSQL table that tracks transfers of a list of customer from one agent to another.

create table customer_transfers
 (
     id                          uuid   default gen_random_uuid() not null
         primary key,
     transferred_customer_ids      uuid[] default '{}'::uuid[],
     from_agent_id            uuid                                    not null,
     to_agent_id              uuid                                    not null,
     created_at                  timestamp                               not null,
     updated_at                  timestamp                               not null,
 );

We are syncing it from PostgreSQL into Snowflake for analysis and noticed that the transferred_customer_ids in the Snowflake target table was NULL for all records. Additionally, we are seeing the following in the _airbyte_meta field for each record in the target table

{   "errors": [     "Problem with `transferred_customer_ids`"] }

When looking at the intermediate _airbyte_raw_customer_transfers table, I noticed that the json object in the _airbyte_data field for the records looked like

{
  "created_at": "2018-12-26T22:39:09.357020",
  "from_agent_id": "79e7c905-84cc-4674-a10f-a89e89710a21",
  "id": "32f9096a-717b-4be9-9b7f-d0d5c579c29f",
  "to_agent_id": "b6c85557-c920-4ad3-9be4-fabfc49f3dd0",
  "transferred_customer_ids": "{9d324c1c-cee7-4050-8e93-31add0625073,ae664d72-1cd2-4647-82a5-864b09cfd696,ae664d72-1cd2-4647-82a5-864b09cfd696}",
  "updated_at": "2018-12-26T22:39:09.357020"
}

Since the value for transferred_customer_ids is stored as a string (following Postgres conventions) rather than as a json array, I suspect that is causing the snowflake-target import process to report the values as the incorrect data type for the column in the target table and replacing them with NULLs when importing the other values.

Also, I'm happy to take a stab at submitting a fix, but it has been a very long time since I've worked in a Java code base so I'd likely be quite slow to get this resolved--assuming my diagnosis of the problem is even correct.

Relevant log output

No response

Contribute

  • [ ] Yes, I want to contribute

asimy avatar Feb 08 '24 00:02 asimy

@asimy in the settings of your connection, what is the data type for the column? Destinations trust this catalog for typing information, so if the data type is correct there then the destinations should cast it properly.

jbfbell avatar Apr 03 '24 16:04 jbfbell

@jbfbell, the data type is "Array" in the connection.

asimy avatar Apr 03 '24 19:04 asimy

@asimy, did you get a solution to this problem. I have the same issue.

Wamolambo avatar Apr 18 '24 07:04 Wamolambo

@Wamolambo I have not found a solution. None of the array fields in our PG tables are populating in Snowflake.

asimy avatar Apr 18 '24 08:04 asimy

This is source-postgres issue - transferring to the right team

evantahler avatar May 07 '24 20:05 evantahler

Possible duplicate of airbytehq/airbyte#34986

postamar avatar May 15 '24 16:05 postamar

Just verified with the latest PG -> Snowflake and BigQuery syncs. uuid arrays can now be correctly synced. I'm closing this ticket. Please reopen if this happens again.

theyueli avatar May 23 '24 22:05 theyueli