
Customers, Products, and Suppliers do not flow to Google BigQuery

Open chadmott opened this issue 5 years ago • 2 comments

At the end of the lab, I see transactional data in BigQuery, but not the customers, products, or suppliers data.

In my local Confluent Control Center, I see this data in the respective topics, and Control Center renders the values correctly (so it is schema-aware).

In Confluent Cloud (which is where the connector is configured to pull from), I also see the data, but the values show up as the binary representation of the Avro-encoded records. I suspect that, for whatever reason, the Confluent Cloud cluster is unable to deserialize the data.
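For reference, one way to tell whether those binary values are Schema-Registry-framed Avro (as opposed to plain bytes) is to look for Confluent's wire-format header: a 0x00 magic byte followed by a 4-byte big-endian schema ID. A minimal sketch (the sample payload below is made up, not taken from the workshop):

```python
import struct

def parse_confluent_header(raw: bytes):
    """Return the Schema Registry schema ID if `raw` uses
    Confluent's Avro wire format (0x00 magic byte + 4-byte
    big-endian schema ID + Avro payload), else None."""
    if len(raw) < 5 or raw[0] != 0:
        return None  # not Schema-Registry-framed
    return struct.unpack(">I", raw[1:5])[0]

# A made-up value framed with schema ID 100042:
sample = b"\x00" + struct.pack(">I", 100042) + b"\x02\x08chad"
print(parse_confluent_header(sample))  # 100042
```

If the header is present, the bytes are fine and the problem is only that the viewer has no Schema Registry access to decode them.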

The connector is running


Name                 : DC01_GCS_SINK
Class                : io.confluent.connect.gcs.GcsSinkConnector
Type                 : sink
State                : RUNNING
WorkerId             : kafka-connect-ccloud:18084

 Task ID | State   | Error Trace
---------------------------------
 0       | RUNNING |
---------------------------------

with no errors.

Could you comment on why I don't see any errors? How can I view messages that the connector "skipped" when running in KSQL mode?
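(For context on the "skipped messages" question: sink connectors can be told to tolerate record failures and route them to a dead-letter-queue topic instead of dropping them silently. A sketch of the relevant Kafka Connect error-handling settings; the DLQ topic name below is made up, and I haven't checked which config file the workshop uses for this connector:)

```json
{
  "errors.tolerance": "all",
  "errors.log.enable": "true",
  "errors.log.include.messages": "true",
  "errors.deadletterqueue.topic.name": "dlq_dc01_gcs_sink",
  "errors.deadletterqueue.context.headers.enable": "true"
}
```

With these set, any record the sink cannot process lands in the DLQ topic, where it can be inspected with a console consumer.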

chadmott avatar Aug 19 '20 14:08 chadmott

@tmcgrath I suspect this is why you were getting the error in Data Studio: the queries join on IDs that (at least for me) do not exist.

chadmott avatar Aug 19 '20 14:08 chadmott

Quick update: after adding the ID fields to the tables in BigQuery and restarting the connector, data is flowing in. For whatever reason, the ID field does not exist in the schemas...

{
  "connect.name": "io.confluent.ksql.avro_schemas.KsqlDataSourceSchema",
  "fields": [
    {
      "default": null,
      "name": "FIRST_NAME",
      "type": [
        "null",
        "string"
      ]
    },
    {
      "default": null,
      "name": "LAST_NAME",
      "type": [
        "null",
        "string"
      ]
    },
    {
      "default": null,
      "name": "EMAIL",
      "type": [
        "null",
        "string"
      ]
    },
    {
      "default": null,
      "name": "CITY",
      "type": [
        "null",
        "string"
      ]
    },
    {
      "default": null,
      "name": "COUNTRY",
      "type": [
        "null",
        "string"
      ]
    },
    {
      "default": null,
      "name": "SOURCEDC",
      "type": [
        "null",
        "string"
      ]
    }
  ],
  "name": "KsqlDataSourceSchema",
  "namespace": "io.confluent.ksql.avro_schemas",
  "type": "record"
}

If the ID field is not in the schema here, it makes sense why it wouldn't show up in BigQuery. But then why did the data start flowing after I manually added the ID column in BigQuery?
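For the record, the schema pasted above really does omit ID. A quick sanity check (the schema is trimmed to two of its six fields here for brevity):

```python
import json

# Trimmed copy of the KsqlDataSourceSchema pasted above:
# only two of the six fields are repeated, and none of them is ID.
schema = json.loads("""
{
  "type": "record",
  "name": "KsqlDataSourceSchema",
  "namespace": "io.confluent.ksql.avro_schemas",
  "fields": [
    {"name": "FIRST_NAME", "type": ["null", "string"], "default": null},
    {"name": "SOURCEDC",   "type": ["null", "string"], "default": null}
  ]
}
""")

field_names = [f["name"] for f in schema["fields"]]
print(field_names)          # ['FIRST_NAME', 'SOURCEDC']
print("ID" in field_names)  # False
```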

chadmott avatar Aug 19 '20 14:08 chadmott