kafka-connect-bigquery
Add a config to set all columns to nullable
KCBQ currently translates Kafka Connect's required/nullable field settings into BigQuery as-is: if a field is required in the Kafka Connect schema, it will be REQUIRED in BigQuery, and likewise for NULLABLE fields.
It would be useful to be able to tell KCBQ to ignore these settings and always mark BigQuery fields as NULLABLE. This would give us more flexibility in the BigQuery pipeline, since it's difficult to migrate columns from REQUIRED to NULLABLE in BigQuery, and upstream systems occasionally want to do this. I acknowledge that when upstream systems do this, it's an incompatible change for downstream consumers, but in cases where KCBQ is the only consumer, it's worth making the connector a bit more resilient to the change (vs. having to re-bootstrap the data to change the schema).
I suggest exposing this option via a config. To stay compatible with connectors that are already running, it would be nice if the config only applied NULLABLE to fields the table doesn't already have. That is, if a table already exists in BigQuery with REQUIRED fields, those are left as-is, but any newly added fields are always NULLABLE.
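The proposed merge behavior can be sketched roughly as follows. This is a hypothetical illustration, not connector code: the function name `merge_field_modes` and its plain-dict inputs are invented for clarity, standing in for the Kafka Connect `Schema` and BigQuery `Field` objects the connector actually uses.

```python
# Hypothetical sketch of the proposed behavior: with the "all fields
# nullable" option enabled, columns already present in the BigQuery
# table keep their current mode, while any newly added field is forced
# to NULLABLE regardless of what the Kafka Connect schema says.

def merge_field_modes(existing_modes, connect_fields, all_fields_nullable=True):
    """existing_modes: {field_name: "REQUIRED" | "NULLABLE"} for the BQ table.
    connect_fields: {field_name: is_optional} from the Kafka Connect schema.
    Returns the field modes the connector would write to BigQuery."""
    merged = {}
    for name, is_optional in connect_fields.items():
        if name in existing_modes:
            # Existing column: leave its mode untouched for compatibility.
            merged[name] = existing_modes[name]
        elif all_fields_nullable:
            # New column: always NULLABLE, ignoring the Connect setting.
            merged[name] = "NULLABLE"
        else:
            # Default behavior: translate the Connect setting as-is.
            merged[name] = "NULLABLE" if is_optional else "REQUIRED"
    return merged

existing = {"id": "REQUIRED"}
connect = {"id": False, "email": False}  # both required in Connect's schema
print(merge_field_modes(existing, connect))
# → {'id': 'REQUIRED', 'email': 'NULLABLE'}
```

The key point is that an existing REQUIRED column is never touched (BigQuery would reject relaxing or tightening it anyway in many paths), so enabling the option on a running connector can't break existing tables.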
We recently had an issue with this: a source schema going from nullable to required breaks things when using CDC via Debezium, because you can't update a NULLABLE column to REQUIRED in BigQuery.
I believe this was resolved (as of the previous comment) by the config parameter allBQFieldsNullable.
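For reference, the setting is enabled in the sink connector's configuration. This is a sketch of a minimal properties file; `allBQFieldsNullable` is the parameter named above, while the other keys (`name`, `connector.class`, `topics`) are the usual sink connector settings and their exact values depend on your deployment and connector version.

```properties
name=kcbq-sink
connector.class=com.wepay.kafka.connect.bigquery.BigQuerySinkConnector
topics=my-topic
# Force every BigQuery field the connector creates to NULLABLE,
# regardless of the Kafka Connect schema's required/optional setting.
allBQFieldsNullable=true
```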