[Bug]: _metadata_big_query_commit_timestamp is not populated when useStorageWriteApi=true in Spanner to BigQuery template
Related Template(s)
Spanner_Change_Streams_to_BigQuery
Template Version
Latest version from Google Cloud Console (Flex Template)
What happened?
I'm using the Cloud Spanner change streams to BigQuery template as described in the documentation: https://cloud.google.com/dataflow/docs/guides/templates/provided/cloud-spanner-change-streams-to-bigquery
My pipeline is configured with --useStorageWriteApi=true.
The documentation mentions the following:
_metadata_big_query_commit_timestamp: The commit timestamp when the row is inserted into BigQuery. If useStorageWriteApi is true, this column is not automatically created in the changelog table by the pipeline. In that case, you must manually add this column in the changelog table if needed.
- I manually added the
_metadata_big_query_commit_timestampcolumn to the changelog table schema (as aTIMESTAMP). - The pipeline is running correctly and writing rows to BigQuery.
- However, the
_metadata_big_query_commit_timestampfield is never populated — it remainsNULLin all rows.
🔍 Question:
Is this field supposed to be automatically populated by the template or by BigQuery Storage Write API when useStorageWriteApi=true?
If not — what is the recommended way to populate this value?
Thank you!
Relevant log output
@ruslan-yuldashev-sxp , when using --useStorageWriteApi=true , we need to manually add the _metadata_big_query_commit_timestamp column and set CURRENT_TIMESTAMP as its default value if we need the column.
So, if you have already created column _metadata_big_query_commit_timestamp manually, what you can further do is :
ALTER TABLE {PROJECT_ID}.{DATASET_ID}.{CHANGELOG_TABLE} ALTER COLUMN _metadata_big_query_commit_timestamp SET DEFAULT CURRENT_TIMESTAMP()
After this, _metadata_big_query_commit_timestamp should be populated. Can you please confirm the same ?
I'll check if we can improve public documentation.
Tanu, please check this. Thanks.