kafka-connect-bigquery
GCSToBQLoadRunnable does not detect errors during load and removes blobs even though they were not loaded
How was it discovered?
I found that some data was missing from the table, even though the records were present in Kafka.
How does that happen?
GCSToBQLoadRunnable checks whether the job is complete here, but it does not check for a potential error during the load. As a result, the job is treated as successful even though it failed, and the GCS blobs are removed without their data ever reaching the table. A sketch of the missing check is shown below.
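A minimal sketch of what a stricter check could look like, using the google-cloud-bigquery Java client's Job/JobStatus API. The class and method names here (LoadJobCheck, isSuccessfullyCompleted) are hypothetical and not the connector's actual code; the point is that a job in state DONE may still carry an error, which has to be inspected explicitly:

```java
import com.google.cloud.bigquery.BigQueryError;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobStatus;

public class LoadJobCheck {

  /**
   * Returns true only if the load job finished without an error.
   * Checking JobStatus.State.DONE alone is not enough: a job that hit a
   * quota (or any other) error is still reported as DONE.
   */
  static boolean isSuccessfullyCompleted(Job job) {
    JobStatus status = job.getStatus();
    if (status == null || status.getState() != JobStatus.State.DONE) {
      return false; // still running or status not yet available
    }
    BigQueryError error = status.getError();
    if (error != null) {
      // Surface the failure instead of silently treating the job as done;
      // the caller should keep the GCS blobs so the data can be reloaded.
      throw new RuntimeException("BigQuery load job failed: " + error.getMessage());
    }
    return true;
  }
}
```

With a check like this, a failed load would not be counted as successful and the corresponding blobs would not be deleted.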
Evidence
[2022-12-22 21:31:30,663] TRACE Job is marked done: id=JobId{project=archimedes-337602, job=7d28940e-7882-414b-9bb3-c6c8c7e1307c, location=us-west1}, status=JobStatus{state=DONE, error=BigQueryError{reason=quotaExceeded, location=partition_modifications_per_column_partitioned_table.long, message=Quota exceeded: Your table exceeded quota for Number of partition modifications to a column partitioned table. For more information, see https://cloud.google.com/bigquery/docs/troubleshoot-quotas}, executionErrors=[BigQueryError{reason=quotaExceeded, location=partition_modifications_per_column_partitioned_table.long, message=Quota exceeded: Your table exceeded quota for Number of partition modifications to a column partitioned table. For more information, see https://cloud.google.com/bigquery/docs/troubleshoot-quotas}]} (com.wepay.kafka.connect.bigquery.GCSToBQLoadRunnable)
As we can see, the state is DONE, but there was an error related to quotas.