[Bug]: Bigquery python streaming insertAll SSLError leads to stuck streaming job
What happened?
Since ~2 weeks my streaming job started getting this error regularly. It happened again today with latest beam (2.63.0) running on GCP Dataflow.
It makes the job retry the failed unit of work forever until it gives up and halt all processing, forcing me to re-deploy
My WriteToBigQuery config
WriteToBigQuery(table=self.table.table_id,
project=self.table.project,
dataset=self.table.dataset_id,
insert_retry_strategy=RetryStrategy.RETRY_ON_TRANSIENT_ERROR,
write_disposition=BigQueryDisposition.WRITE_APPEND,
create_disposition=BigQueryDisposition.CREATE_NEVER,
method=WriteToBigQuery.Method.STREAMING_INSERTS,
batch_size=1900,
triggering_frequency=10,
with_auto_sharding=True,
).with_output_types(WriteResult)
Stacktrace:
Error message from worker: generic::unknown: urllib3.exceptions.SSLError: EOF occurred in violation of protocol (_ssl.c:2427)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File \"/usr/local/lib/python3.11/site-packages/requests/adapters.py\", line 667, in send
resp = conn.urlopen(
^^^^^^^^^^^^^
File \"/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py\", line 841, in urlopen
retries = retries.increment(
^^^^^^^^^^^^^^^^^^
File \"/usr/local/lib/python3.11/site-packages/urllib3/util/retry.py\", line 519, in increment
raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='bigquery.googleapis.com', port=443): Max retries exceeded with url: /bigquery/v2/projects/REDACTED/insertAll?prettyPrint=false (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2427)')))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File \"/usr/local/lib/python3.11/site-packages/google/api_core/retry/retry_unary.py\", line 144, in retry_target
result = target()
^^^^^^^^
File \"/usr/local/lib/python3.11/site-packages/google/cloud/_http/__init__.py\", line 482, in api_request
response = self._make_request(
^^^^^^^^^^^^^^^^^^^
File \"/usr/local/lib/python3.11/site-packages/google/cloud/_http/__init__.py\", line 341, in _make_request
return self._do_request(
^^^^^^^^^^^^^^^^^
File \"/usr/local/lib/python3.11/site-packages/google/cloud/_http/__init__.py\", line 379, in _do_request
return self.http.request(
^^^^^^^^^^^^^^^^^^
File \"/usr/local/lib/python3.11/site-packages/google/auth/transport/requests.py\", line 537, in request
response = super(AuthorizedSession, self).request(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"/usr/local/lib/python3.11/site-packages/requests/sessions.py\", line 589, in request
resp = self.send(prep, **send_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"/usr/local/lib/python3.11/site-packages/requests/sessions.py\", line 703, in send
r = adapter.send(request, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"/usr/local/lib/python3.11/site-packages/requests/adapters.py\", line 698, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='bigquery.googleapis.com', port=443): Max retries exceeded with url: /bigquery/v2/projects/REDACTED/insertAll?prettyPrint=false (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2427)')))
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File \"apache_beam/runners/common.py\", line 1501, in apache_beam.runners.common.DoFnRunner.process
File \"apache_beam/runners/common.py\", line 917, in apache_beam.runners.common.PerWindowInvoker.invoke_process
File \"apache_beam/runners/common.py\", line 1061, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
File \"/usr/local/lib/python3.11/site-packages/apache_beam/io/gcp/bigquery.py\", line 1640, in process
return self._flush_batch(destination)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"/usr/local/lib/python3.11/site-packages/apache_beam/io/gcp/bigquery.py\", line 1683, in _flush_batch
passed, errors = self.bigquery_wrapper.insert_rows(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"/usr/local/lib/python3.11/site-packages/apache_beam/io/gcp/bigquery_tools.py\", line 1293, in insert_rows
result, errors = self._insert_all_rows(
^^^^^^^^^^^^^^^^^^^^^^
File \"/usr/local/lib/python3.11/site-packages/apache_beam/utils/retry.py\", line 311, in wrapper
raise exn.with_traceback(exn_traceback)
File \"/usr/local/lib/python3.11/site-packages/apache_beam/utils/retry.py\", line 298, in wrapper
return fun(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^
File \"/usr/local/lib/python3.11/site-packages/apache_beam/io/gcp/bigquery_tools.py\", line 744, in _insert_all_rows
errors = self.gcp_bq_client.insert_rows_json(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"/usr/local/lib/python3.11/site-packages/google/cloud/bigquery/client.py\", line 3889, in insert_rows_json
response = self._call_api(
^^^^^^^^^^^^^^^
File \"/usr/local/lib/python3.11/site-packages/google/cloud/bigquery/client.py\", line 837, in _call_api
return call()
^^^^^^
File \"/usr/local/lib/python3.11/site-packages/google/api_core/retry/retry_unary.py\", line 293, in retry_wrapped_func
return retry_target(
^^^^^^^^^^^^^
File \"/usr/local/lib/python3.11/site-packages/google/api_core/retry/retry_unary.py\", line 153, in retry_target
_retry_error_helper(
File \"/usr/local/lib/python3.11/site-packages/google/api_core/retry/retry_base.py\", line 221, in _retry_error_helper
raise final_exc from source_exc
google.api_core.exceptions.RetryError: Timeout of 600.0s exceeded, last exception: HTTPSConnectionPool(host='bigquery.googleapis.com', port=443): Max retries exceeded with url: /bigquery/v2/projects/REDACTED/insertAll?prettyPrint=false (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2427)')))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File \"/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker.py\", line 313, in _execute
response = task()
^^^^^^
File \"/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker.py\", line 388, in <lambda>
lambda: self.create_worker().do_instruction(request), request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker.py\", line 658, in do_instruction
return getattr(self, request_type)(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker.py\", line 696, in process_bundle
bundle_processor.process_bundle(instruction_id))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/bundle_processor.py\", line 1271, in process_bundle
self.ops[element.transform_id].process_timer(
File \"apache_beam/runners/worker/operations.py\", line 974, in apache_beam.runners.worker.operations.DoOperation.process_timer
File \"apache_beam/runners/common.py\", line 1553, in apache_beam.runners.common.DoFnRunner.process_user_timer
File \"apache_beam/runners/common.py\", line 1591, in apache_beam.runners.common.DoFnRunner._reraise_augmented
File \"apache_beam/runners/common.py\", line 1550, in apache_beam.runners.common.DoFnRunner.process_user_timer
File \"apache_beam/runners/common.py\", line 645, in apache_beam.runners.common.DoFnInvoker.invoke_user_timer
File \"apache_beam/runners/common.py\", line 1686, in apache_beam.runners.common._OutputHandler.handle_process_outputs
File \"apache_beam/runners/common.py\", line 1799, in apache_beam.runners.common._OutputHandler._write_value_to_tag
File \"apache_beam/runners/worker/operations.py\", line 263, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
File \"apache_beam/runners/worker/operations.py\", line 950, in apache_beam.runners.worker.operations.DoOperation.process
File \"apache_beam/runners/worker/operations.py\", line 951, in apache_beam.runners.worker.operations.DoOperation.process
File \"apache_beam/runners/common.py\", line 1503, in apache_beam.runners.common.DoFnRunner.process
File \"apache_beam/runners/common.py\", line 1591, in apache_beam.runners.common.DoFnRunner._reraise_augmented
File \"apache_beam/runners/common.py\", line 1501, in apache_beam.runners.common.DoFnRunner.process
File \"apache_beam/runners/common.py\", line 689, in apache_beam.runners.common.SimpleInvoker.invoke_process
File \"apache_beam/runners/common.py\", line 1686, in apache_beam.runners.common._OutputHandler.handle_process_outputs
File \"apache_beam/runners/common.py\", line 1799, in apache_beam.runners.common._OutputHandler._write_value_to_tag
File \"apache_beam/runners/worker/operations.py\", line 263, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
File \"apache_beam/runners/worker/operations.py\", line 950, in apache_beam.runners.worker.operations.DoOperation.process
File \"apache_beam/runners/worker/operations.py\", line 951, in apache_beam.runners.worker.operations.DoOperation.process
File \"apache_beam/runners/common.py\", line 1503, in apache_beam.runners.common.DoFnRunner.process
File \"apache_beam/runners/common.py\", line 1591, in apache_beam.runners.common.DoFnRunner._reraise_augmented
File \"apache_beam/runners/common.py\", line 1501, in apache_beam.runners.common.DoFnRunner.process
File \"apache_beam/runners/common.py\", line 689, in apache_beam.runners.common.SimpleInvoker.invoke_process
File \"apache_beam/runners/common.py\", line 1686, in apache_beam.runners.common._OutputHandler.handle_process_outputs
File \"apache_beam/runners/common.py\", line 1799, in apache_beam.runners.common._OutputHandler._write_value_to_tag
File \"apache_beam/runners/worker/operations.py\", line 263, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
File \"apache_beam/runners/worker/operations.py\", line 950, in apache_beam.runners.worker.operations.DoOperation.process
File \"apache_beam/runners/worker/operations.py\", line 951, in apache_beam.runners.worker.operations.DoOperation.process
File \"apache_beam/runners/common.py\", line 1503, in apache_beam.runners.common.DoFnRunner.process
File \"apache_beam/runners/common.py\", line 1612, in apache_beam.runners.common.DoFnRunner._reraise_augmented
File \"apache_beam/runners/common.py\", line 1501, in apache_beam.runners.common.DoFnRunner.process
File \"apache_beam/runners/common.py\", line 917, in apache_beam.runners.common.PerWindowInvoker.invoke_process
File \"apache_beam/runners/common.py\", line 1061, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
File \"/usr/local/lib/python3.11/site-packages/apache_beam/io/gcp/bigquery.py\", line 1640, in process
return self._flush_batch(destination)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"/usr/local/lib/python3.11/site-packages/apache_beam/io/gcp/bigquery.py\", line 1683, in _flush_batch
passed, errors = self.bigquery_wrapper.insert_rows(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"/usr/local/lib/python3.11/site-packages/apache_beam/io/gcp/bigquery_tools.py\", line 1293, in insert_rows
result, errors = self._insert_all_rows(
^^^^^^^^^^^^^^^^^^^^^^
File \"/usr/local/lib/python3.11/site-packages/apache_beam/utils/retry.py\", line 311, in wrapper
raise exn.with_traceback(exn_traceback)
File \"/usr/local/lib/python3.11/site-packages/apache_beam/utils/retry.py\", line 298, in wrapper
return fun(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^
File \"/usr/local/lib/python3.11/site-packages/apache_beam/io/gcp/bigquery_tools.py\", line 744, in _insert_all_rows
errors = self.gcp_bq_client.insert_rows_json(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"/usr/local/lib/python3.11/site-packages/google/cloud/bigquery/client.py\", line 3889, in insert_rows_json
response = self._call_api(
^^^^^^^^^^^^^^^
File \"/usr/local/lib/python3.11/site-packages/google/cloud/bigquery/client.py\", line 837, in _call_api
return call()
^^^^^^
File \"/usr/local/lib/python3.11/site-packages/google/api_core/retry/retry_unary.py\", line 293, in retry_wrapped_func
return retry_target(
^^^^^^^^^^^^^
File \"/usr/local/lib/python3.11/site-packages/google/api_core/retry/retry_unary.py\", line 153, in retry_target
_retry_error_helper(
File \"/usr/local/lib/python3.11/site-packages/google/api_core/retry/retry_base.py\", line 221, in _retry_error_helper
raise final_exc from source_exc
RuntimeError: google.api_core.exceptions.RetryError: Timeout of 600.0s exceeded, last exception: HTTPSConnectionPool(host='bigquery.googleapis.com', port=443): Max retries exceeded with url: /bigquery/v2/projects/REDACTED/insertAll?prettyPrint=false (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2427)'))) [while running 'sessions to bq/WriteToBQ/_StreamToBigQuery/StreamInsertRows/ParDo(BigQueryWriteFn)-ptransform-82']
passed through:
==>
dist_proc/dax/workflow/worker/fnapi_service_impl.cc:1331
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components
- [x] Component: Python SDK
- [ ] Component: Java SDK
- [ ] Component: Go SDK
- [ ] Component: Typescript SDK
- [x] Component: IO connector
- [ ] Component: Beam YAML
- [ ] Component: Beam examples
- [ ] Component: Beam playground
- [ ] Component: Beam katas
- [ ] Component: Website
- [ ] Component: Infrastructure
- [ ] Component: Spark Runner
- [ ] Component: Flink Runner
- [ ] Component: Samza Runner
- [ ] Component: Twister2 Runner
- [ ] Component: Hazelcast Jet Runner
- [x] Component: Google Cloud Dataflow Runner
Can you open a Google Cloud Support ticket?
I do not have access to Google cloud support unfortunately
Can you share the Python packages with versions with your env? And are these errors transient? Dataflow should always retry these errors.
batch_size=1900,: is your data somehow much larger than before?
Might be related to https://github.com/googleapis/python-bigquery/issues/1570
This is the python env:
packages
SecretStorage>=3.2 = 3.3.3
annotated-types>=0.6.0 = 0.7.0
apache-beam==2.63.0 = 2.63.0
attrs>=22.2.0 = 24.3.0
backports.tarfile = 1.2.0
cachetools<6,>=3.1.0 = 5.5.0
certifi>=2017.4.17 = 2024.12.14
cffi>=1.12 = 1.17.1
charset-normalizer<4,>=2 = 3.4.1
cloudpickle~=2.2.1 = 2.2.1
crcmod<2.0,>=1.7 = 1.7
cryptography>=2.0 = 44.0.0
deprecated>=1.2.6 = 1.2.15
dill<0.3.2,>=0.3.1.1 = 0.3.1.1
dnspython<3.0.0,>=1.16.0 = 2.7.0
docopt = 0.6.2
docstring-parser<1 = 0.16
fastavro<2,>=0.23.6 = 1.10.0
fasteners<1.0,>=0.3 = 0.19
google-api-core<3,>=2.0.0 = 2.24.0
google-apitools<0.5.32,>=0.5.31 = 0.5.31
google-auth-httplib2<0.3.0,>=0.1.0 = 0.2.0
google-auth<3,>=1.18.0 = 2.37.0
google-cloud-aiplatform<2.0,>=1.26.0 = 1.77.0
google-cloud-bigquery-storage<3,>=2.6.3 = 2.27.0
google-cloud-bigquery<4,>=2.0.0 = 3.27.0
google-cloud-bigtable<3,>=2.19.0 = 2.28.1
google-cloud-core<3,>=2.0.0 = 2.4.1
google-cloud-datastore<3,>=2.0.0 = 2.20.2
google-cloud-dlp<4,>=3.0.0 = 3.26.0
google-cloud-language<3,>=2.0 = 2.16.0
google-cloud-pubsub<3,>=2.1.0 = 2.27.2
google-cloud-pubsublite<2,>=1.2.0 = 1.11.1
google-cloud-recommendations-ai<0.11.0,>=0.1.0 = 0.10.15
google-cloud-resource-manager<3.0.0dev,>=1.3.3 = 1.14.0
google-cloud-spanner<4,>=3.0.0 = 3.51.0
google-cloud-storage<3,>=2.18.2 = 2.19.0
google-cloud-videointelligence<3,>=2.0 = 2.15.0
google-cloud-vision<4,>=2 = 3.9.0
google-crc32c<2.0dev,>=1.0 = 1.6.0
google-resumable-media<3.0dev,>=2.0.0 = 2.7.2
googleapis-common-protos<2.0.dev0,>=1.56.2 = 1.66.0
grpc-google-iam-v1<1.0.0dev,>=0.12.4 = 0.14.0
grpc-interceptor>=0.15.4 = 0.15.4
grpcio!=1.48.0,!=1.59.*,!=1.60.*,!=1.61.*,!=1.62.0,!=1.62.1,<1.66.0,<2,>=1.33.1 = 1.65.5
grpcio-status>=1.33.2 = 1.65.5
hdfs<3.0.0,>=2.1.0 = 2.7.3
httplib2<0.23.0,>=0.8 = 0.22.0
idna<4,>=2.5 = 3.10
importlib-metadata<=8.5.0,>=6.0 = 8.5.0
jaraco.classes = 3.4.0
jaraco.context = 6.0.1
jaraco.functools = 4.1.0
jeepney>=0.4.2 = 0.8.0
jsonpickle<4.0.0,>=3.0.0 = 3.4.2
jsonschema-specifications>=2023.03.6 = 2024.10.1
jsonschema<5.0.0,>=4.0.0 = 4.23.0
keyring = 25.6.0
keyrings.google-artifactregistry-auth = 1.1.2
more-itertools = 10.6.0
msgpack = 1.1.0
numpy<2.3.0,>=1.14.3 = 2.2.1
oauth2client>=1.4.12 = 4.1.3
objsize<0.8.0,>=0.6.1 = 0.7.0
opentelemetry-api>=1.27.0 = 1.29.0
opentelemetry-sdk>=1.27.0 = 1.29.0
opentelemetry-semantic-conventions==0.50b0 = 0.50b0
orjson<4,>=3.9.7 = 3.10.14
overrides<8.0.0,>=6.0.1 = 7.7.0
packaging>=22.0 = 24.2
pluggy = 1.5.0
proto-plus<2,>=1.7.1 = 1.25.0
protobuf!=4.0.*,!=4.21.*,!=4.22.0,!=4.23.*,!=4.24.*,<6.0.0.dev0,>=3.20.3 = 5.29.3
pyarrow-hotfix<1 = 0.6
pyarrow<17.0.0,>=3.0.0 = 16.1.0
pyasn1-modules>=0.2.1 = 0.4.1
pyasn1>=0.1.7 = 0.6.1
pycparser = 2.22
pydantic-core==2.27.2 = 2.27.2
pydantic<3 = 2.10.5
pydot<2,>=1.2.0 = 1.4.2
pymongo<5.0.0,>=3.8.0 = 4.10.1
pyparsing!=3.0.0,!=3.0.1,!=3.0.2,!=3.0.3,<4,>=2.4.2 = 3.2.1
python-dateutil = 2.9.0.post0
pytz>=2018.3 = 2024.2
pyyaml<7.0.0,>=3.12 = 6.0.2
redis<6,>=5.0.0 = 5.2.1
referencing>=0.28.4 = 0.36.1
regex>=2020.6.8 = 2024.11.6
requests<3.0.0,>=2.24.0 = 2.32.3
rpds-py>=0.7.1 = 0.22.3
rsa<5,>=3.1.4 = 4.9
shapely<3.0.0dev = 2.0.6
six>=1.5 = 1.17.0
sortedcontainers>=2.4.0 = 2.4.0
sqlparse>=0.4.4 = 0.5.3
typing-extensions>=3.7.0 = 4.12.2
urllib3<3,>=1.21.1 = 2.3.0
wrapt<2,>=1.10 = 1.17.2
zipp>=3.20 = 3.21.0
zstandard<1,>=0.18.0 = 0.23.0
Running with Apache Beam Python 3.11 SDK 2.63.0 in streaming mode
And are these errors transient? Dataflow should always retry these errors.
That's the thing that puzzles me. They get retried until they reach the maximum retry time then they are re-thrown. So they're handled by the google api core package (see the retry_wrapped_func call in the stacktrace) but it seems after it bubbles up to the beam bigquery error handling its not considered a valid error to catch and it cause a infinite error loop for the bundle
Have you tried to use the smaller batch_size?
No I haven't. I've already seen how beam handles those errors in the past (bigquery returns a dedicated 413 (Request Entity Too Large) error) so he doesn't seem to be what's happening here. I'll try to lower it tho to see if it improves
I used a batch size of 900 and the issue occurred again today. same stacktrace down to the same line
Retry with exponential backoff: waiting for 7.223288977709672 seconds before retrying _insert_all_rows because we caught exception:
google.api_core.exceptions.RetryError: Timeout of 600.0s exceeded, last exception:
HTTPSConnectionPool(host='bigquery.googleapis.com', port=443):
Max retries exceeded with url:
/bigquery/v2/projects/PROJECTID/datasets/stats/tables/TABLEID/insertAll?prettyPrint=false
(Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2427)')))
Can you try Python 3.9?
yeah, anything particular with it? why not 3.10 or 3.12? It's currently on 3.11
https://github.com/python/cpython/issues/95031 might be related here.
I did some more research and ended up making a PR for this: https://github.com/apache/beam/pull/35212
@quentin-sommer We are facing the same issue here. Do you have any update on this? How have you workaround the issue?
my workaround was to be more careful with the data I sent. Now hopefully this will be fixed
Thanks, @quentin-sommer, for your contribution!