[BUG] 'Connection aborted.', RemoteDisconnected('Remote end closed connection without response') while running promptflow eval API
Describe the bug
We keep hitting this connection-aborted error. It seems particularly frequent when the input dataset is large; in our case the input dataset has 200+ lines. This is currently a blocker for us. Can you help look into this?
2024-07-24T15:32:41.7332523Z results = evaluate(
2024-07-24T15:32:41.7333109Z File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/evals/evaluate/_telemetry/__init__.py", line 111, in wrapper
2024-07-24T15:32:41.7333350Z result = func(*args, **kwargs)
2024-07-24T15:32:41.7335748Z File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 365, in evaluate
2024-07-24T15:32:41.7336035Z raise e
2024-07-24T15:32:41.7336390Z File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 340, in evaluate
2024-07-24T15:32:41.7336605Z return _evaluate(
2024-07-24T15:32:41.7336960Z File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 401, in _evaluate
2024-07-24T15:32:41.7337239Z input_data_df, target_generated_columns, target_run = _apply_target_to_data(
2024-07-24T15:32:41.7339201Z File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 183, in _apply_target_to_data
2024-07-24T15:32:41.7339603Z run = pf_client.run(
2024-07-24T15:32:41.7340000Z File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/_sdk/_pf_client.py", line 301, in run
2024-07-24T15:32:41.7340235Z return self._run(
2024-07-24T15:32:41.7340578Z File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/_sdk/_pf_client.py", line 226, in _run
2024-07-24T15:32:41.7340847Z return self.runs.create_or_update(run=run, **kwargs)
2024-07-24T15:32:41.7341261Z File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/_sdk/_telemetry/activity.py", line 265, in wrapper
2024-07-24T15:32:41.7341612Z return f(self, *args, **kwargs)
2024-07-24T15:32:41.7342293Z File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/_sdk/operations/_run_operations.py", line 135, in create_or_update
2024-07-24T15:32:41.7342899Z created_run = RunSubmitter(client=self._client).submit(run=run, **kwargs)
2024-07-24T15:32:41.7343348Z File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/_sdk/_orchestrator/run_submitter.py", line 58, in submit
2024-07-24T15:32:41.7343661Z portal_url = pfazure_client.runs._upload(run=run, run_uploader=uploader)
2024-07-24T15:32:41.7344231Z File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/azure/operations/_run_operations.py", line 970, in _upload
2024-07-24T15:32:41.7344664Z self._register_existing_bulk_run(run=run)
2024-07-24T15:32:41.7345074Z File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/azure/operations/_run_operations.py", line 986, in _register_existing_bulk_run
2024-07-24T15:32:41.7345332Z self._service_caller.create_existing_bulk_run(
2024-07-24T15:32:41.7345688Z File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/azure/_restclient/flow_service_caller.py", line 74, in wrapper
2024-07-24T15:32:41.7345919Z return func(self, *args, **kwargs)
2024-07-24T15:32:41.7346296Z File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/azure/_restclient/flow_service_caller.py", line 744, in create_existing_bulk_run
2024-07-24T15:32:41.7346562Z return self.caller.bulk_runs.create_existing_bulk_run(
2024-07-24T15:32:41.7346913Z File "/usr/local/miniconda/lib/python3.9/site-packages/azure/core/tracing/decorator.py", line 94, in wrapper_use_tracer
2024-07-24T15:32:41.7347123Z return func(*args, **kwargs)
2024-07-24T15:32:41.7347540Z File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/azure/_restclient/flow/operations/_bulk_runs_operations.py", line 631, in create_existing_bulk_run
2024-07-24T15:32:41.7347843Z pipeline_response = self._client._pipeline.run(request, stream=False, **kwargs)
2024-07-24T15:32:41.7348188Z File "/usr/local/miniconda/lib/python3.9/site-packages/azure/core/pipeline/_base.py", line 229, in run
2024-07-24T15:32:41.7348405Z return first_node.send(pipeline_request)
2024-07-24T15:32:41.7348719Z File "/usr/local/miniconda/lib/python3.9/site-packages/azure/core/pipeline/_base.py", line 86, in send
2024-07-24T15:32:41.7348928Z response = self.next.send(request)
2024-07-24T15:32:41.7349233Z File "/usr/local/miniconda/lib/python3.9/site-packages/azure/core/pipeline/_base.py", line 86, in send
2024-07-24T15:32:41.7349439Z response = self.next.send(request)
2024-07-24T15:32:41.7349731Z File "/usr/local/miniconda/lib/python3.9/site-packages/azure/core/pipeline/_base.py", line 86, in send
2024-07-24T15:32:41.7349939Z response = self.next.send(request)
2024-07-24T15:32:41.7350100Z [Previous line repeated 2 more times]
2024-07-24T15:32:41.7350421Z File "/usr/local/miniconda/lib/python3.9/site-packages/azure/core/pipeline/policies/_redirect.py", line 197, in send
2024-07-24T15:32:41.7350633Z response = self.next.send(request)
2024-07-24T15:32:41.7350948Z File "/usr/local/miniconda/lib/python3.9/site-packages/azure/core/pipeline/policies/_retry.py", line 554, in send
2024-07-24T15:32:41.7351261Z raise err
2024-07-24T15:32:41.7351563Z File "/usr/local/miniconda/lib/python3.9/site-packages/azure/core/pipeline/policies/_retry.py", line 532, in send
2024-07-24T15:32:41.7351745Z response = self.next.send(request)
2024-07-24T15:32:41.7352023Z File "/usr/local/miniconda/lib/python3.9/site-packages/azure/core/pipeline/_base.py", line 86, in send
2024-07-24T15:32:41.7352206Z response = self.next.send(request)
2024-07-24T15:32:41.7352482Z File "/usr/local/miniconda/lib/python3.9/site-packages/azure/core/pipeline/_base.py", line 86, in send
2024-07-24T15:32:41.7352664Z response = self.next.send(request)
2024-07-24T15:32:41.7352933Z File "/usr/local/miniconda/lib/python3.9/site-packages/azure/core/pipeline/_base.py", line 86, in send
2024-07-24T15:32:41.7353114Z response = self.next.send(request)
2024-07-24T15:32:41.7353253Z [Previous line repeated 2 more times]
2024-07-24T15:32:41.7353523Z File "/usr/local/miniconda/lib/python3.9/site-packages/azure/core/pipeline/_base.py", line 118, in send
2024-07-24T15:32:41.7353818Z self._sender.send(request.http_request, **request.context.options),
2024-07-24T15:32:41.7354154Z File "/usr/local/miniconda/lib/python3.9/site-packages/azure/core/pipeline/transport/_requests_basic.py", line 401, in send
2024-07-24T15:32:41.7354338Z raise error
2024-07-24T15:32:41.7354618Z azure.core.exceptions.ServiceResponseError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
Running Information:
- Promptflow package versions (from pf -v):
{ "promptflow": "1.13.0", "promptflow-azure": "1.13.0", "promptflow-core": "1.13.0", "promptflow-devkit": "1.13.0", "promptflow-evals": "0.3.1", "promptflow-tracing": "1.13.0" }
- Operating System: Linux
- Python version (from pf -v):
Executable '/anaconda/envs/azureml_py38/bin/python' Python (Linux) 3.9.19 | packaged by conda-forge | (main, Mar 20 2024, 12:50:21) [GCC 12.3.0]
This looks like a service side timeout. @wangchao1230, who can help from the PF service side? @yanggaome, can you also check if your local source folders are being uploaded unnecessarily? Ideally, only the evaluation result jsonl file should be uploaded.
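One rough way to check for unexpectedly large local files is to list the biggest files under the folder in question. The sketch below is illustrative only: the `largest_files` helper and the `"."` path are hypothetical, and it does not reflect how promptflow itself decides what to upload.

```python
import os
from typing import List, Tuple


def largest_files(root: str, top_n: int = 10) -> List[Tuple[int, str]]:
    """Return up to top_n (size_in_bytes, path) pairs under root, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                # Skip files that vanish or cannot be read while walking.
                continue
    sizes.sort(reverse=True)
    return sizes[:top_n]


if __name__ == "__main__":
    # "." is a placeholder; point this at the folder used for the run.
    for size, path in largest_files("."):
        print(f"{size:>12,d}  {path}")
```

If anything other than the evaluation result jsonl file dominates the listing, that would be worth mentioning in this thread.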
@luigiw The customer contacted me via Teams. I got the workspace info and run names and asked the server folks to help investigate, but no logs were found for those run names in that workspace on the server-side ingress.
This result indicates that these requests were never sent to the server. Since this is happening in an automated (non-local) environment, I'm asking the customer to schedule a meeting so we can access the environment to repro and see if we can get more logs or actual request IDs to help the investigation.
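In the meantime, one way to collect more client-side logs is to turn on verbose logging for the Azure SDK before calling evaluate. This is a minimal sketch, assuming the promptflow REST client logs through the standard `azure` logger hierarchy as Azure SDK for Python libraries generally do; the `enable_azure_http_logging` helper name is hypothetical, and whether request IDs show up depends on the logging policies configured in the client pipeline.

```python
import logging
import sys


def enable_azure_http_logging(level: int = logging.DEBUG) -> logging.Logger:
    """Send log records from the "azure" logger hierarchy to stdout."""
    logger = logging.getLogger("azure")
    logger.setLevel(level)
    # Avoid attaching a duplicate handler if this is called more than once.
    if not any(getattr(h, "_pf_debug", False) for h in logger.handlers):
        handler = logging.StreamHandler(stream=sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        handler._pf_debug = True  # marker used only by this sketch
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    enable_azure_http_logging()
    # Call evaluate(...) after this point so the failing request is logged.
```

Running the failing pipeline with this enabled and attaching the output around the create_existing_bulk_run call may give the service side something to correlate against.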