
[BUG] 'Connection aborted.', RemoteDisconnected('Remote end closed connection without response') while running promptflow eval API

Open yanggaome opened this issue 1 year ago • 2 comments

Describe the bug

We keep getting this connection-aborted error. It seems particularly frequent when the input dataset is large; ours has 200+ lines. This is currently a blocker for us. Can you help look into this?

2024-07-24T15:32:41.7332523Z     results = evaluate(
2024-07-24T15:32:41.7333109Z   File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/evals/evaluate/_telemetry/__init__.py", line 111, in wrapper
2024-07-24T15:32:41.7333350Z     result = func(*args, **kwargs)
2024-07-24T15:32:41.7335748Z   File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 365, in evaluate
2024-07-24T15:32:41.7336035Z     raise e
2024-07-24T15:32:41.7336390Z   File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 340, in evaluate
2024-07-24T15:32:41.7336605Z     return _evaluate(
2024-07-24T15:32:41.7336960Z   File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 401, in _evaluate
2024-07-24T15:32:41.7337239Z     input_data_df, target_generated_columns, target_run = _apply_target_to_data(
2024-07-24T15:32:41.7339201Z   File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 183, in _apply_target_to_data
2024-07-24T15:32:41.7339603Z     run = pf_client.run(
2024-07-24T15:32:41.7340000Z   File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/_sdk/_pf_client.py", line 301, in run
2024-07-24T15:32:41.7340235Z     return self._run(
2024-07-24T15:32:41.7340578Z   File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/_sdk/_pf_client.py", line 226, in _run
2024-07-24T15:32:41.7340847Z     return self.runs.create_or_update(run=run, **kwargs)
2024-07-24T15:32:41.7341261Z   File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/_sdk/_telemetry/activity.py", line 265, in wrapper
2024-07-24T15:32:41.7341612Z     return f(self, *args, **kwargs)
2024-07-24T15:32:41.7342293Z   File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/_sdk/operations/_run_operations.py", line 135, in create_or_update
2024-07-24T15:32:41.7342899Z     created_run = RunSubmitter(client=self._client).submit(run=run, **kwargs)
2024-07-24T15:32:41.7343348Z   File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/_sdk/_orchestrator/run_submitter.py", line 58, in submit
2024-07-24T15:32:41.7343661Z     portal_url = pfazure_client.runs._upload(run=run, run_uploader=uploader)
2024-07-24T15:32:41.7344231Z   File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/azure/operations/_run_operations.py", line 970, in _upload
2024-07-24T15:32:41.7344664Z     self._register_existing_bulk_run(run=run)
2024-07-24T15:32:41.7345074Z   File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/azure/operations/_run_operations.py", line 986, in _register_existing_bulk_run
2024-07-24T15:32:41.7345332Z     self._service_caller.create_existing_bulk_run(
2024-07-24T15:32:41.7345688Z   File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/azure/_restclient/flow_service_caller.py", line 74, in wrapper
2024-07-24T15:32:41.7345919Z     return func(self, *args, **kwargs)
2024-07-24T15:32:41.7346296Z   File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/azure/_restclient/flow_service_caller.py", line 744, in create_existing_bulk_run
2024-07-24T15:32:41.7346562Z     return self.caller.bulk_runs.create_existing_bulk_run(
2024-07-24T15:32:41.7346913Z   File "/usr/local/miniconda/lib/python3.9/site-packages/azure/core/tracing/decorator.py", line 94, in wrapper_use_tracer
2024-07-24T15:32:41.7347123Z     return func(*args, **kwargs)
2024-07-24T15:32:41.7347540Z   File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/azure/_restclient/flow/operations/_bulk_runs_operations.py", line 631, in create_existing_bulk_run
2024-07-24T15:32:41.7347843Z     pipeline_response = self._client._pipeline.run(request, stream=False, **kwargs)
2024-07-24T15:32:41.7348188Z   File "/usr/local/miniconda/lib/python3.9/site-packages/azure/core/pipeline/_base.py", line 229, in run
2024-07-24T15:32:41.7348405Z     return first_node.send(pipeline_request)
2024-07-24T15:32:41.7348719Z   File "/usr/local/miniconda/lib/python3.9/site-packages/azure/core/pipeline/_base.py", line 86, in send
2024-07-24T15:32:41.7348928Z     response = self.next.send(request)
2024-07-24T15:32:41.7349233Z   File "/usr/local/miniconda/lib/python3.9/site-packages/azure/core/pipeline/_base.py", line 86, in send
2024-07-24T15:32:41.7349439Z     response = self.next.send(request)
2024-07-24T15:32:41.7349731Z   File "/usr/local/miniconda/lib/python3.9/site-packages/azure/core/pipeline/_base.py", line 86, in send
2024-07-24T15:32:41.7349939Z     response = self.next.send(request)
2024-07-24T15:32:41.7350100Z   [Previous line repeated 2 more times]
2024-07-24T15:32:41.7350421Z   File "/usr/local/miniconda/lib/python3.9/site-packages/azure/core/pipeline/policies/_redirect.py", line 197, in send
2024-07-24T15:32:41.7350633Z     response = self.next.send(request)
2024-07-24T15:32:41.7350948Z   File "/usr/local/miniconda/lib/python3.9/site-packages/azure/core/pipeline/policies/_retry.py", line 554, in send
2024-07-24T15:32:41.7351261Z     raise err
2024-07-24T15:32:41.7351563Z   File "/usr/local/miniconda/lib/python3.9/site-packages/azure/core/pipeline/policies/_retry.py", line 532, in send
2024-07-24T15:32:41.7351745Z     response = self.next.send(request)
2024-07-24T15:32:41.7352023Z   File "/usr/local/miniconda/lib/python3.9/site-packages/azure/core/pipeline/_base.py", line 86, in send
2024-07-24T15:32:41.7352206Z     response = self.next.send(request)
2024-07-24T15:32:41.7352482Z   File "/usr/local/miniconda/lib/python3.9/site-packages/azure/core/pipeline/_base.py", line 86, in send
2024-07-24T15:32:41.7352664Z     response = self.next.send(request)
2024-07-24T15:32:41.7352933Z   File "/usr/local/miniconda/lib/python3.9/site-packages/azure/core/pipeline/_base.py", line 86, in send
2024-07-24T15:32:41.7353114Z     response = self.next.send(request)
2024-07-24T15:32:41.7353253Z   [Previous line repeated 2 more times]
2024-07-24T15:32:41.7353523Z   File "/usr/local/miniconda/lib/python3.9/site-packages/azure/core/pipeline/_base.py", line 118, in send
2024-07-24T15:32:41.7353818Z     self._sender.send(request.http_request, **request.context.options),
2024-07-24T15:32:41.7354154Z   File "/usr/local/miniconda/lib/python3.9/site-packages/azure/core/pipeline/transport/_requests_basic.py", line 401, in send
2024-07-24T15:32:41.7354338Z     raise error
2024-07-24T15:32:41.7354618Z azure.core.exceptions.ServiceResponseError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

How To Reproduce the bug Steps to reproduce the behavior, how frequent can you experience the bug: 1.

Expected behavior A clear and concise description of what you expected to happen.

Screenshots If applicable, add screenshots to help explain your problem.

Running Information(please complete the following information):

  • Promptflow Package Version using pf -v: [e.g. 0.0.102309906]
  • Operating System: [e.g. Ubuntu 20.04, Windows 11]
  • Python Version using python --version: [e.g. python==3.10.12]

{ "promptflow": "1.13.0", "promptflow-azure": "1.13.0", "promptflow-core": "1.13.0", "promptflow-devkit": "1.13.0", "promptflow-evals": "0.3.1", "promptflow-tracing": "1.13.0" }

Executable '/anaconda/envs/azureml_py38/bin/python' Python (Linux) 3.9.19 | packaged by conda-forge | (main, Mar 20 2024, 12:50:21) [GCC 12.3.0]

Additional context Add any other context about the problem here.

yanggaome, Jul 25 '24 14:07

This looks like a service-side timeout. @wangchao1230, who can help from the PF service side? @yanggaome, can you also check whether your local source folders are being uploaded unnecessarily? Ideally, only the evaluation result jsonl file should be uploaded.
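One rough way to check for an unexpectedly large upload is to measure the folder that the run points at. The sketch below uses only the Python standard library; `summarize_folder` is a hypothetical helper, not part of promptflow, and which folder promptflow actually uploads is an assumption you would need to confirm for your setup.

```python
import os
from pathlib import Path


def summarize_folder(root, top_n=5):
    """Return (total_bytes, largest_files) for all files under root.

    largest_files is a list of (size_in_bytes, path) tuples, biggest first.
    Hypothetical helper for illustration; not a promptflow API.
    """
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                sizes.append((path.stat().st_size, str(path)))
            except OSError:
                # Skip files that disappear or cannot be read.
                continue
    sizes.sort(reverse=True)
    total = sum(size for size, _ in sizes)
    return total, sizes[:top_n]
```

Calling `summarize_folder("path/to/your/source/folder")` (placeholder path) returns the total size and the largest files. If the total is far larger than the evaluation result jsonl file, extra content may be going up with the run.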

luigiw, Jul 25 '24 19:07

@luigiw The customer contacted me via Teams. I got the workspace info and run names and asked the server team to help investigate, but no logs were found for those run names in that workspace on the server-side ingress.

This result indicates that these requests were never sent to the server. Since this is happening in an automated (non-local) environment, I'm asking the customer to schedule a meeting so we can access the environment, reproduce the issue, and see if we can get more logs or actual request IDs to help the investigation.
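To collect more client-side logs before such a session, one option is to turn on debug logging for the HTTP layers that appear in the traceback. This is a minimal sketch, assuming the upload goes through azure-core's pipeline and its requests/urllib3 transport (as the `azure/core/pipeline` and `_requests_basic.py` frames suggest). `enable_http_debug_logging` is a hypothetical helper name, and whether request IDs show up in the output depends on what the SDK chooses to log.

```python
import logging
import sys


def enable_http_debug_logging(stream=sys.stderr):
    """Attach a DEBUG-level stream handler to the "azure" and "urllib3" loggers.

    Hypothetical helper for illustration; call it before evaluate() runs.
    Returns the configured loggers.
    """
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    loggers = []
    for name in ("azure", "urllib3"):
        logger = logging.getLogger(name)
        logger.setLevel(logging.DEBUG)
        logger.addHandler(handler)
        loggers.append(logger)
    return loggers
```

Calling `enable_http_debug_logging()` at the top of the CI script sends records from those libraries to stderr, which may show whether the request left the client and how the connection was closed.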

0mza987, Jul 26 '24 07:07