[BUG] Different evaluation flow behavior locally and in Azure

Open jomalsan opened this issue 7 months ago • 1 comments

Describe the bug

Suppose there are two different datasets, D1 and D2, and two flows: Ex, an experiment flow that takes a dataset as input, Ex(D), and Ev, an evaluation flow that takes a dataset and the output of an experiment run as input, Ev(D, Ex(D)).

In this setup, I noticed that:

  1. Locally, Ev(D2, Ex(D1)) throws an error and does not run. This is the behavior I would expect.
  2. In Azure, Ev(D2, Ex(D1)) runs and reports success. However, the results are meaningless because the data inputs are pulled from D2 while the run outputs come from Ex(D1).

There may also be differences in behavior depending on whether the two datasets have the same or a different number of data points, but I didn't dig too far into that.
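To illustrate why the Azure result is meaningless, here is a minimal sketch. It assumes that dataset rows and run outputs are paired by position, which is an assumption and not confirmed promptflow behavior. The names `experiment` and `pair_by_position` are hypothetical stand-ins, not promptflow APIs.

```python
def experiment(row):
    # Hypothetical stand-in for Ex: "answers" by upper-casing the question.
    return {"answer": row["question"].upper()}


def pair_by_position(dataset, run_outputs):
    # Assumed joining strategy: row i of the dataset is paired with
    # output i of the run. zip() stops at the shorter input, so extra
    # rows are dropped without any error.
    return [{**row, **out} for row, out in zip(dataset, run_outputs)]


d1 = [{"question": "a"}, {"question": "b"}]
d2 = [{"question": "x"}, {"question": "y"}, {"question": "z"}]

outputs_d1 = [experiment(row) for row in d1]    # Ex(D1)
mismatched = pair_by_position(d2, outputs_d1)   # inputs for Ev(D2, Ex(D1))

for record in mismatched:
    print(record)
```

Under this assumption, each D2 question is evaluated against an answer generated for an unrelated D1 question, and the third D2 row is silently ignored because Ex(D1) has only two outputs.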

How To Reproduce the bug Steps to reproduce the behavior, how frequent can you experience the bug:

  1. Create two datasets with different entries
  2. Create a flow and an evaluation flow
  3. Locally:
     a. Run the flow on the first dataset: pf run create
     b. Run the evaluation flow against the second dataset and the output of the first run: pf run create --run {}
     c. The above raises an exception
  4. In Azure:
     a. Run the flow on the first dataset: pfazure run create
     b. Run the evaluation flow against the second dataset and the output of the first run: pfazure run create --run {}
     c. The above runs successfully

Expected behavior

I would expect both to raise an exception when the evaluation runs on different data than the referenced run, to prevent unintended results.
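The kind of guard I have in mind could look like the sketch below. It is only an illustration under stated assumptions: it presumes the inputs of the referenced run are available for comparison, and `fingerprint`, `check_same_dataset`, and `DatasetMismatchError` are hypothetical names, not part of promptflow.

```python
import hashlib
import json


class DatasetMismatchError(ValueError):
    """Raised when the evaluation data differs from the referenced run's inputs."""


def fingerprint(rows):
    # Order-sensitive hash of the rows; keys are sorted so that two rows
    # with the same content but different key order hash identically.
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def check_same_dataset(eval_rows, run_input_rows):
    # Hypothetical pre-flight check before starting an evaluation run.
    if fingerprint(eval_rows) != fingerprint(run_input_rows):
        raise DatasetMismatchError(
            "Evaluation data does not match the inputs of the referenced run"
        )


d1 = [{"question": "a"}, {"question": "b"}]
d2 = [{"question": "x"}, {"question": "y"}]

check_same_dataset(d1, d1)  # same data: passes silently

try:
    check_same_dataset(d2, d1)  # different data: rejected
except DatasetMismatchError as err:
    print(f"rejected: {err}")
```

A content hash is stricter than comparing row counts, so it would also catch datasets that have the same number of data points but different entries.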

Running Information (please complete the following information):

{ "promptflow": "1.13.0", "promptflow-azure": "1.12.0", "promptflow-core": "1.13.0", "promptflow-devkit": "1.13.0", "promptflow-tracing": "1.13.0" }

Executable '/**/.venv/bin/python3.11' Python (Darwin) 3.11.4 (v3.11.4:d2340ef257, Jun 6 2023, 19:15:51) [Clang 13.0.0 (clang-1300.0.29.30)]

jomalsan, Jul 22 '24 22:07