promptflow
[BUG] Different evaluation flow behavior locally and in Azure
Describe the bug
Suppose there are two different datasets, D1 and D2, and two flows: Ex, an experiment flow that takes a dataset as input, Ex(D); and Ev, an evaluation flow that takes a dataset and the output of an experiment run as input, Ev(D, Ex(D)).
In this setup, I noticed that:
- Locally, `Ev(D2, Ex(D1))` throws an error and does not run. This is the behavior I would expect.
- In Azure, `Ev(D2, Ex(D1))` runs and reports success. However, the results are meaningless because the data inputs are pulled from D2 while the run outputs come from Ex(D1).
The behavior may also differ depending on whether the two datasets have the same or a different number of data points, but I didn't dig too far into that.
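To make the "meaningless results" point concrete, here is a minimal toy model of the setup. It is not promptflow code: `ex`, `ev`, and the sample rows are hypothetical stand-ins, and the sketch assumes, purely for illustration, that evaluation inputs and run outputs are paired by row position.

```python
# Toy model only: NOT promptflow code. Assumes rows are paired by position.
d1 = [
    {"question": "2 + 2?", "answer": "4"},
    {"question": "Capital of France?", "answer": "Paris"},
]
d2 = [
    {"question": "Largest planet?", "answer": "Jupiter"},
    {"question": "3 * 3?", "answer": "9"},
]


def ex(dataset):
    """Stand-in experiment flow: a perfect model that echoes the ground truth."""
    return [{"prediction": row["answer"]} for row in dataset]


def ev(dataset, run_outputs):
    """Stand-in evaluation flow: exact-match accuracy over positionally paired rows."""
    matches = [
        row["answer"] == out["prediction"]
        for row, out in zip(dataset, run_outputs)
    ]
    return sum(matches) / len(matches)


matched_score = ev(d1, ex(d1))     # ground truth and predictions share the same rows
mismatched_score = ev(d2, ex(d1))  # ground truth from D2, predictions from D1
print(matched_score, mismatched_score)
```

In this toy model the mismatched call completes without any error and still returns a score, which is the kind of silent success described above. Note also that `zip` stops at the shorter input, so under the same positional-pairing assumption a difference in row counts would go unnoticed as well.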
How To Reproduce the bug
Steps to reproduce the behavior (the bug occurs on every run):
- Create two datasets with different entries.
- Create a flow and an evaluation flow.
- Locally:
  a. Run the flow on the first dataset: `pf run create`
  b. Run the evaluation flow against the second dataset and the output of the first run: `pf run create --run {}`
  c. The above raises an exception.
- In Azure:
  a. Run the flow on the first dataset: `pfazure run create`
  b. Run the evaluation flow against the second dataset and the output of the first run: `pfazure run create --run {}`
  c. The above runs successfully.
Expected behavior
I would expect both to raise an exception when the evaluation runs on different data than the referenced run, to prevent unintended results.
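As a rough sketch of the kind of guard this would imply, the snippet below compares a content fingerprint of the evaluation dataset with that of the dataset the referenced run was produced from, and raises if they differ. Everything here is hypothetical (`fingerprint`, `check_same_dataset`, `DatasetMismatchError`); it is not how promptflow performs its local check, only one possible way to express the expectation.

```python
import hashlib
import json


class DatasetMismatchError(ValueError):
    """Hypothetical error type used only in this sketch."""


def fingerprint(rows):
    """Return a SHA-256 digest of the dataset content (a list of dict rows)."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def check_same_dataset(eval_rows, run_source_rows):
    """Raise if the evaluation data differs from the data the run was produced from."""
    if fingerprint(eval_rows) != fingerprint(run_source_rows):
        raise DatasetMismatchError(
            "Evaluation dataset does not match the dataset used by the referenced run."
        )


# Hypothetical sample data for illustration.
d1 = [{"question": "2 + 2?", "answer": "4"}]
d2 = [{"question": "3 * 3?", "answer": "9"}]

check_same_dataset(d1, d1)  # same data: passes silently

try:
    check_same_dataset(d2, d1)  # different data: rejected
    rejected = False
except DatasetMismatchError:
    rejected = True
```

A strict content hash like this would also reject legitimate cases where the evaluation dataset is a reordered or extended copy of the original, so a real check might need to be looser; the point is only that both environments should fail in the same way.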
Running Information
{ "promptflow": "1.13.0", "promptflow-azure": "1.12.0", "promptflow-core": "1.13.0", "promptflow-devkit": "1.13.0", "promptflow-tracing": "1.13.0" }
Executable '/**/.venv/bin/python3.11'
Python (Darwin) 3.11.4 (v3.11.4:d2340ef257, Jun 6 2023, 19:15:51) [Clang 13.0.0 (clang-1300.0.29.30)]