promptflow
[BUG] Different evaluation flow behavior locally and in Azure
Describe the bug Suppose we have two different datasets, D1 and D2, and two flows: Ex, an experiment flow that takes a dataset as input, Ex(D), and Ev, an evaluation flow that takes a dataset and the output of an experiment run as input, Ev(D, Ex(D)).
In this setup, I noticed that:
- Locally, Ev(D2, Ex(D1)) throws an error and does not run. This is the behavior I would expect.
- In Azure, Ev(D2, Ex(D1)) runs and shows success. However, the results are meaningless because the data inputs are pulled from D2, while the run output comes from Ex(D1).
I think the behavior may also differ depending on whether the two datasets have the same or different numbers of data points, but I didn't dig too far into it.
How To Reproduce the bug Steps to reproduce the behavior, and how frequently you experience the bug:
- Create two datasets with different entries
- Create a flow and an evaluation flow
- Locally,
  a. run the flow on the first dataset:
     pf run create
  b. run the evaluation flow against the second dataset and the output of the first run:
     pf run create --run {}
  c. The above will raise an exception
- In Azure,
  a. run the flow on the first dataset:
     pfazure run create
  b. run the evaluation flow against the second dataset and the output of the first run:
     pfazure run create --run {}
  c. The above will run successfully
Expected behavior I would expect both to raise an exception if running on different data to prevent unintended results.
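To make the expected behavior concrete, here is a minimal sketch of the kind of pre-flight check I have in mind. It is not part of promptflow: `dataset_fingerprint` and `assert_same_dataset` are hypothetical helpers, and the sketch assumes both datasets are local JSONL files. It only shows the idea of refusing to evaluate when the evaluation data differs from the data the referenced run was created with.

```python
import hashlib
import json
from pathlib import Path


def dataset_fingerprint(path):
    """Return a SHA-256 hex digest over the JSONL rows of `path`.

    Hypothetical helper: rows are re-serialized with sorted keys so that
    whitespace and key order do not affect the result; blank lines are skipped.
    """
    digest = hashlib.sha256()
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            row = json.loads(line)
            digest.update(json.dumps(row, sort_keys=True).encode("utf-8"))
            digest.update(b"\n")
    return digest.hexdigest()


def assert_same_dataset(base_run_data, eval_data):
    """Raise ValueError if the evaluation data differs from the base run's data.

    Hypothetical guard: meant to be called before submitting the evaluation run.
    """
    if dataset_fingerprint(base_run_data) != dataset_fingerprint(eval_data):
        raise ValueError(
            f"Evaluation data {eval_data!r} does not match the data "
            f"{base_run_data!r} used by the referenced run."
        )
```

Calling `assert_same_dataset("d1.jsonl", "d2.jsonl")` (placeholder file names) before submitting the evaluation would raise for the Ev(D2, Ex(D1)) case in both environments.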
Running Information(please complete the following information): { "promptflow": "1.13.0", "promptflow-azure": "1.12.0", "promptflow-core": "1.13.0", "promptflow-devkit": "1.13.0", "promptflow-tracing": "1.13.0" }
Executable '/**/.venv/bin/python3.11' Python (Darwin) 3.11.4 (v3.11.4:d2340ef257, Jun 6 2023, 19:15:51) [Clang 13.0.0 (clang-1300.0.29.30)]
Hey, thanks for reaching out. We just noticed that you've also opened an OCV to describe this issue. We are still missing some details about your problem, for example, whether you specified the column_mapping. It would help if you could provide us with a dummy flow that reproduces the issue. We will contact you via Teams for more details. Thanks again.
Created an OCV to track this.