
[BUG] prompt flow eval only supports UTF-8 encoding

Open · yanggaome opened this issue 1 year ago • 0 comments

Describe the bug

Looking at the evaluation code, it only supports the default UTF-8 encoding:

initial_data_df = pd.read_json(data, lines=True)

https://github.com/microsoft/promptflow/blob/main/src/promptflow-evals/promptflow/evals/evaluate/_evaluate.py#L115

Tested with a JSONL file saved with encoding='utf-8-sig', and it errored out (as expected). Some of our clients use multilingual input data, and they say 'utf-8-sig' is the only encoding that works for them with native promptflow. This seems to be a gap between promptflow and the evals SDK.

Traceback (most recent call last):
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 115, in _validate_and_load_data
    initial_data_df = pd.read_json(data, lines=True)
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/pandas/io/json/_json.py", line 815, in read_json
    return json_reader.read()
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/pandas/io/json/_json.py", line 1023, in read
    obj = self._get_object_parser(self._combine_lines(data_lines))
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/pandas/io/json/_json.py", line 1051, in _get_object_parser
    obj = FrameParser(json, **kwargs).parse()
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/pandas/io/json/_json.py", line 1187, in parse
    self._parse()
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/pandas/io/json/_json.py", line 1403, in _parse
    ujson_loads(json, precise_float=self.precise_float), dtype=None
ValueError: Expected object or value

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
...
    results = evaluate(
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/evals/evaluate/_telemetry/__init__.py", line 138, in wrapper
    result = func(*args, **kwargs)
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 381, in evaluate
    raise e
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 356, in evaluate
    return _evaluate(
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 395, in _evaluate
    input_data_df = _validate_and_load_data(target, data, evaluators, output_path, azure_ai_project, evaluation_name)
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 117, in _validate_and_load_data
    raise ValueError(
ValueError: Failed to load data from /xxx/test_data.jsonl. Please validate it is a valid jsonl data. Error: Expected object or value.

How To Reproduce the bug
Steps to reproduce the behavior (reproduces every time):
1. Save a JSONL data file with encoding='utf-8-sig' (common for multilingual input data).
2. Call evaluate(data=<path to that file>, ...) from promptflow-evals.
3. The call fails with "ValueError: Failed to load data ... Expected object or value" (full traceback above). A minimal standalone repro is sketched below.
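
The sketch below reproduces the underlying pandas failure outside the SDK; the file name sample.jsonl and its contents are made up for illustration, and the encoding= call at the end is only a suggestion, not the current promptflow-evals behavior.

import pandas as pd

# Write a one-line JSONL file with a UTF-8 BOM ("utf-8-sig"),
# as some multilingual export tools do. File name is hypothetical.
with open("sample.jsonl", "w", encoding="utf-8-sig") as f:
    f.write('{"question": "你好", "answer": "hello"}\n')

# Same call as _validate_and_load_data: the leading BOM makes the first
# line invalid JSON and raises "ValueError: Expected object or value".
try:
    pd.read_json("sample.jsonl", lines=True)
except ValueError as e:
    print("default utf-8 failed:", e)

# Passing the encoding explicitly loads the same file without error.
df = pd.read_json("sample.jsonl", lines=True, encoding="utf-8-sig")
print(df)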

Expected behavior
evaluate should load JSONL data saved with 'utf-8-sig' (or let the caller specify the encoding), so the evals SDK handles the same multilingual data that native promptflow already accepts.
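
One possible fix, sketched against the _validate_and_load_data line referenced above; the helper name and the choice of 'utf-8-sig' as the default are assumptions on my side, not the actual promptflow-evals implementation.

import pandas as pd

def _load_jsonl(data):
    # Hypothetical replacement for the pd.read_json call in
    # _validate_and_load_data. "utf-8-sig" also reads plain UTF-8 files,
    # so defaulting to it tolerates an optional BOM without breaking
    # existing callers.
    try:
        return pd.read_json(data, lines=True, encoding="utf-8-sig")
    except ValueError as e:
        raise ValueError(
            f"Failed to load data from {data}. "
            f"Please validate it is a valid jsonl data. Error: {e}."
        ) from e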


Running Information (please complete the following information):

  • Promptflow Package Version using pf -v:
    { "promptflow": "1.15.0", "promptflow-azure": "1.15.0", "promptflow-core": "1.15.0", "promptflow-devkit": "1.15.0", "promptflow-evals": "0.3.2", "promptflow-tracing": "1.15.0" }
  • Operating System: Linux
  • Python Version using python --version: Python 3.9.19 (packaged by conda-forge, GCC 12.3.0)
  • Executable: /anaconda/envs/azureml_py38/bin/python


yanggaome · Aug 16 '24