[BUG] MLFlow does not support a predictions DataFrame with a MultiIndex

Open varunsridhar1 opened this issue 2 years ago • 5 comments

I am using MLFlow to deploy a model that returns a pd.DataFrame with a pd.MultiIndex. Whenever I run the MLFlow wrapper to predict, I see this error that comes from calling json.dump on the MultiIndex DataFrame:

mlflow models predict -m example_model -i data.json -t json --env-manager local 2>&1

[{Traceback (most recent call last):
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/bin/mlflow", line 11, in <module>
    sys.exit(cli())
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/site-packages/mlflow/models/cli.py", line 125, in predict
    return _get_flavor_backend(
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/site-packages/mlflow/pyfunc/backend.py", line 137, in predict
    scoring_server._predict(local_uri, input_path, output_path, content_type, json_format)
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/site-packages/mlflow/pyfunc/scoring_server/__init__.py", line 345, in _predict
    predictions_to_json(pyfunc_model.predict(df), sys.stdout)
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/site-packages/mlflow/pyfunc/scoring_server/__init__.py", line 193, in predictions_to_json
    json.dump(predictions, output, cls=NumpyEncoder)
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/json/__init__.py", line 179, in dump
    for chunk in iterable:
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/json/encoder.py", line 429, in _iterencode
    yield from _iterencode_list(o, _current_indent_level)
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/json/encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/json/encoder.py", line 376, in _iterencode_dict
    raise TypeError(f'keys must be str, int, float, bool or None, '
TypeError: keys must be str, int, float, bool or None, not tuple

I believe this comes from the predictions_to_json function, which converts a MultiIndex DataFrame into a list of dictionaries like this: [{('top_index_1', 'a'): 10.0, ('top_index_1', 'b'): 5.0, ('top_index_2', 'c'): 15.0, ('top_index_3', 'd'): 20.0}]

The keys here are tuples, which results in the error above.

varunsridhar1 • Jun 29 '22 18:06
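
The failure described in the report above can be reproduced outside MLflow with pandas and the standard json module. This is a minimal sketch, assuming the predictions are turned into row-wise records (the use of `to_dict(orient="records")` is an assumption based on the dictionary quoted in the report, not a confirmed detail of `predictions_to_json`); the frame itself is made up to match that dictionary.

```python
import json

import pandas as pd

# Hypothetical predictions frame with two-level (MultiIndex) columns,
# shaped to match the dictionary quoted in the report.
columns = pd.MultiIndex.from_tuples(
    [
        ("top_index_1", "a"),
        ("top_index_1", "b"),
        ("top_index_2", "c"),
        ("top_index_3", "d"),
    ]
)
predictions = pd.DataFrame([[10.0, 5.0, 15.0, 20.0]], columns=columns)

# Row-wise conversion keeps each column label as a dictionary key,
# so MultiIndex column labels become tuple keys.
records = predictions.to_dict(orient="records")

# The standard JSON encoder rejects tuple keys with a TypeError.
try:
    json.dumps(records)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(records)
print(error_message)
```

The tuple keys come from the column labels, so a MultiIndex on the columns of the returned DataFrame is enough to trigger the error in this sketch.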

@arjundc-db Would you be able to take a look here? cc also @WeichenXu123

dbczumar • Jun 29 '22 23:06

I suggest that we drop the DataFrame index before calling model.predict. @dbczumar, what do you think? The predict routine of some sklearn models (e.g. LinearRegression) also drops the index and returns an array as the result.

WeichenXu123 • Jun 30 '22 09:06

To simplify the issue, I propose always converting the prediction result to an array and then returning it as a JSON list.

WeichenXu123 • Jun 30 '22 13:06
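
The array-based proposal above could look roughly like the following. This is only a sketch: `predictions_as_json_list` is a hypothetical helper name, not an MLflow function, and it deliberately discards all row and column labels.

```python
import json

import numpy as np
import pandas as pd


def predictions_as_json_list(predictions):
    """Hypothetical helper: serialize predictions as a (nested) JSON list.

    Index and column labels are dropped, so tuple labels from a
    MultiIndex never reach the JSON encoder.
    """
    if isinstance(predictions, (pd.DataFrame, pd.Series)):
        values = predictions.to_numpy()
    else:
        values = np.asarray(predictions)
    # tolist() converts NumPy scalars to plain Python numbers.
    return json.dumps(values.tolist())


# Made-up frame with MultiIndex columns, mirroring the report.
columns = pd.MultiIndex.from_tuples(
    [
        ("top_index_1", "a"),
        ("top_index_1", "b"),
        ("top_index_2", "c"),
        ("top_index_3", "d"),
    ]
)
frame = pd.DataFrame([[10.0, 5.0, 15.0, 20.0]], columns=columns)

payload = predictions_as_json_list(frame)
print(payload)
```

The trade-off is that the consumer of the JSON no longer sees which value belongs to which column.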

> To simplify the issue, I propose always converting the prediction result to an array and then returning it as a JSON list.

We need to make sure that the proposed solution is backwards-compatible. @arjundc-db @sueann @yunpark93 Can one of you take a look at this and propose a solution?

dbczumar • Jul 01 '22 00:07
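
The backwards-compatibility concern above can be illustrated with an ordinary single-level DataFrame. The sketch assumes that DataFrame predictions are currently emitted as row-wise records, as the dictionary quoted in the report suggests; that assumption has not been checked against the MLflow source, and the frame is invented for the example.

```python
import json

import pandas as pd

# Made-up predictions with ordinary (single-level) column names.
flat = pd.DataFrame({"low": [1.0, 2.0], "high": [3.0, 4.0]})

# Assumed current shape: one JSON object per row, keyed by column name.
record_style = json.dumps(flat.to_dict(orient="records"))

# Shape under the array proposal: one JSON list per row, no column names.
array_style = json.dumps(flat.to_numpy().tolist())

print(record_style)
print(array_style)
```

Clients that read values by column name from the record-style output would break if the array-style output replaced it for every DataFrame, which is why any fix needs to consider existing consumers.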

@arjundc-db @sueann @yunpark93 Pinging here. Can you take a look?

dbczumar • Jul 07 '22 00:07