[BUG] MLFlow does not support a predictions DataFrame with a MultiIndex
I am using MLflow to deploy a model that returns a pd.DataFrame with a pd.MultiIndex. Whenever I run the MLflow wrapper to predict, I see the following error, which comes from calling json.dump on the MultiIndex DataFrame:
mlflow models predict -m example_model -i data.json -t json --env-manager local 2>&1
[{Traceback (most recent call last):
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/bin/mlflow", line 11, in <module>
    sys.exit(cli())
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/site-packages/mlflow/models/cli.py", line 125, in predict
    return _get_flavor_backend(
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/site-packages/mlflow/pyfunc/backend.py", line 137, in predict
    scoring_server._predict(local_uri, input_path, output_path, content_type, json_format)
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/site-packages/mlflow/pyfunc/scoring_server/__init__.py", line 345, in _predict
    predictions_to_json(pyfunc_model.predict(df), sys.stdout)
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/site-packages/mlflow/pyfunc/scoring_server/__init__.py", line 193, in predictions_to_json
    json.dump(predictions, output, cls=NumpyEncoder)
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/json/__init__.py", line 179, in dump
    for chunk in iterable:
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/json/encoder.py", line 429, in _iterencode
    yield from _iterencode_list(o, _current_indent_level)
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/json/encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/json/encoder.py", line 376, in _iterencode_dict
    raise TypeError(f'keys must be str, int, float, bool or None, '
TypeError: keys must be str, int, float, bool or None, not tuple
I believe this comes from the predictions_to_json function, which converts a MultiIndex DataFrame into a list of dictionaries like this:
[{('top_index_1', 'a'): 10.0, ('top_index_1', 'b'): 5.0, ('top_index_2', 'c'): 15.0, ('top_index_3', 'd'): 20.0}]
The keys here are tuples, which results in the error above.
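The failure can likely be reproduced outside MLflow. The sketch below assumes that the conversion inside predictions_to_json behaves like DataFrame.to_dict(orient="records"); the DataFrame is a hypothetical stand-in built from the labels shown above, not the actual model output.

```python
import json

import pandas as pd

# Hypothetical prediction output with two-level (MultiIndex) columns,
# using the same labels as the dictionary shown above.
columns = pd.MultiIndex.from_tuples(
    [
        ("top_index_1", "a"),
        ("top_index_1", "b"),
        ("top_index_2", "c"),
        ("top_index_3", "d"),
    ]
)
predictions = pd.DataFrame([[10.0, 5.0, 15.0, 20.0]], columns=columns)

# Assumption: MLflow's conversion is equivalent to a records-oriented to_dict.
# With MultiIndex columns, every record is keyed by a tuple of labels.
records = predictions.to_dict(orient="records")
print(records)

# The standard json module only accepts str, int, float, bool or None as keys.
try:
    json.dumps(records)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

If the assumption holds, the TypeError raised here is the same one that appears at the bottom of the traceback.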
@arjundc-db Would you be able to take a look here? cc also @WeichenXu123
I suggest that we drop the DataFrame index before calling model.predict. @dbczumar what do you think? Some sklearn models (e.g. LinearRegression) also drop the index in their predict routine and return an array as the result.
To simplify the issue, I propose always converting the prediction result to an array and returning it as a JSON list.
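A minimal sketch of what this proposal could look like, for illustration only. The helper name predictions_to_json_list is hypothetical and is not part of MLflow; the DataFrame is the same stand-in example used for the error above.

```python
import json

import numpy as np
import pandas as pd


def predictions_to_json_list(predictions):
    """Hypothetical helper (not an MLflow API): serialize predictions as a
    JSON list, discarding any index and column labels."""
    return json.dumps(np.asarray(predictions).tolist())


# Stand-in prediction output with MultiIndex columns.
columns = pd.MultiIndex.from_tuples(
    [
        ("top_index_1", "a"),
        ("top_index_1", "b"),
        ("top_index_2", "c"),
        ("top_index_3", "d"),
    ]
)
predictions = pd.DataFrame([[10.0, 5.0, 15.0, 20.0]], columns=columns)

payload = predictions_to_json_list(predictions)
print(payload)  # [[10.0, 5.0, 15.0, 20.0]]
```

This avoids the tuple-key problem because no dictionary is ever built, but it also drops the column labels, so the output shape differs from a records-style list of dictionaries.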
We need to make sure that the proposed solution is backwards-compatible. @arjundc-db @sueann @yunpark93 Can one of you take a look at this and propose a solution?
@arjundc-db @sueann @yunpark93 Pinging here. Can you take a look?