python-sasctl
pzmm.MLFlowModel.read_mlflow_model_file() failed with JSONDecodeError: Extra data
Describe the issue
Trying to read an MLflow model with pzmm.MLFlowModel.read_mlflow_model_file() results in a JSONDecodeError. I'm just using a simple example from here: https://medium.com/@rehabreda/registering-mlflow-models-to-sas-model-manager-using-sasctl-a-comprehensive-guide-a47dbf183338
To Reproduce
The rest of the training code can be found at the link above. The code that reads the MLflow model file is shown below:
```python
from pathlib import Path

import mlflow
import mlflow.sklearn
from mlflow.models import infer_signature
from sasctl import pzmm
from sklearn.ensemble import RandomForestClassifier

# x_train, y_train, x_test, y_test and the active MLflow run come from the
# training code in the linked article.

## define randomforest model
model = RandomForestClassifier(n_estimators=300).fit(x_train, y_train)
## Model signature defines schema of model input and output
signature = infer_signature(x_train, model.predict(x_train))
## log model score to mlflow
score = model.score(x_test, y_test)
print("Score: %s" % score)
mlflow.log_metric("score", score)
### log model
mlflow.sklearn.log_model(model, "model", signature=signature)
print("Model saved in run %s" % mlflow.active_run().info.run_uuid)
mlPath = Path(f'./mlruns/1/{mlflow.active_run().info.run_uuid}/artifacts/model')
## get info about model variables, input and output
varDict, inputsDict, outputsDict = pzmm.MLFlowModel.read_mlflow_model_file(mlPath)
```
Expected behavior
The dictionaries are returned successfully by pzmm.MLFlowModel.read_mlflow_model_file().
Stack Trace
```
---------------------------------------------------------------------------
JSONDecodeError Traceback (most recent call last)
Cell In[4], line 4
1 mlPath = Path(f'./mlruns/1/{mlflow.active_run().info.run_uuid}/artifacts/model')
3 ## get info aboud model variables ,input and output
----> 4 varDict, inputsDict, outputsDict = pzmm.MLFlowModel.read_mlflow_model_file(mlPath)

File ~\AppData\Local\miniconda3\envs\ml\Lib\site-packages\sasctl\pzmm\mlflow_model.py:56, in MLFlowModel.read_mlflow_model_file(cls, m_path)
53 outputs = m_lines[ind_out[0] : -1]
55 inputs_dict = json.loads("".join([s.strip() for s in inputs])[9:-1])
---> 56 outputs_dict = json.loads("".join([s.strip() for s in outputs])[10:-1])
57 else:
58 raise ValueError(
59 "Improper or unset signature values for model. No input or output "
60 "dicts could be generated. "
61 )

File ~\AppData\Local\miniconda3\envs\ml\Lib\json\__init__.py:346, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
341 s = s.decode(detect_encoding(s), 'surrogatepass')
343 if (cls is None and object_hook is None and
344 parse_int is None and parse_float is None and
345 parse_constant is None and object_pairs_hook is None and not kw):
--> 346 return _default_decoder.decode(s)
347 if cls is None:
348 cls = JSONDecoder

File ~\AppData\Local\miniconda3\envs\ml\Lib\json\decoder.py:340, in JSONDecoder.decode(self, s, _w)
338 end = _w(s, end).end()
339 if end != len(s):
--> 340 raise JSONDecodeError("Extra data", s, end)
341 return obj

JSONDecodeError: Extra data: line 1 column 73 (char 72)
```
Version
sasctl 1.10.0
By the way, I'm using MLflow 2.7.1 on a Windows 11 machine.
I think I found the root cause.
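The MLmodel file has an extra `params` line at the end of the signature, as shown below. Since the code takes the lines from `outputs:` onward (`m_lines[ind_out[0] : -1]` in the stack trace) as the outputs value, the `params` line gets concatenated into the JSON string, which raises `JSONDecodeError: Extra data`. If I remove the `params` line from the MLmodel file, I can read the file just fine.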
```yaml
outputs: '[{"type": "tensor", "tensor-spec": {"dtype": "float64", "shape": [-1]}}]'
params: null
```
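To make the failure concrete, here is a small stdlib-only sketch that mimics the string handling visible in the stack trace (line 56 of mlflow_model.py). The two lists are hypothetical stand-ins for what `m_lines[ind_out[0] : -1]` would collect from an MLmodel file without and with the trailing `params` line, and `parse_outputs` is an illustrative helper, not a sasctl function.

```python
import json

# JSON stored in the "outputs" entry of the MLmodel file from this issue.
OUTPUTS_JSON = '[{"type": "tensor", "tensor-spec": {"dtype": "float64", "shape": [-1]}}]'


def parse_outputs(outputs):
    """Hypothetical helper mimicking the traceback's line 56: strip and join
    the lines, drop the 10-character "outputs: '" prefix and the final
    character (expected to be the closing quote), then decode the JSON."""
    return json.loads("".join([s.strip() for s in outputs])[10:-1])


# Assumed slice contents when only the outputs line is collected...
without_params = [f"  outputs: '{OUTPUTS_JSON}'\n"]
# ...and when "params: null" follows it (MLmodel files from MLflow >= 2.6).
with_params = without_params + ["  params: null\n"]

print(parse_outputs(without_params)[0]["type"])  # tensor

try:
    parse_outputs(with_params)
except json.JSONDecodeError as err:
    print(err)  # Extra data: line 1 column 73 (char 72)
```

The outputs array is exactly 72 characters long, so the decoder stops at index 72, where the leftover `'params: nul` text begins. That matches the "char 72" position in the reported error.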
This appears to come from a specification change in MLflow 2.6.0, which added "Inference params support". It would affect all MLmodel files created since the MLflow 2.6.0 release. https://github.com/mlflow/mlflow/pull/9068
I believe this is the problematic line of code in sasctl: it assumes no other field follows `outputs` and reads all the remaining lines into the outputs value.
https://github.com/sassoftware/python-sasctl/blob/d2d568248837092c34ce975b88309b7fbbcbde18/src/sasctl/pzmm/mlflow_model.py#L53
Perhaps a better solution would be to parse the MLmodel file with a YAML parser, since the file is in YAML format. That would keep sasctl forward compatible if MLflow adds more fields later. https://mlflow.org/docs/latest/models.html#id28
For now I'll stick with MLflow 2.5.0, which seems to work fine.