datarobot-user-models
Predictions with ONNX don't support non-numeric inputs
The ONNX prediction function casts every input column to np.float32, which makes it incompatible with string and categorical features.
Within ONNXPredictor.predict, we can see the conversion (I've added the comment):
```python
def predict(self, data, model, **kwargs):
    super(ONNXPredictor, self).predict(data, model, **kwargs)

    input_names = [i.name for i in model.get_inputs()]
    session_result = model.run(None, {input_names[0]: data.to_numpy(np.float32)})  # CONVERSION TO FLOAT FAILS FOR STRINGs
    if len(session_result) == 0:
        raise DrumCommonException("ONNX model should return at least 1 output.")

    if len(session_result) == 1:
        preds = session_result[0]
    else:
        preds = self._handle_multiple_outputs(model, session_result)

    return preds, None
```
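To make the failure concrete, here is a small reproduction of just the cast step, run on a toy DataFrame with one numeric and one string column. The column names and values are invented for illustration; the only thing taken from DRUM is the `to_numpy(np.float32)` call.

```python
import numpy as np
import pandas as pd


def float32_cast_fails(frame: pd.DataFrame) -> bool:
    """Return True if the same cast DRUM performs raises on this frame."""
    try:
        frame.to_numpy(np.float32)  # mirrors data.to_numpy(np.float32) in predict()
    except ValueError:
        # e.g. "could not convert string to float: 'male'"
        return True
    return False


# Hypothetical mixed-type input, similar in spirit to the Titanic features.
df = pd.DataFrame({"age": [22.0, 38.0], "sex": ["male", "female"]})

print(float32_cast_fails(df))            # the string column breaks the cast
print(float32_cast_fails(df[["age"]]))   # numeric-only columns convert fine
```

A purely numeric frame goes through unchanged, which is presumably why the current code works for most regression and classification examples.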
Lots of Details
Consider the Titanic Survivors example, which has mixed features and uses a ColumnTransformer to apply various scikit-learn transformations in a pipeline.
As noted in the example, ONNX Runtime can take a dictionary that maps each model input name to its column's values, instead of a single numeric matrix built from a DataFrame:
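```python
import onnxruntime as rt

# X_test2 is the Titanic test DataFrame from the example; one ONNX input per column.
inputs = {c: X_test2[c].values for c in X_test2.columns}
sess = rt.InferenceSession("pipeline_titanic.onnx")
pred_onx = sess.run(None, inputs)
```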
DRUM's conversion of the inbound DataFrame would fail in this case, and mixed-type inputs like this seem likely to be very common.
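One possible direction is to build the input feed from the model's declared inputs rather than always using the first input and float32. The sketch below is a hypothetical helper, not existing DRUM code: `build_onnx_feed` and the type table are assumptions, and only a handful of ONNX tensor types are mapped. It relies on the fact that onnxruntime's `get_inputs()` returns objects with `.name` and `.type` (for example `"tensor(float)"` or `"tensor(string)"`). It also assumes that multi-input models name their inputs after DataFrame columns and declare each one with shape `[None, 1]`, as in per-column pipelines like the Titanic one.

```python
from collections import namedtuple

import numpy as np
import pandas as pd

# Assumed mapping from ONNX tensor type strings (NodeArg.type) to NumPy dtypes.
_ONNX_TO_NUMPY = {
    "tensor(float)": np.float32,
    "tensor(double)": np.float64,
    "tensor(int64)": np.int64,
    "tensor(int32)": np.int32,
    "tensor(bool)": np.bool_,
    "tensor(string)": np.object_,
}


def build_onnx_feed(data: pd.DataFrame, model_inputs) -> dict:
    """Hypothetical helper: build the input_feed dict for InferenceSession.run."""
    if len(model_inputs) == 1:
        # Single-input model: keep today's behaviour (one matrix of all columns),
        # but honour the declared input type, defaulting to float32.
        only = model_inputs[0]
        dtype = _ONNX_TO_NUMPY.get(only.type, np.float32)
        return {only.name: data.to_numpy(dtype)}

    feed = {}
    for node in model_inputs:
        if node.name not in data.columns:
            raise KeyError(f"ONNX input {node.name!r} has no matching DataFrame column")
        dtype = _ONNX_TO_NUMPY.get(node.type)
        if dtype is None:
            raise TypeError(f"Unsupported ONNX input type {node.type!r} for {node.name!r}")
        # Assumes each per-column input is declared with shape [None, 1].
        feed[node.name] = data[node.name].to_numpy(dtype).reshape(-1, 1)
    return feed


# Usage with stand-ins for onnxruntime NodeArg objects (only .name and .type used).
FakeInput = namedtuple("FakeInput", ["name", "type"])
df = pd.DataFrame({"age": [22.0, 38.0], "sex": ["male", "female"]})
feed = build_onnx_feed(
    df, [FakeInput("age", "tensor(float)"), FakeInput("sex", "tensor(string)")]
)
print({name: (arr.dtype, arr.shape) for name, arr in feed.items()})
```

Inside `ONNXPredictor.predict`, the call would then become something like `model.run(None, build_onnx_feed(data, model.get_inputs()))`. Keeping the single-input branch means existing numeric-only models should behave as they do today.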