datarobot-user-models

Predictions with ONNX don't support non-numeric inputs

Open · kindofluke opened this issue 3 years ago • 0 comments

The ONNX prediction function casts every input column to np.float32, which makes it incompatible with string and categorical features.

Within ONNXPredictor.predict, we can see the conversion (I've added the comment):

    def predict(self, data, model, **kwargs):
        super(ONNXPredictor, self).predict(data, model, **kwargs)

        input_names = [i.name for i in model.get_inputs()]
        session_result = model.run(None, {input_names[0]: data.to_numpy(np.float32)}) # CONVERSION TO FLOAT32 FAILS FOR STRING COLUMNS

        if len(session_result) == 0:
            raise DrumCommonException("ONNX model should return at least 1 output.")

        if len(session_result) == 1:
            preds = session_result[0]
        else:
            preds = self._handle_multiple_outputs(model, session_result)
        return preds, None
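A minimal reproduction of the failure, assuming only pandas and NumPy: it applies the same `to_numpy(np.float32)` call to a small mixed-type frame. The column names and values are illustrative, not taken from DRUM or the Titanic data.

```python
import numpy as np
import pandas as pd

# Tiny mixed-type frame (illustrative values only).
frame = pd.DataFrame({"age": [22.0, 38.0], "sex": ["male", "female"]})

# The same conversion ONNXPredictor.predict applies before calling model.run.
try:
    frame.to_numpy(np.float32)
    error = None
except ValueError as exc:
    # NumPy cannot parse a string such as "male" as a float.
    error = str(exc)

# Numeric-only columns convert without problems.
numeric_only = frame[["age"]].to_numpy(np.float32)
print(error)
```

Only the numeric subset converts; as soon as a string column is present, the whole call fails before the ONNX session is ever invoked.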

Lots of Details

Consider the Titanic Survivors example, which has mixed feature types and uses a ColumnTransformer to apply various scikit-learn transformations in a pipeline.

As noted in the example, an ONNX model can take a dictionary that maps each input name to its own array (one input per column) instead of a single DataFrame-shaped tensor:

    import onnxruntime as rt

    # One entry per input column, keyed by the ONNX input name
    inputs = {c: X_test2[c].values for c in X_test2.columns}
    sess = rt.InferenceSession("pipeline_titanic.onnx")
    pred_onx = sess.run(None, inputs)

DRUM's float32 conversion of the inbound DataFrame would fail in this case, and mixed-type inputs like this seem likely to be very common.
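One possible direction, sketched under assumptions: build the input feed from the model's own input metadata instead of casting everything to float32. `build_onnx_feed` and `ONNX_TO_NUMPY` are hypothetical names, not part of DRUM. The sketch relies on ONNX Runtime's `InferenceSession.get_inputs()` returning objects with `.name` and `.type` (a string such as `"tensor(float)"`), and it assumes per-column inputs have shape `[None, 1]`, as skl2onnx pipelines like the Titanic example typically declare.

```python
from collections import namedtuple
from typing import Dict, Sequence

import numpy as np
import pandas as pd

# Assumed mapping from ONNX Runtime type strings to NumPy dtypes.
ONNX_TO_NUMPY = {
    "tensor(float)": np.float32,
    "tensor(double)": np.float64,
    "tensor(int64)": np.int64,
    "tensor(int32)": np.int32,
    "tensor(bool)": np.bool_,
    "tensor(string)": object,
}


def build_onnx_feed(data: pd.DataFrame, inputs: Sequence) -> Dict[str, np.ndarray]:
    """Hypothetical helper: build the dict passed to InferenceSession.run.

    `inputs` is expected to be session.get_inputs(); each item has .name and .type.
    """
    if len(inputs) == 1 and inputs[0].name not in data.columns:
        # Single tensor input (current DRUM behaviour): pass the whole frame as one matrix.
        dtype = ONNX_TO_NUMPY.get(inputs[0].type, np.float32)
        return {inputs[0].name: data.to_numpy(dtype=dtype)}

    feed = {}
    for node in inputs:
        if node.name not in data.columns:
            raise ValueError(f"missing input column: {node.name}")
        dtype = ONNX_TO_NUMPY.get(node.type)
        if dtype is None:
            raise TypeError(f"unsupported ONNX input type {node.type} for {node.name}")
        column = data[node.name]
        if dtype is object:
            # String tensors are fed as object arrays of Python str.
            values = column.astype(str).to_numpy(dtype=object)
        else:
            values = column.to_numpy(dtype=dtype)
        # Assumption: each column input is declared with shape [None, 1].
        feed[node.name] = values.reshape(-1, 1)
    return feed


# Usage with stand-ins for session.get_inputs(); real code would pass the session's inputs.
Node = namedtuple("Node", ["name", "type"])
frame = pd.DataFrame({"age": [22.0, 38.0], "sex": ["male", "female"]})
feed = build_onnx_feed(frame, [Node("age", "tensor(float)"), Node("sex", "tensor(string)")])
```

Inside `ONNXPredictor.predict`, the call could then become something like `model.run(None, build_onnx_feed(data, model.get_inputs()))`. Keeping the single-input branch means existing all-numeric models would keep the old behaviour.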

kindofluke, Aug 24 '22 00:08