onnxmltools
buildInitialTypesSimple throws TypeError: Cannot map this type to Onnx types: VectorUDT
Hi, I am trying to serialize a PySpark RandomForestClassifier model to ONNX. As a first step, I tried to use buildInitialTypesSimple to get the initial types, but got this error. Any suggestions on how to fix it, please?
The code snippet I used:
```python
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import SparkSession
from sklearn.datasets import load_iris

import onnxmltools
from onnxconverter_common.data_types import FloatTensorType
from onnxmltools.convert import convert_sparkml
from onnxmltools.convert.sparkml.utils import buildInitialTypesSimple

spark = SparkSession.builder.getOrCreate()

# Load iris into Spark and assemble the four feature columns into one vector column
df = load_iris(as_frame=True).frame.rename(columns={"target": "label"})
df = spark.createDataFrame(df)
df = VectorAssembler(inputCols=df.columns[:-1], outputCol="features").transform(df)
train, test = df.randomSplit([0.8, 0.2])

# Train the random forest and score the test split
rf = RandomForestClassifier()
rf_model = rf.fit(train)
pred = rf_model.transform(test)

# Fails because the "features" column has type VectorUDT:
# TypeError: Cannot map this type to Onnx types: VectorUDT
initial_types = buildInitialTypesSimple(train)
onx = convert_sparkml(rf_model, 'sparkml random forest classifier', initial_types, spark_session=spark)
```
In case anyone else runs into the same issue: the following file has an example of how to handle VectorUDT() columns by using FloatTensorType as the input type.
buildInitialTypesSimple doesn't support VectorUDT() because the shape cannot be inferred from the VectorUDT() type alone; the type does not record how many elements each vector holds.
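To illustrate why the type alone is not enough, here is a minimal pure-Python sketch, assuming a simplified schema given as (column name, type name) pairs. The helper `build_initial_types`, the `("float", shape)` tuples, and the type-name strings are all hypothetical stand-ins, not the onnxmltools implementation: scalar columns get a fixed width of 1, while a vector column's width has to be read from actual data (in PySpark, something like `len(train.first()["features"])`).

```python
from typing import List, Mapping, Sequence, Tuple

# Hypothetical stand-in for an ONNX tensor type: (element type, shape).
# In onnxmltools this role is played by classes such as FloatTensorType.
TensorSpec = Tuple[str, List[object]]

# Hypothetical names for numeric Spark column types handled by this sketch
SCALAR_TYPES = {"double", "float", "int", "long"}


def build_initial_types(
    schema: Sequence[Tuple[str, str]],
    sample_row: Mapping[str, object],
) -> List[Tuple[str, TensorSpec]]:
    """Map (column name, type name) pairs to ONNX-style input specs.

    Scalar numeric columns become [None, 1] float tensors. A vector column's
    type says nothing about its length, so the width is taken from a sample row.
    """
    types: List[Tuple[str, TensorSpec]] = []
    for name, spark_type in schema:
        if spark_type in SCALAR_TYPES:
            types.append((name, ("float", [None, 1])))
        elif spark_type == "vector":
            if name not in sample_row:
                # Same situation as buildInitialTypesSimple: no data, no width.
                raise TypeError(f"Cannot infer the length of vector column {name!r}")
            width = len(sample_row[name])  # number of elements in one vector
            types.append((name, ("float", [None, width])))
        else:
            raise TypeError(f"Cannot map this type to Onnx types: {spark_type}")
    return types


schema = [("features", "vector"), ("label", "long")]
row = {"features": [5.1, 3.5, 1.4, 0.2], "label": 0}
print(build_initial_types(schema, row))
# [('features', ('float', [None, 4])), ('label', ('float', [None, 1]))]
```

With onnxmltools itself, the corresponding input for the model in this issue is usually declared by hand as `initial_types = [("features", FloatTensorType([None, 4]))]` (4 being the number of iris columns assembled by VectorAssembler) and passed to `convert_sparkml` in place of the `buildInitialTypesSimple(train)` result.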