buildInitialTypesSimple throws TypeError: Cannot map this type to Onnx types: VectorUDT

Open z7ye opened this issue 3 years ago • 2 comments

Hi, I am trying to serialize a PySpark RandomForestClassifier model to ONNX. As a first step, I tried to use buildInitialTypesSimple to get the initial types, but got the error shown in the title. Any suggestions on how to fix it? The code snippet I used is:

from pyspark.ml.feature import VectorAssembler
from pyspark.sql import SparkSession
from sklearn.datasets import load_iris
from pyspark.ml.classification import RandomForestClassifier


spark = SparkSession.builder.getOrCreate()

df = load_iris(as_frame=True).frame.rename(columns={"target": "label"})
df = spark.createDataFrame(df)
df = VectorAssembler(inputCols=df.columns[:-1], outputCol="features").transform(df)
train, test = df.randomSplit([0.8, 0.2])

lor = RandomForestClassifier()

lorModel = lor.fit(train)

pred = lorModel.transform(test)


import onnxmltools
from onnxconverter_common.data_types import FloatTensorType
from onnxmltools.convert import convert_sparkml
from onnxmltools.convert.sparkml.utils import buildInitialTypesSimple

initial_types = buildInitialTypesSimple(train)
onx = convert_sparkml(lorModel, 'sparkml logistic regression', initial_types, spark_session=spark)

z7ye avatar Nov 21 '21 20:11 z7ye

In case someone is facing the same issue: there is an example in the following file showing how to handle VectorUDT() by using FloatTensorType as the input type.

santiagxf avatar Jun 22 '22 15:06 santiagxf

buildInitialTypesSimple doesn't support VectorUDT() because the shape cannot be inferred from the VectorUDT() type alone.
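
A possible workaround, as a minimal sketch: since the vector width is not part of the column type, read it from one sample row and declare the input by hand as a FloatTensorType. The helper names `vector_tensor_shape` and `build_vector_initial_types` are hypothetical (not part of onnxmltools), and the sketch assumes the assembled column is called "features" and that the installed converter accepts `None` as the batch dimension.

```python
def vector_tensor_shape(sample_vector):
    """Shape for a VectorUDT column: dynamic batch size, width taken from one sample.

    Hypothetical helper, not part of onnxmltools.
    """
    return [None, len(sample_vector)]


def build_vector_initial_types(df, vector_col="features"):
    """Build initial types for a single vector column of a Spark DataFrame.

    Hypothetical helper; assumes `vector_col` holds DenseVector or SparseVector values.
    """
    # Imported lazily so the shape helper above works without the converter installed.
    from onnxconverter_common.data_types import FloatTensorType

    # Take the first row's vector; len() works for both dense and sparse vectors.
    sample = df.select(vector_col).first()[0]
    return [(vector_col, FloatTensorType(vector_tensor_shape(sample)))]
```

With the snippet from the question, this would presumably replace the buildInitialTypesSimple call: `initial_types = build_vector_initial_types(train)`, followed by the same `convert_sparkml(lorModel, ..., initial_types, spark_session=spark)` call. For the iris data the resulting shape should be `[None, 4]`.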

memoryz avatar Jun 24 '22 05:06 memoryz