autokeras
Is the category mapper in the model casting back to float?
I'm converting an AutoKeras model to ONNX and have run into an issue: the predictions of the Python model and the ONNX model are different. The ONNX team suggests the following:
The category mapper in the model looks like:
The result is immediately cast back to float. It uses AsString and a lookup table on float32 values, which may be converted to strings with different levels of precision in ONNX.
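To make the ONNX team's point concrete, here is a minimal sketch in plain Python of how a string lookup keyed on float32 values can miss when the same number is formatted at a different precision. The two formats used (`%g` and `repr`) and the vocabulary are assumptions chosen for illustration; they are not the formats TensorFlow's AsString or the ONNX runtime necessarily use.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Hypothetical vocabulary: category strings produced with 6 significant digits.
vocab = {"%g" % to_float32(v): index for index, v in enumerate([0.1, 0.2, 0.3])}

value = to_float32(0.1)
short_key = "%g" % value   # 6 significant digits
long_key = repr(value)     # full double-precision repr of the float32 value

hit = vocab.get(short_key, -1)   # found in the table
miss = vocab.get(long_key, -1)   # falls through to the default index

print(short_key, long_key, hit, miss)
```

If the two runtimes disagree on the string form, the second lookup silently returns the default index, which is one plausible way the predictions could diverge.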
The code I'm using to convert to ONNX:
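from sklearn.model_selection import train_test_split
from autokeras import StructuredDataClassifier

X_train, X_valid, y_train, y_valid = train_test_split(input_x, input_y, test_size=0.20, stratify=input_y, random_state=seed)
akmodel = StructuredDataClassifier(max_trials=10)
akmodel.fit(x=X_train, y=y_train, validation_data=(X_valid, y_valid), epochs=100)
# Export the best model found by the search as a Keras model
autoKeras_model = akmodel.export_model()
# Shell: pip install tf2onnx==1.9.2
import tf2onnx.convert
# Convert the exported Keras model to ONNX
onnx_model, _ = tf2onnx.convert.from_keras(autoKeras_model)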
Any help is much appreciated, Thanks
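One way to quantify the mismatch described above is to compare the two sets of predictions element by element. The sketch below is an assumption-laden illustration, not the method used in this thread: `max_abs_diff` and `run_onnx` are hypothetical helper names, the sample predictions are made up, and `run_onnx` assumes the model has a single input and that onnxruntime is installed.

```python
def max_abs_diff(a, b) -> float:
    """Largest element-wise absolute difference between two 2-D prediction lists."""
    if len(a) != len(b):
        raise ValueError("prediction batches must have the same length")
    return max(
        abs(x - y)
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )


def run_onnx(onnx_model, batch):
    """Run an in-memory ONNX model on one batch (imports onnxruntime lazily)."""
    import onnxruntime as ort

    session = ort.InferenceSession(
        onnx_model.SerializeToString(), providers=["CPUExecutionProvider"]
    )
    input_name = session.get_inputs()[0].name  # assumes a single model input
    return session.run(None, {input_name: batch})[0]


# Made-up predictions, for illustration only.
keras_preds = [[0.91, 0.09], [0.20, 0.80]]
onnx_preds = [[0.91, 0.09], [0.65, 0.35]]

diff = max_abs_diff(keras_preds, onnx_preds)
print(f"max abs diff: {diff:.2f}")
```

A difference near zero suggests only floating-point noise, while a large one, as in the made-up second row, points to a structural mismatch such as a failed category lookup.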
I went through their code in https://github.com/keras-team/autokeras/blob/master/autokeras/keras_layers.py; at line 55, the description of MultiCategoryEncoding says "Encode the categorical features to numerical features."
I tried encoding the data manually before building the model, and it worked. I guess the encoding was not happening before, so the code was entering the None-type case.
I added code to set column_types and column_names in StructuredDataClassifier and got a model with consistent ONNX/TF results.
Can I confirm whether converting all the data to numerical is a fair approach? Before this step the data is preprocessed and normalized, so both the categorical and numeric fields are already floats. In this step I create a dictionary that marks every field as "numerical" and pass it to StructuredDataClassifier:
data_dtypes = data.dtypes.apply(lambda x: x.name).to_dict()
for key in data_dtypes.keys():
data_dtypes[key] = "numerical"
Thanks
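For readers who want to see how such a dictionary might be handed to the classifier, here is a minimal sketch. The helper names `all_numerical` and `build_classifier` and the column names are hypothetical. It assumes the `column_names` and `column_types` arguments of `StructuredDataClassifier`, and that every feature has already been encoded as a float as described above.

```python
from typing import Dict, List


def all_numerical(column_names: List[str]) -> Dict[str, str]:
    """Map every column name to the "numerical" type label."""
    return {name: "numerical" for name in column_names}


def build_classifier(column_names: List[str], max_trials: int = 10):
    """Create a StructuredDataClassifier with explicit column metadata.

    autokeras is imported lazily so all_numerical can be used without it.
    """
    from autokeras import StructuredDataClassifier

    return StructuredDataClassifier(
        column_names=column_names,
        column_types=all_numerical(column_names),
        max_trials=max_trials,
    )


# Hypothetical column names, for illustration only.
columns = ["age", "income", "region_code"]
column_types = all_numerical(columns)
print(column_types)
```

Declaring every column as numerical should keep the string-based category lookup out of the exported graph, which matches the consistent ONNX/TF results reported in this thread, though that outcome is not verified here.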
Yes, I think it is fair. Are there any use cases that you think should not be converted to numerical in the beginning?
Thanks for the clarification. I had assumed that, based on the dataset's data types, categorical fields should be declared as categorical and only numerical fields as numerical. I didn't expect the declaration to be based on whether the values themselves are numerical.