
The category mapper in the model casting back to float?

Open hanzigs opened this issue 3 years ago • 3 comments

I'm converting an AutoKeras model to ONNX and have run into an issue: the predictions from the Python model and the ONNX model differ. The ONNX team suggests the following:

The category mapper in the model looks like:

image

The result is immediately cast back to float. It uses AsString and a lookup table on float32 values, which may be converted to strings with different levels of precision in ONNX.
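To illustrate the precision concern described above, here is a minimal standard-library sketch. It is not AutoKeras or tf2onnx code: the `vocab` dictionary and the two format specifiers are assumptions chosen only to show how one float32 value can produce two different string keys, so that a string lookup table built with one rendering misses with the other.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


value = to_float32(0.1)

# Two plausible string renderings of the same float32 value
# (the precisions here are illustrative assumptions).
key_short = f"{value:.6g}"
key_long = f"{value:.9g}"

# Hypothetical lookup table built with the short rendering.
vocab = {key_short: 0}

# The same number is found with one rendering and missed with the other.
found_short = vocab.get(key_short, -1)
found_long = vocab.get(key_long, -1)
print(key_short, found_short)
print(key_long, found_long)
```

If two runtimes format the float with different precision, the lookup falls through to the out-of-vocabulary index in one of them, which would be enough to change the downstream predictions.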

The code I'm using to convert to ONNX:

# Install the converter first (shell): pip install tf2onnx==1.9.2
from sklearn.model_selection import train_test_split
from autokeras import StructuredDataClassifier
import tf2onnx.convert

X_train, X_valid, y_train, y_valid = train_test_split(input_x, input_y, test_size=0.20, stratify=input_y, random_state=seed)

akmodel = StructuredDataClassifier(max_trials=10)
akmodel.fit(x=X_train, y=y_train, validation_data=(X_valid, y_valid), epochs=100)
autoKeras_model = akmodel.export_model()

# Convert the exported Keras model to ONNX
onnx_model, _ = tf2onnx.convert.from_keras(autoKeras_model)

Any help is much appreciated, Thanks

hanzigs avatar Aug 11 '21 05:08 hanzigs

I went through their code in https://github.com/keras-team/autokeras/blob/master/autokeras/keras_layers.py. At line 55, MultiCategoryEncoding is described as "Encode the categorical features to numerical features."

image

I tried manually encoding the data before building the model, and it worked. I guess this was not happening before, so the code was entering the None-type branch:

image

I added code to set column_types and column_names in StructuredDataClassifier and got a model with consistent ONNX/TF results.

Can I confirm whether converting all the data to numerical is a fair approach? Before this step the data is preprocessed and normalized, so both the categorical and numeric fields are already floats. In this step I create a dictionary that marks every field as "numerical" and pass it to StructuredDataClassifier:

data_dtypes = data.dtypes.apply(lambda x: x.name).to_dict()
for key in data_dtypes.keys():
    data_dtypes[key] = "numerical"
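As a sketch of how such a dictionary might be handed to the classifier: the helper names `all_numerical` and `build_classifier` below are hypothetical, and the `column_names` / `column_types` keyword arguments are assumed to match the AutoKeras 1.0.x `StructuredDataClassifier` signature. AutoKeras is imported lazily so that the dictionary helper can be run on its own.

```python
def all_numerical(column_names):
    """Map every column name to the "numerical" column type."""
    return {name: "numerical" for name in column_names}


def build_classifier(column_names, max_trials=10):
    """Hypothetical helper: create a classifier with every column typed as numerical.

    Assumes the AutoKeras 1.0.x API, where StructuredDataClassifier
    accepts column_names and column_types.
    """
    import autokeras as ak  # imported lazily; only needed when this is called

    names = list(column_names)
    return ak.StructuredDataClassifier(
        column_names=names,
        column_types=all_numerical(names),
        max_trials=max_trials,
    )


# Example with made-up column names.
example_types = all_numerical(["age", "income", "region_code"])
print(example_types)
```

With a pandas DataFrame, `all_numerical(data.columns)` would give the same dictionary as the loop above.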

Thanks

hanzigs avatar Sep 01 '21 02:09 hanzigs

Yes, I think it is fair. Are there any use cases that you think should not be converted to numerical in the beginning?

haifeng-jin avatar Sep 27 '21 22:09 haifeng-jin

Thanks for the clarification. I had assumed that, based on the dataset's data types, categorical fields should be declared as categorical and only numerical fields as numerical. I didn't expect the declaration to be based on the values themselves.

hanzigs avatar Sep 28 '21 02:09 hanzigs