yggdrasil-decision-forests
[Question] Ignore Columns in Deployed Model?
I want to ignore certain feature columns in the input for both model training and when I invoke the model deployed to an endpoint. I don't want the client of the model to "know" that the model is ignoring the features.
When training the model, I can ignore the user_id feature as follows:
import ydf

all_features = ["user_id", "age", "gender", "job_title", "label"]
ignored_features = ["user_id", "label"]
# Keep only the columns the model should actually consume.
features = [feature for feature in all_features if feature not in ignored_features]
# include_all_columns=False restricts training to the listed features.
df_learner = ydf.GradientBoostedTreesLearner(label="label", features=features, include_all_columns=False)
df_model = df_learner.train(ds=cached_train_ds, valid=cached_test_ds)
I can then use df_model.predict(cached_test_ds) to make predictions, and it properly ignores the excluded features.
But if I deploy the model to an endpoint using TensorFlow Serving (in Amazon SageMaker), I get an error when I invoke it with input that includes the features I want to ignore:
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation:
Received server error (500) from primary with message "{"error": "{\n \"error\": \"Failed
to process element: 0 key: user_id of 'instances' list. Error: INVALID_ARGUMENT: JSON object:
does not have named input: user_id\"\n}"}". See https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-rrrrrr-2#logEventViewer:group=/aws/sagemaker/Endpoints/yyyyyyyyyyyyyyyyyyyy
in account XXXXXXXXXXXXXX for more information.
I expected that the deployed model would silently ignore the excluded features. I could write a custom inference script to drop the excluded features, but it seems YDF should handle it for me, just as it does when I call df_model.predict(cached_test_ds). Shouldn't it?
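For reference, the custom-script workaround mentioned above could look roughly like the following. This is only a minimal sketch, not part of the YDF or SageMaker API. It assumes the request body uses TensorFlow Serving's row-format JSON (a top-level "instances" list of objects, as the error message suggests), and the names IGNORED_FEATURES and drop_ignored_features are hypothetical.

```python
import json

# Assumption: the columns that were excluded from training.
IGNORED_FEATURES = frozenset({"user_id", "label"})


def drop_ignored_features(request_body: str, ignored=IGNORED_FEATURES) -> str:
    """Return the request body with ignored keys removed from every instance.

    Assumes each element of "instances" is a JSON object keyed by feature name.
    """
    payload = json.loads(request_body)
    payload["instances"] = [
        {name: value for name, value in instance.items() if name not in ignored}
        for instance in payload["instances"]
    ]
    return json.dumps(payload)
```

Such a function would be called on the incoming request before it is forwarded to the model server, so the client can keep sending user_id without knowing it is discarded.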