
[Question] Ignore Columns in Deployed Model?

Open rlcauvin opened this issue 8 months ago • 3 comments

I want to ignore certain feature columns in the input, both during model training and when I invoke the model deployed to an endpoint. I don't want the client of the model to "know" that the model is ignoring the features.

When training the model, I can ignore the user_id feature as follows:

import ydf

all_features = ["user_id", "age", "gender", "job_title", "label"]
ignored_features = ["user_id", "label"]
# Keep only the columns the model should consume as input features.
features = [feature for feature in all_features if feature not in ignored_features]

# include_all_columns=False restricts the learner to the columns listed in `features`.
df_learner = ydf.GradientBoostedTreesLearner(label="label", features=features, include_all_columns=False)
df_model = df_learner.train(ds=cached_train_ds, valid=cached_test_ds)

I can then use df_model.predict(cached_test_ds) to make predictions, and it properly ignores the excluded features.

But if I deploy the model to an endpoint using TensorFlow Serving (in Amazon SageMaker), I get an error if I try to invoke it with input that includes the features I want to ignore:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: 
Received server error (500) from primary with message "{"error": "{\n    \"error\": \"Failed
to process element: 0 key: user_id of 'instances' list. Error: INVALID_ARGUMENT: JSON object:
does not have named input: user_id\"\n}"}". See https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-rrrrrr-2#logEventViewer:group=/aws/sagemaker/Endpoints/yyyyyyyyyyyyyyyyyyyy
in account XXXXXXXXXXXXXX for more information.

I expected that the deployed model would silently ignore the excluded features. I could write a custom inference script to drop the excluded features, but it seems YDF should handle it for me, just as it does when I call df_model.predict(cached_test_ds). Shouldn't it?
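
For reference, the workaround I have in mind would look roughly like the sketch below. It assumes the request body uses the TensorFlow Serving "instances" JSON format (as in the error above) and that each instance is a JSON object keyed by feature name. The helper name drop_ignored_features and the constant IGNORED_FEATURES are hypothetical names of my own, not part of YDF or SageMaker; the function would be called from whatever pre-processing hook the custom inference script provides.

```python
import json

# Assumption: the same columns that were excluded at training time.
IGNORED_FEATURES = frozenset({"user_id", "label"})


def drop_ignored_features(payload: str, ignored=IGNORED_FEATURES) -> str:
    """Return the request JSON with ignored keys removed from every instance.

    Hypothetical helper: expects a body shaped like
    {"instances": [{"feature_name": value, ...}, ...]}.
    """
    request = json.loads(payload)
    request["instances"] = [
        {key: value for key, value in instance.items() if key not in ignored}
        for instance in request["instances"]
    ]
    return json.dumps(request)
```

This keeps the client unaware of the ignored columns, but it means the list of ignored features has to be maintained in two places (the training code and the inference script), which is what I was hoping to avoid.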

rlcauvin, Jun 13 '24 22:06