sagemaker-sparkml-serving-container
SparkMLModel SAGEMAKER_SPARK_ML_SCHEMA can only accept 16 features
Hello, I would like to understand why this limitation is in place. Presumably most machine learning models take in much more than 16 features.
I created a model with over 100 features. I tried to pass all of those features into my SAGEMAKER_SPARKML_SCHEMA but got the following error:
An error occurred (ValidationException) when calling the CreateModel operation: 1 validation error detected: Value '{SAGEMAKER_SPARKML_SCHEMA={"input": [list_of_column_names_and_types_omitted_due_to_privacy], "output": {"type": "double", "name": "prediction"}}}' at 'primaryContainer.environment' failed to satisfy constraint: Map value must satisfy constraint: [Member must have length less than or equal to 1024, Member must have length greater than or equal to 0, Member must satisfy regular expression pattern: [\S\s]*]
Traceback (most recent call last):
File "<stdin>", line 46, in deploy_model
File "/usr/local/lib/python2.7/site-packages/sagemaker/model.py", line 479, in deploy
self._create_sagemaker_model(instance_type, accelerator_type, tags)
File "/usr/local/lib/python2.7/site-packages/sagemaker/model.py", line 195, in _create_sagemaker_model
tags=tags,
File "/usr/local/lib/python2.7/site-packages/sagemaker/session.py", line 2125, in create_model
self.sagemaker_client.create_model(**create_model_request)
File "/usr/local/lib/python2.7/site-packages/botocore/client.py", line 357, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/usr/local/lib/python2.7/site-packages/botocore/client.py", line 661, in _make_api_call
raise error_class(parsed_response, operation_name)
ClientError: An error occurred (ValidationException) when calling the CreateModel operation: 1 validation error detected: Value '{SAGEMAKER_SPARKML_SCHEMA={"input": [list_of_column_names_and_types_omitted_due_to_privacy], "output": {"type": "double", "name": "prediction"}}}' at 'primaryContainer.environment' failed to satisfy constraint: Map value must satisfy constraint: [Member must have length less than or equal to 1024, Member must have length greater than or equal to 0, Member must satisfy regular expression pattern: [\S\s]*]
list_of_column_names_omitted_due_to_privacy
is the correctly formatted input; none of the names are anywhere near 1024 characters, all of them are under 50 characters.
Some googling led me to the following, under create_model at https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html:
Environment (dict) -- The environment variables to set in the Docker container. Each key and value in the Environment string to string map can have length of up to 1024. We support up to 16 entries in the map.
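Note that the whole schema is passed as a single value in the environment map, so the error above is hitting the 1024-character limit on each value, not the 16-entry limit on the map. A quick pre-check can confirm whether a given schema fits before calling CreateModel; the feature names below are made-up placeholders, since the real column names were omitted for privacy:

```python
import json

# SageMaker limits each value in primaryContainer.environment to 1024
# characters, and the entire SparkML schema is serialized into ONE value.
MAX_ENV_VALUE_LEN = 1024

def schema_fits(schema):
    """Return True if the serialized schema fits the env-var value limit."""
    return len(json.dumps(schema)) <= MAX_ENV_VALUE_LEN

def make_schema(n_features):
    # Hypothetical schema: every feature adds its name and type to the JSON,
    # so it is the character budget, not a feature count, that runs out.
    return {
        "input": [
            {"name": "feature_%d" % i, "type": "double"}
            for i in range(n_features)
        ],
        "output": {"type": "double", "name": "prediction"},
    }

print(schema_fits(make_schema(15)))   # small schemas fit
print(schema_fits(make_schema(117)))  # large schemas exceed 1024 characters
```

This is consistent with the observations later in the thread: whether 17 or 53 features fit depends entirely on how long the serialized names and types happen to be, not on a fixed feature count.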
So I reduced the number of features to 15, and it works. How can I make this work for 100+ features? My pipeline includes a chain of StringIndexers -> OneHotEncoderEstimators.
I increased it to 17, and that worked. I then tried 53, which didn't work. 117 was what I originally tried, and that doesn't work either.
For right now, I feel your best bet would be to build a Docker image using the code from this repository and define the schema as an environment variable in your Dockerfile itself. The limitation you are facing is a SageMaker platform limit, not one imposed by this library per se.
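A minimal sketch of that workaround, assuming you have built this repository's image locally under a tag of your choosing (the base image name below is an assumption, and the schema body is left elided as in the error above):

```dockerfile
# Sketch: bake the full schema into the image itself, so it never has to
# pass through CreateModel's 1024-character environment-value limit.
# The base image name/tag is an assumption; substitute your own build of
# this repository's image.
FROM sagemaker-sparkml-serving:local

# Full schema with 100+ input columns defined at build time.
# (Input column list elided here, as in the original error message.)
ENV SAGEMAKER_SPARKML_SCHEMA='{"input": [...], "output": {"type": "double", "name": "prediction"}}'
```

You would then push this image to ECR and reference it in CreateModel without setting SAGEMAKER_SPARKML_SCHEMA in the environment map at all.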
Sweet, thanks for the response. Is there a GitHub repo for that, or should I reach out to AWS directly?
That limit is part of the standard AWS SDK for SageMaker. You would probably need to reach out to AWS so they can pass the request on to the appropriate service team.
Same issue here