sagemaker-sparkml-serving-container

SparkMLModel SAGEMAKER_SPARKML_SCHEMA can only accept 16 features

Open rchazelle opened this issue 4 years ago • 4 comments

Hello, I would like to understand why this limitation is in place. Presumably most machine learning models take in many more than 16 features.

I created a model with over 100 features. I tried to pass all of those features in my SAGEMAKER_SPARKML_SCHEMA but got the following error:

An error occurred (ValidationException) when calling the CreateModel operation: 1 validation error detected: Value '{SAGEMAKER_SPARKML_SCHEMA={"input": [list_of_column_names_and_types_omitted_due_to_privacy], "output": {"type": "double", "name": "prediction"}}}' at 'primaryContainer.environment' failed to satisfy constraint: Map value must satisfy constraint: [Member must have length less than or equal to 1024, Member must have length greater than or equal to 0, Member must satisfy regular expression pattern: [\S\s]*]
Traceback (most recent call last):
  File "<stdin>", line 46, in deploy_model
  File "/usr/local/lib/python2.7/site-packages/sagemaker/model.py", line 479, in deploy
    self._create_sagemaker_model(instance_type, accelerator_type, tags)
  File "/usr/local/lib/python2.7/site-packages/sagemaker/model.py", line 195, in _create_sagemaker_model
    tags=tags,
  File "/usr/local/lib/python2.7/site-packages/sagemaker/session.py", line 2125, in create_model
    self.sagemaker_client.create_model(**create_model_request)
  File "/usr/local/lib/python2.7/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python2.7/site-packages/botocore/client.py", line 661, in _make_api_call
    raise error_class(parsed_response, operation_name)
ClientError: An error occurred (ValidationException) when calling the CreateModel operation: 1 validation error detected: Value '{SAGEMAKER_SPARKML_SCHEMA={"input": [list_of_column_names_and_types_omitted_due_to_privacy], "output": {"type": "double", "name": "prediction"}}}' at 'primaryContainer.environment' failed to satisfy constraint: Map value must satisfy constraint: [Member must have length less than or equal to 1024, Member must have length greater than or equal to 0, Member must satisfy regular expression pattern: [\S\s]*]

list_of_column_names_and_types_omitted_due_to_privacy is the correctly formatted input. None of the names is longer than 1024 characters; all of them are shorter than 50 characters.

This led me to some googling and I found the following at: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html under create_model

Environment (dict) -- The environment variables to set in the Docker container. Each key and value in the Environment string to string map can have length of up to 1024. We support up to 16 entries in the map.

So I reduced the number of features to 15, and it works. How can I make this work for 100+ features? My pipeline includes a bunch of StringIndexers -> OneHotEncoderEstimators.

I then tried increasing it to 17, and that worked. Next I tried 53, and that didn't work. 117 was what I first tried, and that also doesn't work.
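The pattern above (17 features accepted, 53 rejected) looks more consistent with the 1024-character limit on each environment value, which is the constraint named in the ValidationException, than with the 16-entry limit, which counts environment variables in the map rather than features. Here is a minimal sketch for checking the serialized schema length before deploying. The `feature_N` column names, the `double` input type, and both helper functions are hypothetical; the input entry shape (`name`/`type` pairs) is assumed to mirror the `output` entry shown in the error.

```python
import json

# Limit quoted in the ValidationException: each environment value
# must have length <= 1024.
MAX_ENV_VALUE_LENGTH = 1024


def build_schema(feature_names, feature_type="double"):
    """Serialize a schema string (hypothetical helper).

    Assumes each input column is a {"name", "type"} object, like the
    "output" entry visible in the error message.
    """
    schema = {
        "input": [{"name": name, "type": feature_type} for name in feature_names],
        "output": {"name": "prediction", "type": "double"},
    }
    return json.dumps(schema)


def fits_in_env_value(schema_json):
    """Return True if the string is short enough for one environment value."""
    return len(schema_json) <= MAX_ENV_VALUE_LENGTH


if __name__ == "__main__":
    # Placeholder column names; real names will shift the threshold.
    for count in (15, 17, 53, 117):
        names = ["feature_{}".format(i) for i in range(count)]
        schema_json = build_schema(names)
        print(count, len(schema_json), fits_in_env_value(schema_json))
```

With real column names and types the cutoff will land at a different feature count, since it depends on the total length of the JSON string rather than on the number of columns.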

rchazelle May 16 '20 19:05