
need to be able to set return data size for each request in batch transform

Open ldong87 opened this issue 4 years ago • 13 comments

Describe the feature you'd like

I'm running an NLP inference job to get a sentence-embedding vector for each record. Each record is under 512 words, and the returned vector has 768 floats. Even with max_payload set to 1 and max_concurrent_transforms set to 1, I still got:

io.netty.handler.codec.CorruptedFrameException: Message size exceed limit: 113038147

I know there is a 5MB limit for endpoint inference. I'm fairly sure this error comes from a size limit on the response of each batch transform request, though I can't find that limit in the docs.

I hope you can:

  1. Document the response size limit for each batch transform request.
  2. Mimic max_payload and expose a parameter that lets users control the response size for each batch transform request.

How would this feature be used? Please describe.

Add a max_return_payload parameter to model.transformer, as shown below.

tfm = model.transformer(
    instance_count=1,
    instance_type='ml.p3.2xlarge',
    accept='text/json',
    assemble_with='Line',
    output_path=batch_out,
    strategy='MultiRecord',
    max_payload=3,
    max_return_payload=30,  # proposed new parameter
)

Describe alternatives you've considered

Using strategy='SingleRecord' bypasses the issue, but it is significantly slower because it doesn't exploit parallelism (see the sketch below). Setting max_concurrent_transforms to a larger value restores parallelism, but that can cause problems for code that isn't designed for concurrency.
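
For reference, a minimal sketch of that workaround, assuming the same model and batch_out objects as in the proposal above:

# Workaround sketch: SingleRecord keeps each response small, at the cost of parallelism.
# `model` and `batch_out` are assumed from the feature request above.
tfm = model.transformer(
    instance_count=1,
    instance_type='ml.p3.2xlarge',
    accept='text/json',
    assemble_with='Line',
    output_path=batch_out,
    strategy='SingleRecord',  # one record per request, so each response stays small
    max_payload=3,
)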

Additional context

A similar issue is raised here by other people: https://github.com/aws/sagemaker-python-sdk/issues/1096

ldong87 avatar Sep 10 '20 22:09 ldong87

Has anyone been able to solve it? I'm having the same problem.

siovaneDAZN avatar Jan 08 '21 16:01 siovaneDAZN

Having the same issue when returning a single 1024x1024 image.

NicolaiDige avatar Jan 12 '21 15:01 NicolaiDige

Are you using PyTorch? In my case, the problem was netty's maximum response size in the AWS PyTorch image, and you can increase it. The AWS PyTorch image uses TorchServe, and TorchServe uses netty internally; you can see "netty" in your error message. TorchServe is configured via config.properties, which already exists in the AWS PyTorch image.

https://github.com/aws/deep-learning-containers/blob/ede068b4363ba22fd785c426a7f3589bada76d4f/pytorch/inference/docker/1.9/py3/cu111/Dockerfile.gpu#L208

In this line, /home/model-server/config.properties is provided as the TorchServe config, so all you have to do is add your custom settings to that file. You can do this by extending the container with the following steps.

  1. Add enable_envvars_config=true to /home/model-server/config.properties.
  2. Set the environment variable TS_MAX_RESPONSE_SIZE to a large value.

If you set enable_envvars_config=true, you can set any property through an environment variable named TS_<PROPERTY_NAME>. Environment variables can be set with the SageMaker SDK, so it is better to use env than to write properties directly into the config file. According to the TorchServe configuration documentation, max_response_size is "the maximum allowable response size that the Torchserve sends, in bytes".

Example Dockerfile

FROM <AWS_PYTORCH_IMAGE_URI>

RUN echo "enable_envvars_config=true" >> /home/model-server/config.properties
# Then set TS_MAX_RESPONSE_SIZE=655350000 (or whatever large value you need) as an env var via the SageMaker SDK, as sketched below
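
And a minimal sketch of step 2 via the SDK, assuming the extended image has been pushed to ECR (the image URI, S3 path, role ARN, and entry point are placeholders):

from sagemaker.pytorch import PyTorchModel

# Sketch under the assumptions above: the extended image has
# enable_envvars_config=true, so TorchServe picks up TS_* env vars.
model = PyTorchModel(
    model_data='s3://<BUCKET>/model.tar.gz',  # placeholder
    role='<SAGEMAKER_EXECUTION_ROLE_ARN>',    # placeholder
    image_uri='<ACCOUNT>.dkr.ecr.<REGION>.amazonaws.com/<EXTENDED_IMAGE>:latest',
    entry_point='inference.py',               # placeholder
    env={'TS_MAX_RESPONSE_SIZE': '655350000'},
)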

Reference

https://pytorch.org/serve/configuration.html

grraffe avatar Nov 22 '21 02:11 grraffe

I'm having the same problem. I don't understand how I can set enable_envvars_config=true from the notebook (if there is a way). Do you have any ideas? Thanks

stevinc avatar Nov 24 '21 17:11 stevinc

@stevinc I think it is best to build a new image. Write a Dockerfile like the example in my previous comment, build it, and upload it to ECR. You can then use the image with the SageMaker SDK via the image_uri parameter of sagemaker.estimator.Estimator or sagemaker.estimator.Framework, as sketched below.
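
For example, a minimal sketch with the generic Estimator (the ECR URI and role ARN are placeholders):

from sagemaker.estimator import Estimator

# Sketch: point the SDK at the custom image in ECR via image_uri.
# All values below are placeholders for your own.
estimator = Estimator(
    image_uri='<ACCOUNT>.dkr.ecr.<REGION>.amazonaws.com/<CUSTOM_IMAGE>:latest',
    role='<SAGEMAKER_EXECUTION_ROLE_ARN>',
    instance_count=1,
    instance_type='ml.p3.2xlarge',
)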

grraffe avatar Nov 25 '21 05:11 grraffe

@grraffe Thanks. I built a new image from a custom Dockerfile that loads my modified config.properties, and now it works.

stevinc avatar Nov 26 '21 09:11 stevinc

It seems enable_envvars_config is already preset to true by AWS. I could pass TS_MAX_RESPONSE_SIZE as an environment variable, as shown in the code below, and that resolved the error above.

transformer = model.transformer(
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    assemble_with='Line',
    accept='application/jsonlines',
    strategy='MultiRecord',
    max_payload=6,
    env={
        'SAGEMAKER_MODEL_SERVER_TIMEOUT': '3600',
        'TS_MAX_RESPONSE_SIZE': '20000000',
    },
)

jinyantan avatar Mar 16 '22 14:03 jinyantan

Depending on the serving tool your container uses, i.e. TorchServe (TS) or Multi Model Server (MMS), you can change the maximum request/response size by setting the TS_MAX_RESPONSE_SIZE / MMS_MAX_RESPONSE_SIZE / ... environment variables when creating or deploying your model in SageMaker (sketch below). See further details in my response here for MMS.
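
For example, a minimal sketch for an MMS-based container (the image URI, model data path, and role ARN are placeholders):

from sagemaker.model import Model

# Sketch: raise the MMS response limit via env vars at model creation.
# All values below are placeholders for your own.
model = Model(
    image_uri='<MMS_BASED_IMAGE_URI>',
    model_data='s3://<BUCKET>/model.tar.gz',
    role='<SAGEMAKER_EXECUTION_ROLE_ARN>',
    env={'MMS_MAX_RESPONSE_SIZE': '20000000'},
)
predictor = model.deploy(initial_instance_count=1, instance_type='ml.m5.xlarge')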

vprecup avatar Apr 01 '22 08:04 vprecup

Had the same issue with real-time endpoints; passing TS_MAX_RESPONSE_SIZE as an env variable solved it there as well.

laphang avatar Apr 13 '22 03:04 laphang

@laphang @vprecup I am having this issue as well, and I am wondering how you are passing these environment variables to your PyTorch SageMaker model?

estimator = PyTorch(
    entry_point="train.py",
    source_dir="sagemaker_container_files",
    role=role,
    py_version="py39",
    framework_version="1.13",
    instance_count=1,
    instance_type="ml.c5.2xlarge",
    hyperparameters=hyperparameters,
    env={'SAGEMAKER_MODEL_SERVER_TIMEOUT':'3600', 
         'TS_MAX_RESPONSE_SIZE':'2000000000',
         'TS_MAX_REQUEST_SIZE':'2000000000',
         'MMS_MAX_RESPONSE_SIZE':'2000000000',
         'MMS_MAX_REQUEST_SIZE':'2000000000',
    }
)

Is this a valid way to pass in these env variables, or do I do it when I deploy the model, with:

estimator.deploy(.....)

or at some other point?

levatas-weston avatar May 01 '23 17:05 levatas-weston

@levatas-weston Yes, pass them to the PyTorchModel; see the base class docs below. https://sagemaker.readthedocs.io/en/stable/api/inference/model.html#sagemaker.model.Model

laphang avatar May 02 '23 03:05 laphang

If anyone else in the future is wondering, the following is what solved my issue:

estimator.deploy(
    ........,
    env={
        'SAGEMAKER_MODEL_SERVER_TIMEOUT': '3600',
        'TS_MAX_RESPONSE_SIZE': '2000000000',
        'TS_MAX_REQUEST_SIZE': '2000000000',
        'MMS_MAX_RESPONSE_SIZE': '2000000000',
        'MMS_MAX_REQUEST_SIZE': '2000000000',
    },
)

My understanding is that the container SageMaker creates for training and the one it creates for deployment are completely separate, so you only need these env variables in the deployment container (the one running TorchServe).

levatas-weston avatar May 02 '23 13:05 levatas-weston