sagemaker-python-sdk

Unable to use this library in AWS Lambda due to package size exceeded max limit

Open nemalipuri opened this issue 5 years ago • 30 comments

Please fill out the form below.

System Information

  • AWS Lambda:
  • Python v3.6:
  • Sagemaker Python SDK 1.49.0:

Describe the problem

I'm trying to use the SageMaker Python SDK in Lambda to trigger train and deploy steps. I packaged the dependencies along with the function code, and when I try to create the Lambda function it throws the error 'Unzipped size must be smaller than 262144000 bytes'.

Sorry, though this issue is related to a Lambda service limit, I want to check: is there any way I can reduce the size of the dependencies?

I have tried removing boto3 and botocore from the function zip file since Lambda provides these libraries, but that led to a different issue: 'expecting python-dateutil<2.8.1,>=2.1'.

Minimal repro / logs

AWS Lambda error 'Unzipped size must be smaller than 262144000 bytes'

  • Exact command to reproduce:
      mkdir python
      cd python
      pip install sagemaker --target .
      chmod 777 python
      zip the python directory
      upload the zip file to S3
  • Error when creating the AWS Layer: 'Failed to create layer version: Unzipped size must be smaller than 262144000 bytes'

Similarly, when I packaged the code with its dependencies and uploaded the zip file directly to the Lambda function instead of using a Layer, I received the error 'Unzipped size must be smaller than 262144000 bytes'.
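For reference, 262144000 bytes is 250 * 1024 * 1024, i.e. 250 MiB. A minimal sketch for checking a local package directory against that figure before uploading, assuming the dependencies were installed into a folder with pip install --target; the function names are illustrative and not part of any AWS tooling:

```python
import os

# Unzipped size limit quoted in the Lambda error message (250 MiB).
LAMBDA_UNZIPPED_LIMIT = 262144000


def dir_size_bytes(root):
    """Sum the sizes of all regular files under root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fits_in_lambda(root, limit=LAMBDA_UNZIPPED_LIMIT):
    """Return (fits, size); the error says the size must be smaller than the limit."""
    size = dir_size_bytes(root)
    return size < limit, size
```

Calling fits_in_lambda("python") on the folder from the repro steps would report whether the unzipped contents stay under the limit; the size Lambda itself computes may differ slightly from this estimate.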

Appreciate your help.

nemalipuri avatar Dec 26 '19 22:12 nemalipuri

Hi @nemalipuri !

Unfortunately, running sagemaker-python-sdk in AWS Lambda is not currently supported. This is a pain point that we're aware of and for which we are working on prioritizing a solution.

I would normally recommend pinning python-dateutil to 2.8.0 to resolve the conflict, but I actually experimented locally and found that, even without boto3, the zip (55MB) is still over the 50MB zipped limit for Lambda.

An alternative is to remove numpy and scipy dependencies entirely for specific sagemaker installations, as they account for ~73% of the installation size. In order for me to gauge the solution's viability, can you tell me if you will need numpy/scipy functionality when running sagemaker-python-sdk in AWS Lambda? Similarly, what are your sagemaker-python-sdk AWS Lambda use-cases?
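To see which packages dominate an installation, a per-package size breakdown can help. The following is only a sketch, assuming the SDK was installed into a local directory with pip install --target; the helper names are made up for illustration:

```python
import os


def top_level_sizes(root):
    """Map each top-level entry under root to its total size in bytes."""
    sizes = {}
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False):
            total = 0
            for dirpath, _dirs, files in os.walk(entry.path):
                for name in files:
                    path = os.path.join(dirpath, name)
                    if not os.path.islink(path):
                        total += os.path.getsize(path)
            sizes[entry.name] = total
        elif entry.is_file(follow_symlinks=False):
            sizes[entry.name] = entry.stat().st_size
    return sizes


def report(root, top=10):
    """Return (name, bytes, percent) tuples for the largest entries, biggest first."""
    sizes = top_level_sizes(root)
    grand_total = sum(sizes.values()) or 1  # avoid dividing by zero on an empty dir
    ranked = sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, size, round(100 * size / grand_total, 1)) for name, size in ranked[:top]]
```

Running report() on the installation directory lists the heaviest packages with their share of the total, which is one way to reproduce a figure like the ~73% mentioned above.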

Thanks!

knakad avatar Dec 27 '19 21:12 knakad

@knakad Thanks for looking into this.

Almost a year back I used the SageMaker Python SDK in Lambda without any issues; the version was 1.18.0 and the package was smaller.

Another use case has come up now, and when I tried to pull the latest package its size was larger than the unzipped limit (260MB). The use case is to build an ML model with a custom container and implement Lambda functions for creating the training job and the endpoint. Step Functions will invoke these Lambda functions at scheduled times to automate the workflow.

I am not using scipy in my client code. Even in the sagemaker-python-sdk library I see scipy used in one place only (src/sagemaker/amazon/common.py).

I did try without boto3, botocore and scipy, but Lambda failed with the error 'No module named 'numpy.core._multiarray_umath''. Steps I executed:

  • Cloned the sagemaker-python-sdk repo at v1.49.0
  • Removed "scipy>=0.19.0" in setup.py
  • pip install into a directory (e.g. pip install . -t ./python -c ../requirements.txt)
  • Zipped and uploaded to S3
  • Created a Layer and attached this layer to the Lambda
  • 'import sagemaker' failed with No module named 'numpy.core._multiarray_umath'

If you could provide some workaround it would be great; otherwise the plain SDK (via boto3) is the only option I would have for calling the SageMaker APIs in Lambda.
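For the boto3-only fallback, a rough sketch of a Lambda handler that starts a training job with a custom container might look like the following. The event keys, instance type, volume size and timeout are assumptions chosen for illustration; only the boto3 SageMaker client's create_training_job call and its request fields are taken from the AWS API:

```python
def build_training_job_request(job_name, image_uri, role_arn, output_s3_uri,
                               instance_type="ml.m5.large"):
    """Build the keyword arguments for create_training_job (values are illustrative)."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,      # custom container image in ECR
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": 1,
            "VolumeSizeInGB": 30,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


def lambda_handler(event, context):
    # boto3 ships with the Lambda Python runtime, so it is imported lazily here
    # and does not need to be packaged with the function.
    import boto3

    client = boto3.client("sagemaker")
    request = build_training_job_request(
        event["job_name"], event["image_uri"], event["role_arn"], event["output_s3_uri"]
    )
    response = client.create_training_job(**request)
    return {"TrainingJobArn": response["TrainingJobArn"]}
```

This avoids the SDK and its numpy/scipy dependencies entirely, at the cost of spelling out the request that the SDK's estimator classes would otherwise assemble.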

Thank you.

nemalipuri avatar Dec 30 '19 03:12 nemalipuri

Until sagemaker-python-sdk is officially supported in AWS Lambda, here's a workaround that removes a bit of bloat from the installation, allowing it to fit in lambda without sacrificing any functionality:

pip install sagemaker --target sagemaker-installation
cd sagemaker-installation
find . -type d -name "tests" -exec rm -rfv {} +
find . -type d -name "__pycache__" -exec rm -rfv {} +
zip -r sagemaker_lambda_light.zip .

I was able to upload the resulting zip along with a simple handler that called import sagemaker and did some very basic validation.

This solution also doesn't require you to fork any of the code, so you can more easily run the latest sagemaker-python-sdk with the latest features/bug fixes.

Please try it out and let me know if you run into any issues =)
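The two find commands can also be expressed in Python, which may be handy inside a build script. This is a sketch of the same pruning step, not something the SDK provides; the function name is made up, and it assumes no installed package needs its tests directory at import time:

```python
import os
import shutil


def prune_dirs(root, names=("tests", "__pycache__")):
    """Delete every directory under root whose name is in names; return what was removed."""
    removed = []
    for dirpath, dirnames, _files in os.walk(root, topdown=True):
        for name in list(dirnames):
            if name in names:
                target = os.path.join(dirpath, name)
                shutil.rmtree(target)
                dirnames.remove(name)  # do not descend into the removed directory
                removed.append(target)
    return sorted(removed)
```

For example, prune_dirs("sagemaker-installation") would remove the same directories as the find commands above before the folder is zipped.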

knakad avatar Dec 30 '19 23:12 knakad

Perfect, it worked after executing the above steps. Thank you so much!

nemalipuri avatar Dec 31 '19 19:12 nemalipuri

Anytime! Leaving this issue open to track the workaround and the feature request.

knakad avatar Dec 31 '19 19:12 knakad

@knakad This looks like a great solution and I'd like to implement it. I followed the steps you listed out, created a layer and attached it to my Lambda function, but I still get the error when I try and import sagemaker package in my lambda function:

"errorMessage": "Unable to import module 'lambda_function': No module named 'sagemaker'"

Any idea what could be causing the issue? I don't get any hints in the logs in CloudWatch and it just looks like the function is not able to find the sagemaker package from the attached layer.
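One way to narrow down an error like this is a throwaway handler that reports where Python is looking. Lambda extracts layers under /opt, and the Python runtimes look for layer packages in /opt/python. The sketch below relies on that; the function and key names are illustrative only:

```python
import importlib.util
import os
import sys


def diagnose(module_name="sagemaker", layer_dir="/opt/python"):
    """Report whether module_name is importable and what the layer directory contains."""
    spec = importlib.util.find_spec(module_name)
    layer_exists = os.path.isdir(layer_dir)
    return {
        "module": module_name,
        "found": spec is not None,
        "origin": getattr(spec, "origin", None),
        "layer_dir_exists": layer_exists,
        "layer_dir_on_sys_path": layer_dir in sys.path,
        # First few entries only, to keep the response small.
        "layer_dir_entries": sorted(os.listdir(layer_dir))[:20] if layer_exists else [],
    }


def lambda_handler(event, context):
    return diagnose()
```

If layer_dir_exists is false or the sagemaker folder is missing from layer_dir_entries, the zip was most likely built without the expected top-level folder.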

Thanks for your help on this, in advance.

ikopas3 avatar Mar 30 '20 05:03 ikopas3

Is there any date decided for the support in AWS Lambda for sagemaker-python-sdk ?

sudeshgit avatar Apr 02 '20 08:04 sudeshgit

After facing the issue myself, I read through the documentation and found a requirement on the path within the zip file that must be followed. There are two options: python or python/lib/python3.8/site-packages. I installed the sagemaker package into a python folder, deleted the tests and __pycache__ folders, then zipped it up, uploaded it to S3 and created a layer. After that, import sagemaker from the Lambda function with the layer attached worked for me.

Documentation for ease of reference: https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html
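The layout requirement can be checked before uploading. A small sketch using the standard zipfile module, assuming the layer is meant for a Python runtime (both allowed paths start with python/); the function name is hypothetical:

```python
import zipfile


def misplaced_entries(zip_path_or_file):
    """Return the archive entries that are not under the top-level python/ folder."""
    with zipfile.ZipFile(zip_path_or_file) as zf:
        return [name for name in zf.namelist() if not name.startswith("python/")]
```

An empty list means every file sits under python/; anything else (for example sagemaker/__init__.py at the top level) would not be found by the runtime when the zip is used as a layer.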

ikopas3 avatar Apr 03 '20 01:04 ikopas3

+1 to all of this! Looking forward to using SageMaker in Lambda once this is resolved.

joelachance avatar Apr 03 '20 16:04 joelachance

@knakad Do we need to manually zip the sagemaker installation along with handler.py and upload it to S3 ourselves? Also, how will the Lambda function pick up the new zip file? It would be helpful if you could list the steps to do this. Thanks!

ksachdeva11 avatar Sep 09 '20 13:09 ksachdeva11

In order to create a valid sagemaker SDK layer it is important to create the layer using an AWS compatible numpy version (since some numpy packages are binary). Here is a slightly updated version of the above that has proved to work for me:

mkdir sagemaker-layer
cd sagemaker-layer
mkdir python
# Install the sagemaker modules in the python folder
pip install sagemaker --target ./python
# Remove tests and cache stuff (to reduce size)
find ./python -type d -name "tests" -exec rm -rfv {} +
find ./python -type d -name "__pycache__" -exec rm -rfv {} +

# Remove the python/numpy* folders since it will contain a numpy version for your host machine
rm -rf python/numpy*

# Download an AWS Linux compatible numpy package
# Navigate to https://pypi.org/project/numpy/#files.
# Search for and download the newest manylinux x86_64 .whl package for your Python version (I have Python 3.7)
curl "https://files.pythonhosted.org/packages/9b/04/c3846024ddc7514cde17087f62f0502abf85c53e8f69f6312c70db6d144e/numpy-1.19.2-cp37-cp37m-manylinux2010_x86_64.whl" -o "numpy-1.19.2-cp37-cp37m-manylinux2010_x86_64.whl"
unzip numpy-1.19.2-cp37-cp37m-manylinux2010_x86_64.whl -d python

# Zip only the python folder so the downloaded .whl file is not included in the layer
zip -r sagemaker_lambda.zip python

# When zip file is ready, upload it to S3
aws s3 cp sagemaker_lambda.zip s3://ai4iot-lambda/sagemaker_lambda_light.zip

# When the upload is complete, go to Lambda layers to create a layer from the uploaded zip file.

arne-munch-ellingsen avatar Oct 23 '20 11:10 arne-munch-ellingsen

@arne-munch-ellingsen Thank you for the lead. I tried your code, but when testing the Lambda function I got the following error:

Response:
{
  "errorMessage": "Unable to import module 'lambda_function': cannot import name '_ccallback_c' from 'scipy._lib' (/opt/python/scipy/_lib/__init__.py)",
  "errorType": "Runtime.ImportModuleError"
}

My local machine (where I ran your code) is macOS. Any idea what I am missing?

shlomi-schwartz avatar Nov 03 '20 08:11 shlomi-schwartz

@shlomi-schwartz Are you trying to import scipy in your Lambda function? If that is the case you will have to add scipy to your layer as well using the same "trick" that I used to add the AWS Lambda Python 3.7 specific numpy library. The Sagemaker SDK does not include scipy.

arne-munch-ellingsen avatar Nov 03 '20 11:11 arne-munch-ellingsen

@arne-munch-ellingsen Thanks for the tip, I was not calling scipy, it was one of the dependencies for sagemaker==1.71.1, but I used your trick and downloaded the .whl file, it works now!

Thanks again 👍
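Errors like the ones above usually mean a compiled extension in the package was built for the host OS rather than Linux. A rough heuristic check, assuming the usual naming of extension modules (for example a darwin tag in .so file names built on macOS, or .pyd/.dll files on Windows); the function name is made up and the check is not exhaustive:

```python
import os


def find_non_linux_binaries(root):
    """List compiled files under root whose names suggest a non-Linux build."""
    suspicious = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            lower = name.lower()
            is_foreign_lib = lower.endswith((".dylib", ".pyd", ".dll"))
            is_darwin_so = lower.endswith(".so") and "darwin" in lower
            if is_foreign_lib or is_darwin_so:
                suspicious.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(suspicious)
```

Any package that shows up in the result (numpy, scipy, pandas and so on) is a candidate for replacing with a manylinux wheel before zipping.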

shlomi-schwartz avatar Nov 03 '20 12:11 shlomi-schwartz

This worked for me, but I am looking forward to the actual support for SageMaker SDK in Lambda.

mkdir lambda_deployment
cd lambda_deployment
touch lambda_function.py

Write the logic in the lambda_function.py file.

pip install sagemaker --target sagemaker-installation
cd sagemaker-installation
find . -type d -name "tests" -exec rm -rfv {} +
find . -type d -name "__pycache__" -exec rm -rfv {} +
zip -r ../lambda-deployment.zip .
cd ..
zip -g lambda-deployment.zip lambda_function.py

Then upload lambda-deployment.zip to Lambda

calvinfeng avatar Feb 24 '21 02:02 calvinfeng

Further to @arne-munch-ellingsen's post, you can skip the download of the numpy whl and use the AWSLambda-Python37-SciPy1x layer provided by AWS (arn:aws:lambda:eu-west-2:142628438157:layer:AWSLambda-Python37-SciPy1x:35) instead.

mhobby avatar Feb 27 '21 17:02 mhobby

I'm trying to follow this tutorial about scheduling data wrangler processing jobs. I created my lambda function uploading the zip file that was created following these commands:

pip install sagemaker --target sagemaker-installation
cd sagemaker-installation
find . -type d -name "tests" -exec rm -rfv {} +
find . -type d -name "__pycache__" -exec rm -rfv {} +
zip -r ../lambda-deployment.zip .
cd ..
zip -g lambda-deployment.zip lambda_function.py export.flow

The zip file has around 35MB. Then, when I try to add the Scipy layer to the lambda I got the following error:

"Function code combined with layers exceeds the maximum allowed size of 262144000 bytes. The actual size is 263682135 bytes."

Does anyone know how to deal with this? Could I somehow reduce the size of the sagemaker package even more?
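Working through the numbers in that error message shows how small the overage is. The two constants below are copied from the message; everything else is plain arithmetic:

```python
LIMIT = 262144000   # maximum allowed size from the error message
ACTUAL = 263682135  # actual size from the error message

overage = ACTUAL - LIMIT

print(LIMIT == 250 * 1024 * 1024)         # True: the limit is 250 MiB
print(overage)                            # 1538135
print(round(overage / (1024 * 1024), 2))  # 1.47
```

So the function plus layer is only about 1.5 MiB over the limit, which suggests that trimming a little more from the package could be enough.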

AlvaroCavalcante avatar Oct 22 '21 14:10 AlvaroCavalcante

Tried many approaches and nothing worked for me. It turns out it had to do with my local machine not using a Linux operating system (I have a macOS Catalina). Followed the instructions here for the installation of numpy and it worked like a charm 😄 (credits to Shandy Roque) :

cd <directory-containing-lambda_function.py>

# Install sagemaker
pip install sagemaker --target sagemaker-installation

# Remove numpy since it will contain incompatible binary files (when installed on an OS other than Linux)
rm -rf sagemaker-installation/numpy*

# Download an AWS Linux compatible numpy package
pip install numpy \
    --platform manylinux2014_x86_64 \
    --target=sagemaker-installation \
    --implementation cp \
    --python-version 3.8 \
    --only-binary=:all: --upgrade 
 
# Remove unnecessary files
cd sagemaker-installation
find . -type d -name "tests" -exec rm -rfv {} +
find . -type d -name "__pycache__" -exec rm -rfv {} +

# Zip everything together 
zip -r ../lambda-deployment.zip .
cd ..
zip -g lambda-deployment.zip lambda_function.py 

camilaagw avatar Jan 10 '23 07:01 camilaagw

@knakad any news on this? Lambda functions are great for orchestrating more complicated flows, which end with SageMaker prediction. Not being able to use SDK is irritating. Heavy dependencies can be moved to package extra, like sagemaker[numpy].
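A common way to support that kind of optional extra is to import the heavy dependency lazily and fail with a clear message when it is missing. The sketch below shows the general pattern only; the helper is hypothetical, the sagemaker[numpy] extra is the one proposed in this comment rather than something the SDK is known to ship, and nothing here reflects how the SDK is actually structured:

```python
import importlib


def require_optional(module_name, extra):
    """Import module_name on demand, pointing at the (hypothetical) extra if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is required for this feature; "
            f"install it with: pip install 'sagemaker[{extra}]'"
        ) from exc
```

Code paths that need numpy would call require_optional("numpy", "numpy") at the point of use, so a Lambda that only orchestrates jobs would never need numpy in its package.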

j-adamczyk avatar Aug 17 '23 13:08 j-adamczyk

Any updates on this? New package updates have broken this way of importing the sagemaker module.

KaramRazooq avatar Sep 10 '23 05:09 KaramRazooq

As mentioned by @KaramRazooq, the above instructions needed to be updated. In a nutshell, I had to downgrade jsonschema to 4.17.3 and install linux specific pandas package. I built upon the solution given by @arne-munch-ellingsen. Here is the version that worked for me:

mkdir sagemaker-layer
cd sagemaker-layer
mkdir python

# Install the sagemaker modules in the python folder
pip install sagemaker --target ./python

# Remove the python/numpy* folders since it will contain a numpy version for your host machine
rm -rf python/numpy*

# Remove the python/pandas* folders since it will contain a pandas version for your host machine
rm -rf python/pandas*

# Downgrade jsonschema from 4.19 to 4.17.3 to avoid the rpds.rpds import error.
rm -rf python/jsonschema*
pip install jsonschema==4.17.3 --target ./python

# Download an AWS Linux compatible numpy package.
# Navigate to https://pypi.org/project/numpy/#files.
# Search for and download newest *manylinux_2_x86_64.whl package for your Python version (I have Python 3.9).
# I had two .whl packages that matched this requirement. The version mentioned below worked for me.
curl "https://files.pythonhosted.org/packages/69/1f/c95b1108a9972a52d7b1b63ed8ca70466b59b8c1811bd121f1e667cc45d8/numpy-1.25.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl" -o "numpy-1.25.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
unzip numpy-1.25.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl -d python

# Download an AWS Linux compatible pandas package.
# Navigate to https://pypi.org/project/pandas/#files.
# Search for and download newest *manylinux_2_x86_64.whl package for your Python version
curl "https://files.pythonhosted.org/packages/83/f0/2765daac3c58165460b127df5c0ef7b3a039f3bfe7ea7a51f3d20b01371b/pandas-2.1.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl" -o "pandas-2.1.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
unzip pandas-2.1.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl -d python

# Remove tests and cache stuff (to reduce size)
find ./python -type d -name "tests" -exec rm -rfv {} +
find ./python -type d -name "__pycache__" -exec rm -rfv {} +

zip -r sagemaker_lambda.zip python

PreethiJC avatar Sep 15 '23 07:09 PreethiJC

@PreethiJC's method works, as long as you include the Compatible runtimes and architectures!
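When the layer is created through the API instead of the console, the compatible runtimes and architectures are passed explicitly. A sketch using boto3's publish_layer_version, assuming the zip has already been uploaded to S3; the layer name, bucket and key are placeholders and the defaults match the Python 3.9 / x86_64 wheels used above:

```python
def build_layer_request(layer_name, bucket, key,
                        runtime="python3.9", architecture="x86_64"):
    """Build the keyword arguments for publish_layer_version."""
    return {
        "LayerName": layer_name,
        "Content": {"S3Bucket": bucket, "S3Key": key},
        "CompatibleRuntimes": [runtime],
        "CompatibleArchitectures": [architecture],
    }


def publish_layer(request):
    # Imported lazily so the request builder can be used without boto3 installed.
    import boto3

    client = boto3.client("lambda")
    response = client.publish_layer_version(**request)
    return response["LayerVersionArn"]
```

The runtime and architecture should match the wheels that were unpacked into the layer; a cp39 manylinux x86_64 wheel will not load on a python3.12 or arm64 function.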

KaramRazooq avatar Sep 16 '23 02:09 KaramRazooq

Hey, y'all! Any news on the MR? I feel that this is an important update for sagemaker-python-sdk, especially for the usage on Lambdas.

guimorg avatar Jan 02 '24 03:01 guimorg

The sagemaker library really should be something made available through https://aws.amazon.com/serverless/serverlessrepo/. All I want to do is kick off a processor.run job from my Lambda. As it is, I need to set up a pipeline and then call that.

gorj-tessella avatar Apr 02 '24 20:04 gorj-tessella

Seems a fix has been merged, does the issue still exist?

liujiaorr avatar Apr 21 '24 22:04 liujiaorr

Closing this issue now; feel free to reopen it if there is still a problem.

liujiaorr avatar Apr 28 '24 04:04 liujiaorr

Hmm. I am executing the following script:

mkdir python
cd python
pip install sagemaker --target .
cd ..
du -sh
# 207M	.

Maybe the issue is resolved, but I feel that we can still have pandas and numpy as optional when one wants to use SageMaker's API (for my case I use it for assessing SageMaker Pipelines, for example, but I still have to bring lots of dependencies and sometimes this still makes me reach the max size of a layer).

Could we still try to optimize the size of this package? The abstractions that are maintained here are really good for handling SageMaker Pipelines.

guimorg avatar May 02 '24 17:05 guimorg

Thanks for the information @guimorg. I have raised the priority to the highest level so that this gets attention and is resolved more quickly.

liujiaorr avatar May 03 '24 16:05 liujiaorr

@guimorg this PR https://github.com/aws/sagemaker-python-sdk/pull/4222 could help make pandas optional

trungleduc avatar May 03 '24 17:05 trungleduc

Breaking out sagemaker.workflow (and maybe others) into their own libraries would address many use cases. Our use case is that I just want a Lambda that will orchestrate workflow pipelines. But if I include just

from sagemaker.workflow.notebook_job_step import NotebookJobStep
from sagemaker.workflow.pipeline import Pipeline

I still have to install all of sagemaker and all its huge dependencies and end up with a lambda image size that is too big to deploy as a zipfile image.
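If the pipeline has already been defined elsewhere, a Lambda can start it with boto3 alone, which is in the Lambda runtime. This is only a partial substitute, since it does not replace the pipeline-definition classes in sagemaker.workflow. A sketch, with the event keys and helper name assumed for illustration:

```python
def build_start_request(pipeline_name, parameters=None):
    """Build the keyword arguments for start_pipeline_execution."""
    request = {"PipelineName": pipeline_name}
    if parameters:
        # The API expects a list of Name/Value pairs with string values.
        request["PipelineParameters"] = [
            {"Name": name, "Value": str(value)}
            for name, value in sorted(parameters.items())
        ]
    return request


def lambda_handler(event, context):
    # boto3 is provided by the Lambda Python runtime, so nothing extra is packaged.
    import boto3

    client = boto3.client("sagemaker")
    request = build_start_request(event["pipeline_name"], event.get("parameters"))
    response = client.start_pipeline_execution(**request)
    return {"PipelineExecutionArn": response["PipelineExecutionArn"]}
```

With this split, the full SDK is needed only where the pipeline is authored (for example in CI or a notebook), and the Lambda stays small.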

rberger avatar May 22 '24 01:05 rberger