serverless-python-requirements

Can't noDeploy numpy when pandas in requirements.txt

Open aaronbarzilai opened this issue 6 years ago • 6 comments

Apologies if this is a duplicate; I tried searching the existing issues first. Also, I really appreciate the great work that went into this plugin. I last used serverless 1-2 years ago, before layers, when libraries like pandas on AWS were a significant pain.

I am using Python and trying to use pandas on AWS Lambda. I have managed to make everything work with a minimal requirements.txt:

Bottleneck==1.2.1
certifi==2019.9.11
numexpr==2.7.0
numpy==1.17.2
pandas==0.25.1
python-dateutil==2.8.0
pytz==2019.3
six==1.12.0

However, even if I specify numpy in the noDeploy option, it still seems to appear in the .requirements.zip file.

Here's my serverless.yml file section on serverless-python-requirements

custom:
  pythonRequirements:
    fileName: requirements.txt
    dockerizePip: true
    useStaticCache: false
    useDownloadCache: false
    zip: true # Compresses the libraries into an additional file and adds unzip_requirements.py to the final bundle.
    slim: true # Removes unneeded files and directories such as *.so, *.pyc, dist-info, etc.
    noDeploy: # Omits certain packages from deployment.
      - boto3
      - botocore
      - docutils
      - jmespath
      - pip
      - python-dateutil
      - s3transfer
      - setuptools
      - six
      - numpy
    layer: true

My goal is to deploy this layer without numpy to save space, and then use the built-in AWS SciPy NumPy layer. I am not a requirements.txt expert, as I tend to use conda rather than pip.

Am I doing something wrong?

Thanks so much for all the work on this, Aaron

aaronbarzilai avatar Oct 08 '19 16:10 aaronbarzilai

did you solve it?

3nomis avatar Feb 12 '20 15:02 3nomis

I think I see what is happening. The current noDeploy implementation only excludes the package from the requirements.txt that the plugin generates. Since pandas requires numpy, pip installs numpy again as a dependency, so noDeploy has no effect.

A fix is not simple, because we use pip install -t and there is no corresponding uninstall command for that. Maybe we should add a config option to explicitly exclude files from the resulting zip?

(I have to exclude typing because of a similar issue.)
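
The behavior described above can be sketched roughly as follows. This is only an illustration, not the plugin's actual code: it assumes noDeploy simply filters names out of the requirements list before pip runs. The names `filter_no_deploy`, `resolve`, and the hand-written `DEPENDS_ON` map are hypothetical; real dependency resolution is done by pip from package metadata.

```python
import re

# Hypothetical, hand-written dependency map for illustration only.
DEPENDS_ON = {
    "pandas": ["numpy", "python-dateutil", "pytz"],
    "python-dateutil": ["six"],
}


def requirement_name(line):
    """Return the lowercased package name from a line such as 'numpy==1.17.2'."""
    return re.split(r"[=<>!~\[; ]", line.strip(), maxsplit=1)[0].lower()


def filter_no_deploy(requirements, no_deploy):
    """Drop requirement lines whose package name is listed in no_deploy."""
    blocked = {name.lower() for name in no_deploy}
    return [r for r in requirements if requirement_name(r) not in blocked]


def resolve(requirements):
    """Return every package name that ends up installed, following DEPENDS_ON."""
    installed = set()
    stack = [requirement_name(r) for r in requirements]
    while stack:
        name = stack.pop()
        if name not in installed:
            installed.add(name)
            stack.extend(DEPENDS_ON.get(name, []))
    return installed


requirements = ["numpy==1.17.2", "pandas==0.25.1", "pytz==2019.3"]
filtered = filter_no_deploy(requirements, ["numpy"])
installed = resolve(filtered)

print(filtered)              # numpy is gone from the list handed to pip
print("numpy" in installed)  # but it is installed again as a pandas dependency
```

Under these assumptions, filtering the list is not enough: anything that a remaining package depends on is pulled back in during installation.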

littlebtc avatar Feb 20 '20 21:02 littlebtc

I found a very dirty workaround on that:

    slim: true # otherwise slimPatterns will not work
    strip: false # avoid some ELF alignment issues
    slimPatternsAppendDefaults: false 
    slimPatterns:
      # Won't work with noDeploy since
      # dependencies will go back
      - numpy/**
      # Exclude **/*.dist-info* may cause trouble
      - '**/*.py[c|o]'
      - '**/__pycache__*'

YMMV.
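
To check whether a workaround like this actually kept a package out of the archive, something along these lines might help. It is a minimal sketch using only the standard library; `find_unwanted` is a hypothetical helper, and the archive path and the optional `prefix` (for example `python/` if your layer zip nests packages under that directory) are assumptions you would need to adjust for your build.

```python
import io
import zipfile


def find_unwanted(zip_file, unwanted, prefix=""):
    """Return the unwanted package names found as top-level entries in the zip.

    prefix is an optional leading directory to strip first (an assumption
    about the archive layout, e.g. "python/" for some layer zips).
    """
    top_level = set()
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            if name.startswith(prefix):
                top_level.add(name[len(prefix):].split("/", 1)[0])
    return sorted(top_level & set(unwanted))


# Demo on an in-memory archive. Only top-level entries are compared, so a
# nested directory such as pandas/compat/numpy does not count as numpy.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("pandas/__init__.py", "")
    archive.writestr("pandas/compat/numpy/__init__.py", "")
    archive.writestr("numpy/__init__.py", "")
buffer.seek(0)

leftovers = find_unwanted(buffer, ["numpy", "boto3"])
print(leftovers)
```

For a real build you would pass the path of the generated archive instead of the in-memory buffer, for example `find_unwanted(".requirements.zip", ["numpy"])`, assuming that is where your packaging step writes it.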

littlebtc avatar Feb 20 '20 22:02 littlebtc

I'm removing the "bug" classification from this, since it seems the issue is with transitive dependencies of pandas. Have you tried packaging pandas along with numpy in your layer and excluding it from the requirements.txt, to see if that works out?

miketheman avatar Feb 22 '20 14:02 miketheman

In my case, I'm using a package called sqlalchemy-aurora-data-api which depends on boto3. Unfortunately, this means that my Lambdas will deploy with boto3 even though the library is already included in the Lambda environment. While this may not be a bug, I do think it's a bit unintuitive that noDeploy can't handle transitive dependencies. I'm currently using the workaround @littlebtc provided.

velovix avatar Dec 15 '20 17:12 velovix

@miketheman I respectfully disagree, this is a bug. The attribute noDeploy of pythonRequirements is currently documented in the README as:

You can omit a package from deployment with the noDeploy option. Note that dependencies of omitted packages must explicitly be omitted too.

As the original comment indicates, this is not the case when another package has numpy (or other packages) as its dependency. We have packaged pandas, numpy, and scipy in our layer, but we cannot use any library that depends on them (e.g., statsmodels), because then we'd run into this issue.

While @littlebtc's solution works, it would be nice if noDeploy actually prevented packages from being deployed.

luksfarris avatar Nov 30 '21 17:11 luksfarris