aws-lambda-r
reconsider building packages via docker?
I think the tooling has improved considerably and it's fairly straightforward to compile binaries within a Docker image without having to spin up an EC2 instance. I'm considering using this with some of my colleagues who work exclusively in R, and was hoping to make it as easy and painless as possible for them to get an RScript scraper deployed with precompiled binaries.
I used this repo to build some Python library binaries in the past: https://github.com/AlJohri/aws-lambda-lxml. The whole thing boiled down to:
version="4.2.4"
docker run -v "$(pwd)/$version/py36/":/outputs -it lambci/lambda:build-python3.6 pip install lxml=="$version" -t /outputs/
tar -czvf "$version/py36/lxml-$version.tgz" "$version/py36/lxml"
aws s3 cp "$version/py36/lxml-$version.tgz" s3://mybucket/lambda-compiled-binaries/py36/
see build.sh
would you consider a PR switching to docker instead of ec2?
The idea behind this tool was to limit the amount of software installed locally so that it works without problems on any OS. Docker does speed up development and deployment, and I like it a lot. However, it adds a local dependency: it conflicts with VMs on Windows, and not everybody has it installed or knows how to configure and use it.
That being said, I see deployment using Docker as a possible option, not as a complete switch. Things to keep in mind for a PR:
- read option regarding EC2 vs Docker from the config file(s)
- reuse Docker images whenever possible (e.g. Python + compiled R packages, same as "custom AMI"), save the info in the config file
- DRY: whenever possible, the same script should run on EC2 and Docker (to keep future maintenance low)
- it's a framework, i.e. keep it flexible
- code comments to explain code intent / why
- status messages / checks to let the user know what is going on
- as much as possible, fail early (e.g., do not copy to S3 if zip is incomplete)
- updated documentation to reflect Docker usage (for this project)
- test on Windows (I can help with this one) and OSX
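A minimal sketch of two of the points above (reading the EC2 vs Docker option from a config file, and failing early on an incomplete zip) might look like the following. The `BUILD_BACKEND` key and both function names are hypothetical and not part of the current scripts; the zip check assumes Info-ZIP `unzip` is available.

```shell
#!/bin/sh
# Hypothetical setting name: the real config keys of this project may differ.

# read_build_backend FILE: print the backend chosen in FILE ("ec2" or "docker").
read_build_backend() {
    config_file="$1"
    [ -r "$config_file" ] || { echo "ERROR: cannot read $config_file" >&2; return 1; }
    # Take the last BUILD_BACKEND=... line, strip the key and any quotes.
    backend=$(sed -n 's/^BUILD_BACKEND=//p' "$config_file" | tail -n 1 | tr -d "\"'")
    # Default to ec2 so existing setups keep working without changes.
    backend="${backend:-ec2}"
    case "$backend" in
        ec2|docker) printf '%s\n' "$backend" ;;
        *) echo "ERROR: unknown BUILD_BACKEND '$backend'" >&2; return 1 ;;
    esac
}

# check_zip FILE: fail early if the archive is missing, empty, or corrupt,
# so that nothing incomplete gets copied to S3.
check_zip() {
    [ -s "$1" ] || { echo "ERROR: $1 is missing or empty" >&2; return 1; }
    unzip -tq "$1" >/dev/null 2>&1 || { echo "ERROR: $1 failed the integrity test" >&2; return 1; }
    echo "OK: $1 passed the integrity test"
}
```

A deploy script could then call `backend=$(read_build_backend settings.sh) || exit 1` and `check_zip lambda.zip || exit 1` before any upload step; both file names here are placeholders.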
For your application, please first make sure that the unpacked zip fits in AWS Lambda (some R packages for scraping plus dplyr might be too big).
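As a rough way to check this before deploying, the sketch below compares the uncompressed size of a zip against the 250 MB limit that AWS Lambda documents for an unzipped deployment package. The function names are hypothetical, and the size is read from the summary line of Info-ZIP `unzip -l`, which is assumed to be installed.

```shell
#!/bin/sh
# Limit for the unzipped deployment package, in bytes (250 MB).
LAMBDA_UNZIPPED_LIMIT=$((250 * 1024 * 1024))

# unpacked_size_bytes ZIP: print the total uncompressed size of ZIP,
# taken from the first field of the last line of `unzip -l`.
unpacked_size_bytes() {
    unzip -l "$1" | tail -n 1 | awk '{print $1}'
}

# fits_in_lambda SIZE [LIMIT]: succeed if SIZE (bytes) is within LIMIT.
fits_in_lambda() {
    size="$1"
    limit="${2:-$LAMBDA_UNZIPPED_LIMIT}"
    [ "$size" -le "$limit" ]
}
```

For example, `fits_in_lambda "$(unpacked_size_bytes lambda.zip)" || echo "package too big"` would flag an oversized package; `lambda.zip` is a placeholder name.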