lambdr
Another, more explicit example
I had to change the FROM in my Dockerfile because my project relies on a different OS version. While adapting the example from the documentation, I found that it has some unclear implicit behaviour, like CMD ['parity']. How does it work, and why? I couldn't reproduce the original example, so I put together another version. With it, you can use your own R script as the entrypoint.
Dockerfile:
FROM rocker/tidyverse:4.2.1
RUN apt-get update \
..
COPY . /var/task/
WORKDIR /var/task
RUN Rscript -e "install.packages(c('httr', 'jsonlite', 'logger', 'remotes', 'lambdr'))"
ENTRYPOINT Rscript /var/task/R/startup/aws-lambda.R
R/startup/aws-lambda.R:
library(lambdr)
# for testing the lambda function, pass {'any_val': ..}, as in the next line
start_lambda(config = lambda_config(handler = function(any_val) {
  tryCatch({
    print(paste('got any_val:', any_val))
    list(result = 'Success')
  }, error = function(e) {
    list(result = paste('Error:', conditionMessage(e), '\n\n', capture.output(traceback())))
  })
}))
Hi @grayskripko. Thanks for pointing this out. I'll give it a go. I'll admit that I've only been using the Amazon Linux parent images, but you're right, there's no reason lambdr shouldn't work with an arbitrary image.
The implicit behaviour you're referring to is what sets the "_HANDLER" environment variable. With the Amazon Linux parent images this is either the CMD in the Dockerfile or the setting in the Lambda web interface. The "_HANDLER" environment variable is how lambdr determines which R function should be used to handle the events.
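To make that concrete, here's an illustrative sketch (not lambdr's actual code) of the lookup that the "_HANDLER" variable drives:
# Illustrative only: roughly the resolution lambdr performs at startup.
# Lambda sets "_HANDLER" from the image CMD (e.g. CMD ["parity"]) or from
# the function's configuration in the web interface.
handler_name <- Sys.getenv("_HANDLER")   # e.g. "parity"
handler_function <- get(handler_name)    # finds the R function of that name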
As far as I can tell, the only thing that would be missing without using one of the AWS parent images would be the Lambda runtime interface emulator, which is used for testing Lambdas locally.
Hi @grayskripko and @mdneuzerling
I can confirm that the example given in the first post does work when deployed.
I did a bit of testing and have a small contribution to nudge along the discussion about arbitrary images.
For testing, I see it's possible to install the AWS Lambda RIE locally and either build it into the base image or test without adding it to the image. That said, I haven't given it a go because both options involve adding extra "stuff" to the process.
Instead I have a simple tweak to the script that could be useful. So long as you have all the same credentials locally as those required by the deployed Lambda (e.g. for accessing Secrets Manager, an external API, or reading/writing a database), you can test the R code interactively by structuring it like this:
library(lambdr)
# for testing the lambda function, pass {'any_val': ..}, as in the next line
handler <- function(any_val) {
  tryCatch(
    {
      print(paste("got any_val:", any_val))
      list(result = "Success")
    },
    error = function(e) {
      list(result = paste("Error:", conditionMessage(e), "\n\n", capture.output(traceback())))
    }
  )
}

if (!interactive()) {
  start_lambda(config = lambda_config(handler = handler))
}
By pre-defining the handler function and wrapping start_lambda() in if (!interactive()), you can now e.g. highlight the whole script, run it, then in the R REPL do handler(123).
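For example, after running the whole script in an interactive session (output shown as R would print it):
handler(123)
#> [1] "got any_val: 123"
#> $result
#> [1] "Success"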
This equally works if your setup is more spread out like the below:
R/functions.R
divide_this <- function(x) {
  x / 2
}

square_this <- function(x) {
  x * x
}
R/startup/aws-lambda.R
library(lambdr)
source(file.path("R", "functions.R"))
handler <- function(any_val) {
  d <- divide_this(any_val)
  s <- square_this(d)
  return(s)
}

if (!interactive()) {
  start_lambda(config = lambda_config(handler = handler))
}
As a next step I'll see if I can get the RIE working with one or both of the methods I linked.
I also haven't tested the need for the ENTRYPOINT in the Docker image, nor do I understand why the files get copied specifically into /var/task/. If either of you could explain that would be great. No worries though, I'll find out what is needed or not by trial and error.
Here's a full Dockerfile btw. I think these Rocker images are great to have because they're just straight-up Ubuntu, rather than one of Amazon's bespoke Lambda distros. So not only can we confidently install binaries from PPPM, we can even use {pak} to autodetect and install any R package system dependencies. Otherwise, using the AL2023 images takes absolutely AGES.
FROM docker.io/rocker/r-ver:4.4@sha256:429c1a585ab3cd6b120fe870fc9ce4dc83f21793bf04d7fa2657346fffde28d1
# options(warn=2) will make the build error out if package doesn't install
RUN R -e "options(warn = 2); install.packages('pak')"
# Using {pak} to install R packages: it resolves Ubuntu system dependencies AND
# the R dependency tree!
RUN R -e "options( \
warn = 2, \
repos = c(CRAN = 'https://p3m.dev/cran/__linux__/jammy/2024-07-06') \
); \
pak::pak( \
c( \
'httr', \
'jsonlite', \
'logger', \
'remotes', \
'lambdr' \
) \
)"
# Lambda setup
COPY . /var/task/
WORKDIR /var/task
ENTRYPOINT Rscript /var/task/R/startup/aws-lambda.R
Ok, a little more digging and I've got the following setup which feels slightly cleaner.
Functions are separated from the handler and sourced: this is the project setup recommended by AWS, because functions used by the handler can be unit tested if you want (see the sketch below).
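To illustrate, a unit test for the sourced functions might look like this ({testthat} and the test file itself are my own additions here, not part of the setup below):
library(testthat)

# Unit tests for the sourced helper functions, kept separate from the handler
source(file.path("R", "functions.R"))

test_that("divide_this halves its input", {
  expect_equal(divide_this(10), 5)
})

test_that("square_this squares its input", {
  expect_equal(square_this(4), 16)
})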
The Dockerfile sets both an entrypoint and cmd, which is how they recommend doing it for Python when using an 'alternate' OS image. Specifically, they have this at the end of their example Dockerfile:
# Set runtime interface client as default command for the container runtime
ENTRYPOINT [ "/usr/local/bin/python", "-m", "awslambdaric" ]
# Pass the name of the function handler as an argument to the runtime
CMD [ "lambda_function.handler" ]
They're pointing to the awslambdaric module, which is the equivalent of us pointing to an R script that contains lambdr::start_lambda().
The combination of using the entrypoint and cmd means that we don't need to pass any config to lambdr::start_lambda(). I think that's more in keeping with the intentions of the package design (though tell me if I'm wrong @mdneuzerling).
Example project
I'm just sort of spitballing at this point, but it's probs worth collecting everything together from this thread so far. We're still missing local testing but I'll get to that. If you have easy access to setting up Lambdas I'd love to know if the below works for either of you.
Project structure
.
├── Dockerfile
└── R
├── functions.R
└── runtime.R
Dockerfile
FROM docker.io/rocker/r-ver:4.4@sha256:429c1a585ab3cd6b120fe870fc9ce4dc83f21793bf04d7fa2657346fffde28d1
# options(warn=2) will make the build error out if package doesn't install
RUN R -e "options(warn = 2); install.packages('pak')"
# Using {pak} to install R packages: it resolves Ubuntu system dependencies AND
# the R dependency tree!
RUN R -e "options( \
warn = 2, \
repos = c(CRAN = 'https://p3m.dev/cran/__linux__/jammy/2024-07-06') \
); \
pak::pak( \
c( \
'lambdr' \
) \
)"
# Lambda setup
RUN mkdir -p /lambda/R
COPY R/ /lambda/R
RUN chmod 755 -R /lambda
WORKDIR /lambda
ENTRYPOINT Rscript R/runtime.R
CMD ["handler"]
R/functions.R
divide_this <- function(x) {
  x / 2
}

square_this <- function(x) {
  x * x
}
R/runtime.R
library(lambdr)
source(file.path("R", "functions.R"))
handler <- function(any_val) {
  d <- divide_this(any_val)
  s <- square_this(d)
  return(s)
}

if (!interactive()) {
  start_lambda()
}
Once deployed with the name lambdr-with-arbitrary-image, invoking via the AWS CLI:
aws lambda invoke --function-name lambdr-with-arbitrary-image \
--invocation-type RequestResponse --payload '{"any_val": 123}' \
/tmp/response.json --cli-binary-format raw-in-base64-out
Gives
{
  "StatusCode": 200,
  "ExecutedVersion": "$LATEST"
}
Which can be read with
cat /tmp/response.json
3782.25
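As expected, since 123 / 2 = 61.5 and 61.5 * 61.5 = 3782.25.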
Love all of this. Thank you! Love the idea of starting with Rocker, especially since that's familiar to a lot of R devs already. Agree on starting with r-ver: better to use the minimal image to make the startup time quicker. And the project structure makes a lot of sense.
One minor thing is to call the handler something more specific than "handler".
I think what's needed to put this all together might be a vignette that we can add to the package. What do you think?
I wonder if Rocker wouldn't also help with providing support for ARM/AWS Graviton? See: https://github.com/mdneuzerling/lambdr/issues/31#issuecomment-2129569289
Hey @mdneuzerling! Glad you like it. I've since worked on this to make a more "realistic" example project that has everything (I think) that we've been wondering about, and which I was hoping could form the basis of one or more vignettes. For the time being I've put it up on my own account here. Haven't got around to filling in the readmes yet.
The gist of it is:
- It is a Lambda function called flags that queries the REST Countries API for a given country name and returns information about the flag
- Has a production Dockerfile and a development Dockerfile.dev
  - The dev one is based on the equivalent r-ver devcontainer image, so it has a bunch of useful R packages already installed, plus Python and the radian R terminal
- Has local testing
Having a separate dev and prod Dockerfile opened up some possibilities:
- The dev one can be optionally used for a VS Code devcontainer
  - There's a devcontainer.json included so that this is trivial to run
- Or there's a "build script" for the dev Dockerfile if the user prefers
  - Builds the image and runs the container
- In both cases the development container will mount the volumes containing the R scripts, so that there's no need to keep rebuilding the container all the time to reflect changes made to code. Also mounts AWS creds (though this might need tweaking for Windows peeps) so that your profile can be used to access AWS resources from within the container
- The dev Dockerfile installs the Lambda Runtime Interface Emulator, which enables us to stand up an HTTP endpoint to locally invoke the function for testing (see the sketch after this list)
  - There's a couple of scripts for this in the local-testing folder
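For a rough idea of what local invocation against the RIE looks like, here's a sketch using {httr} (the localhost:9000 port mapping is an assumption taken from AWS's examples; adjust it to however you run the container):
library(httr)

# The RIE exposes the Lambda invocation API locally
response <- POST(
  "http://localhost:9000/2015-03-31/functions/function/invocations",
  body = '{"any_val": 123}'
)
content(response)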
I also have a couple of TypeScript files in .infrastructure/. These are real examples from a minimal CDK stack that I made to deploy the flags function. I thought having these available alongside the lambda project itself could be useful for people that are working with infrastructure as code (like me!) and have never seen just how minimally you can start out.
Obviously it brings a lot of implications of its own (TypeScript, CDK, etc.) but the AWS CDK 'getting started' docs are actually pretty good, so I think we could just link to them for most context. The fact is, providing even one detailed deployment example, whether with IaC or ClickOps or whatever, quickly becomes a large undertaking due to the amount of context and steps required. You've experienced this already with your old blog posts, right? So maybe we could simply provide the key files, give high-level context, and link to relevant material.
One minor thing is to call the handler something more specific than "handler".
I learned at work that there's at least one camp of people who think we should always call it handler. It serves a specific purpose and the name reflects that. Lambda functions themselves get named, there's only ever one handler per Lambda function, and Lambda functions are all separate from one another. My Python-based colleagues say this is more in keeping with Python and C and stuff where handler is almost like the equivalent to "main".
It makes sense to me, and I think it makes the purpose of the handler a bit clearer for Lambda newbies - I do remember being confused about the whole parity thing when I first started out. That said, I'm rather ignorant on the matter and it isn't my package 😆 So we can definitely use a more descriptive name if you prefer.
I wonder if Rocker wouldn't also help with providing support for ARM/AWS Graviton?
Almost certainly. I can't remember which specific architecture the containers in my example repo target at the moment, TBH. It's something we could revisit once the new vignettes/docs are a bit more together?
In the meantime are you happy for me to start plugging away at some documentation material?
I learned at work that there's at least one camp of people who think we should always call it handler. It serves a specific purpose and the name reflects that.
That makes sense. Let's keep it handler to make it obvious. The only reason I'd suggest otherwise is because it lets people know that you can have multiple functions in one container, but I don't think we need to aim the docs and examples at advanced use cases.
In the meantime are you happy for me to start plugging away at some documentation material?
I would be so grateful for any and all documentation and examples! When I started this package I knew that it would be 10% code, 40% tests, and 50% documentation. Do you think vignettes are the way to go? I'm flexible.
I was thinking that maybe it might be a mix. Here are the parts of my original proposal from https://github.com/mdneuzerling/lambdr/issues/34 that I think are still relevant.
I think for people like me, who are “not-beginners” at R, but are totally new to Data Engineering, a guide/articles with the following would help make lambdr more accessible [ . . . ] The projects would be reproducible examples explained for newbies. They would actually be a single project, just recycled and presented slightly differently in each article due to the requirements of using a parent/arbitrary Docker image.
I think the proposed general audience is probs still right - intermediate R users who need a wee bit more help with the cloud/engineering context. For the other points...
The basics of how Lambda works and why we need lambdr for R projects (contextualises the need for the package)
I think this one could just be a couple or few sentences added to the existing readme/site homepage. And probably just the 'why', not the 'how'.
A primer on Docker (could just be a few links to existing material)
This one could be an article/vignette that uses one of the Dockerfiles as an illustrated example.
A rough overview of how to turn the code and Dockerfile into a Lambda on AWS. You've got one method in the docs already, but I would like to provide the outline for a CDK option too
This one could be added to the existing Placing an R Lambda Runtime in a Container article. I think the re-working there is to start with the outline of an absolutely minimal project ready for deployment (maybe my flags one), then build up extra concepts:
- Arbitrary/non-provided base images vs provided base images
- Dev vs deployment setup
- Local testing
- Deployment via ClickOps or CDK
This way the article becomes a more fleshed out 'whole game' reference.
Regarding AWS provided base images... To be honest, I find the al2023 image to be horrendous. I do have an example of how to set it up for deployment, but the build time was unpleasant, about 10 to 20 minutes depending on the user, and it would freeze 8GB-RAM M1 MacBooks when used as a devcontainer. Also, I hate the fact that the distro is bespoke rather than a mainline one like Ubuntu. How do we confidently install binaries? It takes {pak} off the table too, because it can't identify system dependencies for anything other than the common Linux distros.
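For context, this is the kind of lookup {pak} can do on a supported distro; a quick sketch (the output described below is indicative, not verbatim):
# Ask {pak} which system packages an R package needs on the current distro.
# pkg_sysreqs() is available in recent {pak} versions.
pak::pkg_sysreqs("httr")
# On Ubuntu this reports dependencies such as libcurl4-openssl-dev and
# libssl-dev, which pak::pak() can then install automatically.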
Now that I understand provided images like al2023 are completely optional, I kind of want to direct people away from using them, or at least encourage alternatives like those provided by Rocker. On that subject, while al2 was a better base image, it is facing end of life next summer, on 2025-06-30. To me, that feels a bit too close to continue recommending it. I don't know; what are your thoughts on this?
You've convinced me! I'm happy to swap from the Amazon Linux examples to Rocker examples. Especially if we can install the stuff required for testing Lambda functions locally.
Two things that we should encourage:
- using a minimal Rocker image, like r-ver, and
- locking down the specific version of the Docker image when something is ready for production.