lambdr
Another, more explicit example
I had to change the FROM in my Dockerfile because my project relies on a different OS version. While adapting the example from the documentation, I found that it has some unclear implicit behaviour, like CMD ['parity']. How does it work, and why? I couldn't reproduce the original example, so I put together another version. With it, you can use your own R script as the entrypoint.
Dockerfile:
FROM rocker/tidyverse:4.2.1
RUN apt-get update \
..
COPY . /var/task/
WORKDIR /var/task
RUN Rscript -e "install.packages(c('httr', 'jsonlite', 'logger', 'remotes', 'lambdr'))"
ENTRYPOINT Rscript /var/task/R/startup/aws-lambda.R
R/startup/aws-lambda.R:
library(lambdr)
# for testing the lambda function, pass {'any_val': ..}, as in the next line
start_lambda(config = lambda_config(handler = function(any_val) {
  tryCatch({
    print(paste('got any_val:', any_val))
    list(result = 'Success')
  }, error = function(e) {
    list(result = paste('Error:', conditionMessage(e), '\n\n', capture.output(traceback())))
  })
}))
Hi @grayskripko. Thanks for pointing this out. I'll give it a go. I'll admit that I've only been using the Amazon Linux parent images, but you're right, there's no reason lambdr shouldn't work with an arbitrary image.
The implicit behaviour you're referring to is what sets the "_HANDLER" environment variable. With the Amazon Linux parent images this is either the CMD in the Dockerfile or the setting in the Lambda web interface. The "_HANDLER" environment variable is how lambdr determines which R function should be used to handle the events.
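To make that concrete, here's an illustrative sketch (not lambdr's actual code) of the lookup that the "_HANDLER" variable drives:
# Illustrative only: roughly the resolution lambdr performs at startup.
# Lambda sets "_HANDLER" from the image CMD (e.g. CMD ["parity"]) or from
# the function's configuration in the web interface.
handler_name <- Sys.getenv("_HANDLER")   # e.g. "parity"
handler_function <- get(handler_name)    # finds the R function of that name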
As far as I can tell, the only thing that would be missing without using one of the AWS parent images would be the Lambda runtime interface emulator, which is used for testing Lambdas locally.
Hi @grayskripko and @mdneuzerling
I can confirm that the example given in the first post does work when deployed.
I did a bit of testing and have a small contribution to nudge along the discussion about arbitrary images.
For testing, I see it's possible to install the AWS Lambda RIE locally and either build it into the base image or test without adding it to the image. That said, I haven't given it a go because both options involve adding extra "stuff" to the process.
Instead I have a simple tweak to the script that could be useful. So long as you have all the same credentials locally as those required by the deployed Lambda (e.g. for accessing Secrets Manager, an external API, or reading/writing a database), you can test the R code interactively by structuring it like this:
library(lambdr)
# for testing the lambda function, pass {'any_val': ..}, as in the next line
handler <- function(any_val) {
  tryCatch(
    {
      print(paste("got any_val:", any_val))
      list(result = "Success")
    },
    error = function(e) {
      list(result = paste("Error:", conditionMessage(e), "\n\n", capture.output(traceback())))
    }
  )
}

if (!interactive()) {
  start_lambda(config = lambda_config(handler = handler))
}
By pre-defining the handler function and wrapping start_lambda() in if (!interactive()), you can now e.g. highlight the whole script, run it, then in the R REPL do handler(123).
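For example, after running the whole script in an interactive session (output shown as R would print it):
handler(123)
#> [1] "got any_val: 123"
#> $result
#> [1] "Success"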
This equally works if your setup is more spread out like the below:
R/functions.R
divide_this <- function(x) {
  x / 2
}

square_this <- function(x) {
  x * x
}
R/startup/aws-lambda.R
library(lambdr)
source(file.path("R", "functions.R"))
handler <- function(any_val) {
  d <- divide_this(any_val)
  s <- square_this(d)
  return(s)
}

if (!interactive()) {
  start_lambda(config = lambda_config(handler = handler))
}
As a next step I'll see if I can get the RIE working with one or both of the methods I linked.
I also haven't tested the need for the ENTRYPOINT in the Docker image, nor do I understand why the files get copied specifically into /var/task/. If either of you could explain that would be great. No worries though, I'll find out what is needed or not by trial and error.
Here's a full Dockerfile btw. I think these Rocker images are great to have because they're just straight-up Ubuntu, rather than one of Amazon's bespoke Lambda distros. So not only can we confidently install binaries from PPPM, we can even use {pak} to autodetect and install any R package system dependencies. Otherwise, using the AL2023 images takes absolutely AGES.
FROM docker.io/rocker/r-ver:4.4@sha256:429c1a585ab3cd6b120fe870fc9ce4dc83f21793bf04d7fa2657346fffde28d1
# options(warn=2) will make the build error out if package doesn't install
RUN R -e "options(warn = 2); install.packages('pak')"
# Using {pak} to install R packages: it resolves Ubuntu system dependencies AND
# the R dependency tree!
RUN R -e "options( \
warn = 2, \
repos = c(CRAN = 'https://p3m.dev/cran/__linux__/jammy/2024-07-06') \
); \
pak::pak( \
c( \
'httr', \
'jsonlite', \
'logger', \
'remotes', \
'lambdr' \
) \
)"
# Lambda setup
COPY . /var/task/
WORKDIR /var/task
ENTRYPOINT Rscript /var/task/R/startup/aws-lambda.R
Ok, a little more digging and I've got the following setup which feels slightly cleaner.
Functions are separated from the handler and sourced: this is the project setup recommended by AWS, because functions used by the handler can be unit tested if you want (see the sketch below).
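To illustrate, a unit test for the sourced functions might look like this ({testthat} and the test file itself are my own additions here, not part of the setup below):
library(testthat)

# Unit tests for the sourced helper functions, kept separate from the handler
source(file.path("R", "functions.R"))

test_that("divide_this halves its input", {
  expect_equal(divide_this(10), 5)
})

test_that("square_this squares its input", {
  expect_equal(square_this(4), 16)
})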
The Dockerfile sets both an entrypoint and cmd, which is how they recommend doing it for Python when using an 'alternate' OS image. Specifically, they have this at the end of their example Dockerfile:
# Set runtime interface client as default command for the container runtime
ENTRYPOINT [ "/usr/local/bin/python", "-m", "awslambdaric" ]
# Pass the name of the function handler as an argument to the runtime
CMD [ "lambda_function.handler" ]
They're pointing to the awslambdaric module, which is the equivalent of us pointing to an R script that contains lambdr::start_lambda().
The combination of using the entrypoint and cmd means that we don't need to pass any config to lambdr::start_lambda(). I think that's more in keeping with the intentions of the package design (though tell me if I'm wrong @mdneuzerling).
Example project
I'm just sort of spitballing at this point, but it's probs worth collecting everything together from this thread so far. We're still missing local testing but I'll get to that. If you have easy access to setting up Lambdas I'd love to know if the below works for either of you.
Project structure
.
├── Dockerfile
└── R
├── functions.R
└── runtime.R
Dockerfile
FROM docker.io/rocker/r-ver:4.4@sha256:429c1a585ab3cd6b120fe870fc9ce4dc83f21793bf04d7fa2657346fffde28d1
# options(warn=2) will make the build error out if package doesn't install
RUN R -e "options(warn = 2); install.packages('pak')"
# Using {pak} to install R packages: it resolves Ubuntu system dependencies AND
# the R dependency tree!
RUN R -e "options( \
warn = 2, \
repos = c(CRAN = 'https://p3m.dev/cran/__linux__/jammy/2024-07-06') \
); \
pak::pak( \
c( \
'lambdr' \
) \
)"
# Lambda setup
RUN mkdir -p /lambda/R
COPY R/ /lambda/R
RUN chmod 755 -R /lambda
WORKDIR /lambda
ENTRYPOINT Rscript R/runtime.R
CMD ["handler"]
R/functions.R
divide_this <- function(x) {
  x / 2
}

square_this <- function(x) {
  x * x
}
R/runtime.R
library(lambdr)
source(file.path("R", "functions.R"))
handler <- function(any_val) {
  d <- divide_this(any_val)
  s <- square_this(d)
  return(s)
}

if (!interactive()) {
  start_lambda()
}
Once deployed with the name lambdr-with-arbitrary-image, invoking via the AWS CLI:
aws lambda invoke --function-name lambdr-with-arbitrary-image \
--invocation-type RequestResponse --payload '{"any_val": 123}' \
/tmp/response.json --cli-binary-format raw-in-base64-out
Gives
{
  "StatusCode": 200,
  "ExecutedVersion": "$LATEST"
}
Which can be read with
cat /tmp/response.json
3782.25
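As expected, since 123 / 2 = 61.5 and 61.5 * 61.5 = 3782.25.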
Love all of this. Thank you! Love the idea of starting with Rocker, especially since that's familiar to a lot of R devs already. Agree on starting with r-ver: better to use the minimal image to make the startup time quicker. And the project structure makes a lot of sense.
One minor thing is to call the handler something more specific than "handler".
I think what's needed to put this all together might be a vignette that we can add to the package. What do you think?
I wonder if Rocker wouldn't also help with providing support for ARM/AWS Graviton? See: https://github.com/mdneuzerling/lambdr/issues/31#issuecomment-2129569289
Hey @mdneuzerling! Glad you like it. I've since worked on this to make a more "realistic" example project that has everything (I think) that we've been wondering about, and which I was hoping could form the basis of one or more vignettes. For the time being I've put it up on my own account here. Haven't got around to filling in the readmes yet.
The gist of it is:
- It is a Lambda function called flags that queries the REST Countries API for a given country name and returns information about the flag
- Has a production Dockerfile and a development Dockerfile.dev
  - The dev one is based on the equivalent r-ver devcontainer image, so it has a bunch of useful R packages already installed, plus Python and the radian R terminal
- Has local testing
Having a separate dev and prod Dockerfile opened up some possibilities:
- The dev one can be optionally used for a VS Code devcontainer
  - There's a devcontainer.json included so that this is trivial to run
- Or there's a "build script" for the dev Dockerfile if the user prefers
  - Builds the image and runs the container
- In both cases the development container will mount the volumes containing the R scripts, so that there's no need to keep rebuilding the container all the time to reflect changes made to code. Also mounts AWS creds (though this might need tweaking for Windows peeps) so that your profile can be used to access AWS resources from within the container
- The dev Dockerfile installs the Lambda Runtime Interface Emulator, which enables us to stand up an HTTP endpoint to locally invoke the function for testing (see the sketch after this list)
  - There's a couple of scripts for this in the local-testing folder
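For a rough idea of what local invocation against the RIE looks like, here's a sketch using {httr} (the localhost:9000 port mapping is an assumption taken from AWS's examples; adjust it to however you run the container):
library(httr)

# The RIE exposes the Lambda invocation API locally
response <- POST(
  "http://localhost:9000/2015-03-31/functions/function/invocations",
  body = '{"any_val": 123}'
)
content(response)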
I also have a couple of TypeScript files in .infrastructure/. These are real examples from a minimal CDK stack that I made to deploy the flags function. I thought having these available alongside the lambda project itself could be useful for people that are working with infrastructure as code (like me!) and have never seen just how minimally you can start out.
Obviously it brings a lot of implications of its own (TypeScript, CDK, etc.) but the AWS CDK 'getting started' docs are actually pretty good, so I think we could just link to them for most context. The fact is, providing even one detailed deployment example, whether with IaC or ClickOps or whatever, quickly becomes a large undertaking due to the amount of context and steps required. You've experienced this already with your old blog posts, right? So maybe we could simply provide the key files, give high-level context, and link to relevant material.
One minor thing is to call the handler something more specific than "handler".
I learned at work that there's at least one camp of people who think we should always call it handler. It serves a specific purpose and the name reflects that. Lambda functions themselves get named, there's only ever one handler per Lambda function, and Lambda functions are all separate from one another. My Python-based colleagues say this is more in keeping with Python and C and stuff where handler is almost like the equivalent to "main".
It makes sense to me, and I think it makes the purpose of the handler a bit clearer for Lambda newbies - I do remember being confused about the whole parity thing when I first started out. That said, I'm rather ignorant on the matter and it isn't my package 😆 So we can definitely use a more descriptive name if you prefer.
I wonder if Rocker wouldn't also help with providing support for ARM/AWS Graviton?
Almost certainly. I can't remember which specific architecture the containers in my example repo target at the moment, TBH. It's something we could revisit once the new vignettes/docs are a bit more together?
In the meantime are you happy for me to start plugging away at some documentation material?
I learned at work that there's at least one camp of people who think we should always call it handler. It serves a specific purpose and the name reflects that.
That makes sense. Let's keep it handler to make it obvious. The only reason I'd suggest otherwise is because it lets people know that you can have multiple functions in one container, but I don't think we need to aim the docs and examples at advanced use cases.
In the meantime are you happy for me to start plugging away at some documentation material?
I would be so grateful for any and all documentation and examples! When I started this package I knew that it would be 10% code, 40% tests, and 50% documentation. Do you think vignettes are the way to go? I'm flexible.
I was thinking that maybe it might be a mix. Here are the parts of my original proposal from https://github.com/mdneuzerling/lambdr/issues/34 that I think are still relevant.
I think for people like me, who are “not-beginners” at R, but are totally new to Data Engineering, a guide/articles with the following would help make lambdr more accessible [ . . . ] The projects would be reproducible examples explained for newbies. They would actually be a single project, just recycled and presented slightly differently in each article due to the requirements of using a parent/arbitrary Docker image.
I think the proposed general audience is probs still right - intermediate R users who need a wee bit more help with the cloud/engineering context. For the other points...
The basics of how Lambda works and why we need lambdr for R projects (contextualises the need for the package)
I think this one could just be a couple or few sentences added to the existing readme/site homepage. And probably just the 'why', not the 'how'.
A primer on Docker (could just be a few links to existing material)
This one could be an article/vignette that uses one of the Dockerfiles as an illustrated example.
A rough overview of how to turn the code and Dockerfile into a Lambda on AWS. You've got one method in the docs already, but I would like to provide the outline for a CDK option too
This one could be added to the existing Placing an R Lambda Runtime in a Container article. I think the re-working there is to start with the outline of an absolutely minimal project ready for deployment (maybe my flags one), then build up extra concepts:
- Arbitrary/non-provided base images vs provided base images
- Dev vs deployment setup
- Local testing
- Deployment via ClickOps or CDK
This way the article becomes a more fleshed out 'whole game' reference.
Regarding AWS provided base images... To be honest, I find the al2023 image to be horrendous. I do have an example of how to set it up for deployment, but the build time was unpleasant, about 10 to 20 minutes depending on the user, and it would freeze 8GB-RAM M1 MacBooks when used as a devcontainer. Also, I hate the fact that the distro is bespoke rather than a mainline one like Ubuntu. How do we confidently install binaries? It takes {pak} off the table too, because it can't identify system dependencies for anything other than the common Linux distros.
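For context, this is the kind of lookup {pak} can do on a supported distro; a quick sketch (the output described below is indicative, not verbatim):
# Ask {pak} which system packages an R package needs on the current distro.
# pkg_sysreqs() is available in recent {pak} versions.
pak::pkg_sysreqs("httr")
# On Ubuntu this reports dependencies such as libcurl4-openssl-dev and
# libssl-dev, which pak::pak() can then install automatically.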
Now that I understand provided images like al2023 are completely optional, I kind of want to direct people away from using them, or at least encourage alternatives like those provided by Rocker. On that subject, while al2 was a better base image, it is facing end of life next summer, on 2025-06-30. To me, that feels a bit too close to continue recommending it. I don't know; what are your thoughts on this?
You've convinced me! I'm happy to swap from the Amazon Linux examples to Rocker examples. Especially if we can install the stuff required for testing Lambda functions locally.
Two things that we should encourage:
- using a minimal Rocker image, like r-ver, and
- locking down the specific version of the Docker image when something is ready for production.