AWS Batch Support
One of the ways I would like to use chalice is for longer-running serverless computations. This fits quite nicely into AWS Batch when your computations aren't splittable across a chain of writes and event-triggered Lambdas.
This could be done by creating an interface to Batch that mimics the Lambda interface, though it consequently would not be able to return a response. It would probably involve creating a number of boilerplate Lambdas that trigger the Batch job, pass parameters on to it, and, in the case of HTTP, respond to callers with a 201 once the job has been scheduled.
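For illustration, here is a rough sketch of what such a boilerplate handler could look like if written by hand today with boto3; the job name, queue, and definition are placeholders and assumed to already exist:

import boto3
from chalice import Chalice, Response

app = Chalice(app_name='helloworld')
batch = boto3.client('batch')

@app.route('/submit', methods=['POST'])
def submit():
    # Forward the raw request body to the Batch job via a container override.
    body = app.current_request.raw_body.decode('utf-8')
    job = batch.submit_job(
        jobName='helloworld-job',            # placeholder
        jobQueue='my_job_queue',             # assumed to exist
        jobDefinition='my_job_definition',   # assumed to exist
        containerOverrides={
            'environment': [{'name': 'REQUEST_BODY', 'value': body}],
        },
    )
    # Respond immediately; the job itself runs asynchronously on Batch.
    return Response(body={'jobId': job['jobId']}, status_code=201)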
Additionally, a Docker container would have to be referenced or built, as that is a requirement of AWS Batch. To make the interface as seamless as possible, I would favour building an image and uploading it to ECR. I also believe the idea in #1001 to bundle chalicelib (and potentially other directories) as layers would complement this very nicely, since it would enable larger deployments to Batch (by adding files to the Docker image) while preserving the ability to deploy the same content to Lambdas (through layers).
I am mostly spitballing in this post, so any corrections or ideas to refine this would be appreciated; I think this would be an amazing feature!
To make sure I understand this: are you suggesting you want to use the Python API that chalice provides, but instead of deploying to Lambda, have it create Batch jobs for you? Or when you say:
It would probably involve creating a number of boilerplate Lambdas that trigger the Batch job, pass parameters on to it, and, in the case of HTTP
Do you mean you want chalice to create Lambda functions like it normally does but have it automatically trigger a Batch job when invoked?
My intention was the latter.
For example, code might look something like this:
import json
from chalice import Chalice

app = Chalice(app_name='helloworld')

@app.batch_route(
    '/submit',  # POST method only?
    vcpus=32, memory=160000, timeout=36000,
    job_queue="my_job_queue"
)
def calculate():
    body = json.dumps(app.current_request.json_body)
    ...  # some long running computation
In this case, API Gateway, Lambda and Batch Job resources would be created. The Lambda function would be boilerplate code that, as you've described, would trigger the generated Batch Job when invoked.
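Purely as speculation about the other half of the wiring: the generated Docker image could include a small entrypoint that rehydrates the payload and calls the decorated function. REQUEST_BODY is a made-up convention here, not an existing one:

import json
import os

# Hypothetical entrypoint baked into the generated Batch container image.
# The boilerplate Lambda would forward the HTTP body through an environment
# variable; REQUEST_BODY is an assumed name, not an existing convention.
def main():
    payload = json.loads(os.environ.get('REQUEST_BODY', '{}'))
    # chalice would need a shim here to expose `payload` as
    # app.current_request.json_body before invoking the user's function.
    from app import calculate  # the decorated handler from the example above
    calculate()

if __name__ == '__main__':
    main()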
Got it, thanks for clarifying. Labeling as a feature request. If anyone else is interested in this, please +1 this issue.
👍 (+1) Not sure if you meant for me to do it some other way
I really support this idea
+1
@jamesls if an ambitious contributor wanted to implement this, how might you suggest they approach adding this feature?
For example: would a limited implementation of, say, @app.batch_job() that only solved the packaging & deployment complexity be acceptable as an initial scope?
e.g.
import json
from chalice import Chalice

app = Chalice(app_name='helloworld')

@app.batch_job(
    vcpus=32, memory=160000, timeout=36000,
    job_definition="my_job_definition",
    job_queue="my_job_queue"
)
def calculate():
    body = json.dumps(app.current_request.json_body)
    ...  # some long running computation
The Batch Job would only be executable via manual calls to submit-job:
aws batch submit-job --job-name my_job \
    --job-queue my_job_queue \
    --job-definition my_job_definition \
    --retry-strategy attempts=2
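For reference, the equivalent submission with boto3 (same placeholder names as the CLI call above):

import boto3

batch = boto3.client('batch')

# Equivalent of the aws batch submit-job call above.
response = batch.submit_job(
    jobName='my_job',
    jobQueue='my_job_queue',
    jobDefinition='my_job_definition',
    retryStrategy={'attempts': 2},
)
print(response['jobId'])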
This would be a solid MVP for the feature and would let folks see if it's useful before having to decide how job submission should or must interoperate with other scheduling decorators like on_s3_event, schedule, etc.