
AWS Batch Support

Open eganjs opened this issue 5 years ago • 13 comments

One of the ways I would like to use chalice is for longer-running serverless computations. This fits quite nicely into AWS Batch when your computations aren't splittable across a chain of writes and event-triggered Lambdas.

This could be done by creating an interface to Batch that mimics the Lambda interface, but consequently would not be able to return a response. This would probably involve creating a number of boilerplate Lambdas that trigger and pass parameters on to the Batch job and, in the case of HTTP, respond to callers with a 201 response once the job has been scheduled.
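To make the idea concrete, here is a rough sketch of what the generated boilerplate could do. The helper name `build_submit_job_kwargs` and the `payload` parameter are my own invention; the only real API assumed is boto3's `batch.submit_job`, which accepts exactly these keyword arguments:

```python
import json


def build_submit_job_kwargs(job_name, job_queue, job_definition, request_body):
    """Build the kwargs for boto3's batch.submit_job call.

    The HTTP request body is forwarded to the container as a Batch job
    parameter so the long-running code can read it at startup.
    (Hypothetical helper; names are illustrative, not part of chalice.)
    """
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "parameters": {"payload": json.dumps(request_body)},
    }


# The generated Lambda handler might then do something like:
#
#   import boto3
#   batch = boto3.client("batch")
#   response = batch.submit_job(**build_submit_job_kwargs(...))
#   # Respond 201 Created with the job id so callers can poll Batch.
#   return Response(body={"jobId": response["jobId"]}, status_code=201)
```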

Additionally, a Docker container would have to be referenced or built, as that is a requirement of AWS Batch. In order to make the interface as seamless as possible I would favour building an image and uploading it to ECR. I also believe the idea to bundle chalicelib (and potentially other directories) as layers in #1001 would complement this very nicely as it would enable larger deployments to Batch (through adding files to the Docker image) while preserving the ability to deploy the same content to Lambdas (through layers).

I am mostly spitballing in this post, so any corrections or ideas to refine this would be appreciated, as I think this would be an amazing feature!

eganjs avatar Jun 11 '20 11:06 eganjs

To make sure I understand this, are you suggesting you want to use the python API that chalice provides, but instead of deploying to Lambda, have it create batch jobs for you? Or when you say:

This would probably involve creating a number of boilerplate Lambdas that trigger and pass on parameters to the Batch job and in the case of HTTP

Do you mean you want chalice to create Lambda functions like it normally does but have it automatically trigger a Batch job when invoked?

jamesls avatar Jun 22 '20 17:06 jamesls

My intention was the latter.

For example, code might look something like this:

import json

from chalice import Chalice

app = Chalice(app_name='helloworld')


@app.batch_route(
    '/submit',  # POST method only?
    vcpus=32, memory=160000, timeout=36000,
    job_queue="my_job_queue"
)
def calculate():
    body = json.dumps(app.current_request.json_body)
    ...  # some long running computation

In this case, API Gateway, Lambda, and Batch job resources would be created. The Lambda function would be boilerplate code that, as you've described, triggers the generated Batch job when invoked.
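As a sketch of how the decorator arguments above could map onto the Batch API: `vcpus` and `memory` correspond to `resourceRequirements` entries in `containerOverrides`, and `timeout` to `submit_job`'s top-level `timeout` field. The function name below is hypothetical; the request shape matches boto3's `batch.submit_job`:

```python
def batch_route_to_submit_job(vcpus, memory, timeout):
    """Translate the hypothetical @app.batch_route() arguments into the
    corresponding fields of a boto3 batch.submit_job request.

    memory is in MiB and timeout in seconds, matching the Batch API.
    """
    return {
        "containerOverrides": {
            # Batch expects resource requirement values as strings.
            "resourceRequirements": [
                {"type": "VCPU", "value": str(vcpus)},
                {"type": "MEMORY", "value": str(memory)},
            ]
        },
        "timeout": {"attemptDurationSeconds": timeout},
    }
```

The boilerplate Lambda would merge this dict with the job name, queue, and definition when calling `submit_job`.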

eganjs avatar Jun 22 '20 21:06 eganjs

Got it, thanks for clarifying. Labeling as a feature request. If anyone else is interested in this, please +1 this issue.

jamesls avatar Jun 24 '20 15:06 jamesls

👍 (+1) Not sure if you meant for me to do it some other way

tomaszps avatar Nov 12 '20 20:11 tomaszps

I really support this idea

tenebrius avatar Feb 01 '21 16:02 tenebrius

+1

system123 avatar Feb 18 '21 10:02 system123

+1

hitrust avatar Feb 22 '21 23:02 hitrust

+1

dmb107 avatar Jun 07 '23 19:06 dmb107

+1

pedro-valentim avatar Nov 07 '23 16:11 pedro-valentim

+1

cjjcastro avatar Feb 09 '24 21:02 cjjcastro

+1

ewong26 avatar Mar 26 '24 22:03 ewong26

+1

mathadoor avatar Apr 14 '24 20:04 mathadoor

@jamesls if an ambitious contributor were to want to implement this, how might you suggest they approach adding this feature?

For example: would a limited implementation of, say, @app.batch_job() that only solved the packaging and deployment complexity be acceptable as an initial scope?

e.g.

import json

from chalice import Chalice

app = Chalice(app_name='helloworld')

@app.batch_job(
    vcpus=32, memory=160000, timeout=36000,
    job_definition="my_job_definition",
    job_queue="my_job_queue"
)
def calculate():
    body = json.dumps(app.current_request.json_body)
    ...  # some long running computation

The Batch job would only be executable via manual calls to submit-job:

aws batch submit-job --job-name my_job \
    --job-queue my_job_queue \
    --job-definition my_job_definition \
    --retry-strategy attempts=2

This would be a solid MVP for the feature and would let folks see if it's useful before having to decide how job submission should/must interoperate with other scheduling decorators like on_s3_event, schedule, etc.

indexzero avatar Jun 11 '24 06:06 indexzero