langchain-aws
Support Bedrock Batch Inference
Add support for Bedrock Batch Inference when using `BedrockLLM` `batch()`, instead of making calls with the sync API.
@bharven Can you provide more info about the batch API support in Bedrock? Can you share the documentation for this feature, and confirm whether it is supported in the current boto3 SDK?
@3coins Expected behavior would be to use the native Bedrock batch inference capability instead of sync API calls where possible. Bedrock batch currently requires >= 1000 records per job; ideally the `batch()` call would use the sync API for < 1000 records and the batch API for >= 1000 records (a rough sketch of that dispatch follows the links below). Bedrock batch enables processing of large (10k+) datasets without worrying about getting rate-limited, etc. It is supported in the current boto3 SDK.
AWS Documentation: https://docs.aws.amazon.com/bedrock/latest/userguide/batch-inference.html
Boto3 SDK link: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock/client/create_model_invocation_job.html
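For illustration only, a minimal sketch of the dispatch logic described above. The `batch_dispatch` helper name and the exact threshold handling are hypothetical, not an existing langchain-aws API:

```python
from typing import Any, List

# Bedrock's current minimum number of records per batch inference job.
BATCH_MIN_RECORDS = 1000

def batch_dispatch(llm: Any, inputs: List[str], **kwargs: Any) -> List[Any]:
    """Hypothetical routing: sync API below the batch minimum,
    Bedrock batch inference at or above it."""
    if len(inputs) < BATCH_MIN_RECORDS:
        # Current behavior: one sync invocation per input.
        return [llm.invoke(i, **kwargs) for i in inputs]
    # Large batches would go through a Bedrock model invocation job
    # instead; see the boto3 sketch later in the thread.
    raise NotImplementedError("submit a Bedrock model invocation job here")
```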
@bharven Thanks for providing the documentation for the batch API in Bedrock. Some considerations that come to mind for this implementation:

- The `create_model_invocation_job` API uses the `bedrock` service, not the `bedrock-runtime` service that the `converse` or `invoke` APIs do, which means we will need a second boto3 client to support batch.
- From the documentation, it is not clear whether the data (messages) should be in the native format that each model supports or the format that the `converse` API supports. This is important because we will need to convert the input messages (LangChain messages) to the format that Bedrock batch supports.
- To keep compatibility with LangChain messages as inputs, we should only support the `messages` part of the input payload. For example, we won't be able to support `recordId` directly, but we can embed it in the message id and then reformat it to `recordId` when we send the data to S3:

  ```json
  {
    "recordId": "CALL0000001",
    "modelInput": {
      "anthropic_version": "bedrock-2023-05-31",
      "max_tokens": 1024,
      "messages": [
        {
          "role": "user",
          "content": [
            {
              "type": "text",
              "text": "Summarize the following call transcript: ..."
            }
          ]
        }
      ]
    }
  }
  ```

- There are many configuration inputs/outputs required for the batch job, so we should use the `kwargs` parameter of the `batch` method to accept these inputs, except for `messages`.
- The Bedrock batch API is long-running and asynchronous. How do you plan to implement fetching the results and polling for the job status within the `batch` method? (A rough polling sketch follows this list.)
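To make the record format and the polling question concrete, here is a rough end-to-end sketch using the boto3 `bedrock` control-plane client (`create_model_invocation_job` / `get_model_invocation_job` are real boto3 APIs). The bucket, role ARN, model id, record-id scheme, and polling interval are placeholders, and this is not a proposal for the final langchain-aws surface:

```python
import json
import time

import boto3

# Second client from the first bullet above: the control-plane "bedrock"
# service, distinct from the "bedrock-runtime" client used for converse/invoke.
bedrock = boto3.client("bedrock")
s3 = boto3.client("s3")

def to_record(text: str, record_id: str) -> dict:
    # Wrap one prompt in the JSONL record format shown above; the recordId
    # would be carried in the LangChain message id, as suggested.
    return {
        "recordId": record_id,
        "modelInput": {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1024,
            "messages": [
                {"role": "user", "content": [{"type": "text", "text": text}]}
            ],
        },
    }

prompts = ["Summarize the following call transcript: ..."] * 1000  # placeholder inputs

# Write all records as one JSONL object (bucket and keys are placeholders).
body = "\n".join(json.dumps(to_record(p, f"REC{i:07d}")) for i, p in enumerate(prompts))
s3.put_object(Bucket="my-batch-bucket", Key="input/records.jsonl", Body=body.encode())

job = bedrock.create_model_invocation_job(
    jobName="langchain-batch-sketch",
    roleArn="arn:aws:iam::123456789012:role/BedrockBatchRole",  # placeholder
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    inputDataConfig={"s3InputDataConfig": {"s3Uri": "s3://my-batch-bucket/input/records.jsonl"}},
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": "s3://my-batch-bucket/output/"}},
)
job_arn = job["jobArn"]

# Poll until the job reaches a terminal state; outputs then appear under the
# output s3Uri as JSONL, matched back to inputs by recordId.
terminal = {"Completed", "PartiallyCompleted", "Failed", "Stopped", "Expired"}
while bedrock.get_model_invocation_job(jobIdentifier=job_arn)["status"] not in terminal:
    time.sleep(60)
```

Inside `batch()`, a loop like this would presumably need a timeout and might be better exposed asynchronously, but that is exactly the open design question above.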
Hi guys, any news? Will this be added? It has been a feature of AWS Bedrock for a while, and it could significantly lower the cost of jobs that do not require real-time / interactive responses.