langchain-aws
Support Bedrock Batch Inference
Add support for Bedrock Batch Inference when using `BedrockLLM` `batch()`, instead of making calls with the sync API.
@bharven Can you provide more info about the batch API support in Bedrock? Can you share the documentation for this feature, and confirm whether it is supported in the current boto3 SDK?
@3coins Expected behavior would be to use the native Bedrock batch inference capability instead of sync API calls where possible. Bedrock batch currently requires >= 1000 records per job; ideally the `batch()` call would use the sync API for < 1000 records and the batch API for >= 1000 records (a rough sketch of that dispatch follows the links below). Bedrock batch enables processing of large (10k+) datasets without worrying about getting rate-limited, etc. It is supported in the current boto3 SDK.
AWS Documentation: https://docs.aws.amazon.com/bedrock/latest/userguide/batch-inference.html
Boto3 SDK link: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock/client/create_model_invocation_job.html
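For illustration only, a minimal sketch of the dispatch logic described above. The `batch_dispatch` helper name and the exact threshold handling are hypothetical, not an existing langchain-aws API:

```python
from typing import Any, List

# Bedrock's current minimum number of records per batch inference job.
BATCH_MIN_RECORDS = 1000

def batch_dispatch(llm: Any, inputs: List[str], **kwargs: Any) -> List[Any]:
    """Hypothetical routing: sync API below the batch minimum,
    Bedrock batch inference at or above it."""
    if len(inputs) < BATCH_MIN_RECORDS:
        # Current behavior: one sync invocation per input.
        return [llm.invoke(i, **kwargs) for i in inputs]
    # Large batches would go through a Bedrock model invocation job
    # instead; see the boto3 sketch later in the thread.
    raise NotImplementedError("submit a Bedrock model invocation job here")
```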
@bharven Thanks for providing the documentation for the batch API in Bedrock. Some considerations that come to mind for this implementation:

- The `create_model_invocation_job` API uses the `bedrock` service, not the `bedrock-runtime` service that the `converse` or `invoke` APIs do, which means we will need a second boto3 client to support batch.
- From the documentation, it is not clear whether the data (messages) should be in the native format that each model supports or the format that the `converse` API supports. This is important because we will need to convert the input messages (LangChain messages) to the format that Bedrock batch supports.
- To keep compatibility with LangChain messages as inputs, we should only support the `messages` part of the input payload. For example, we won't be able to support `recordId` directly, but we can embed it in the message id and then reformat it to `recordId` when we send the data to S3:

  ```json
  {
    "recordId": "CALL0000001",
    "modelInput": {
      "anthropic_version": "bedrock-2023-05-31",
      "max_tokens": 1024,
      "messages": [
        {
          "role": "user",
          "content": [
            {
              "type": "text",
              "text": "Summarize the following call transcript: ..."
            }
          ]
        }
      ]
    }
  }
  ```

- There are many configuration inputs/outputs required for the batch job, so we should use the `kwargs` parameter of the `batch` method to accept these inputs, except for `messages`.
- The Bedrock batch API is long-running and asynchronous. How do you plan to implement fetching the results and polling for the job status within the `batch` method? (A rough polling sketch follows this list.)
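To make the record format and the polling question concrete, here is a rough end-to-end sketch using the boto3 `bedrock` control-plane client (`create_model_invocation_job` / `get_model_invocation_job` are real boto3 APIs). The bucket, role ARN, model id, record-id scheme, and polling interval are placeholders, and this is not a proposal for the final langchain-aws surface:

```python
import json
import time

import boto3

# Second client from the first bullet above: the control-plane "bedrock"
# service, distinct from the "bedrock-runtime" client used for converse/invoke.
bedrock = boto3.client("bedrock")
s3 = boto3.client("s3")

def to_record(text: str, record_id: str) -> dict:
    # Wrap one prompt in the JSONL record format shown above; the recordId
    # would be carried in the LangChain message id, as suggested.
    return {
        "recordId": record_id,
        "modelInput": {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1024,
            "messages": [
                {"role": "user", "content": [{"type": "text", "text": text}]}
            ],
        },
    }

prompts = ["Summarize the following call transcript: ..."] * 1000  # placeholder inputs

# Write all records as one JSONL object (bucket and keys are placeholders).
body = "\n".join(json.dumps(to_record(p, f"REC{i:07d}")) for i, p in enumerate(prompts))
s3.put_object(Bucket="my-batch-bucket", Key="input/records.jsonl", Body=body.encode())

job = bedrock.create_model_invocation_job(
    jobName="langchain-batch-sketch",
    roleArn="arn:aws:iam::123456789012:role/BedrockBatchRole",  # placeholder
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    inputDataConfig={"s3InputDataConfig": {"s3Uri": "s3://my-batch-bucket/input/records.jsonl"}},
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": "s3://my-batch-bucket/output/"}},
)
job_arn = job["jobArn"]

# Poll until the job reaches a terminal state; outputs then appear under the
# output s3Uri as JSONL, matched back to inputs by recordId.
terminal = {"Completed", "PartiallyCompleted", "Failed", "Stopped", "Expired"}
while bedrock.get_model_invocation_job(jobIdentifier=job_arn)["status"] not in terminal:
    time.sleep(60)
```

Inside `batch()`, a loop like this would presumably need a timeout and might be better exposed asynchronously, but that is exactly the open design question above.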
Hi guys, any news? Will this be added? It has been a feature of AWS Bedrock for a while, and it could significantly lower the cost of jobs that do not require real-time / interactive responses.