langchain
Added support for Amazon SageMaker Asynchronous Endpoints
Description
Added support for Amazon SageMaker Asynchronous Endpoints.
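For context, this is roughly the flow the new class wraps: a minimal sketch of a raw asynchronous invocation at the boto3 level (the bucket, key, and endpoint names below are illustrative placeholders, not values from this PR):

```python
import boto3

s3 = boto3.client("s3")
smr = boto3.client("sagemaker-runtime")

# Async endpoints read the request payload from S3 rather than the request body.
s3.put_object(
    Bucket="my-bucket",                # placeholder bucket
    Key="async-inputs/request.json",   # placeholder key
    Body=b'{"inputs": "What is the purpose of life?"}',
)

# The call returns immediately with an S3 location where the result will land.
response = smr.invoke_endpoint_async(
    EndpointName="my-async-endpoint",  # placeholder endpoint name
    InputLocation="s3://my-bucket/async-inputs/request.json",
    ContentType="application/json",
)
print(response["OutputLocation"])  # poll this S3 URI for the generated text
```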
Issue
#6928
Dependencies
boto3, uuid
Maintainer
@hwchase17
Twitter handle
DGallitelli95
Code to test this
```python
import json
from typing import Dict

import sagemaker
from langchain import PromptTemplate
from langchain.chains import LLMChain
from langchain.llms.sagemaker_endpoint import LLMContentHandler, SagemakerAsyncEndpoint


# This ContentHandler has been tested with Falcon40B Instruct from SageMaker JumpStart
class ContentHandler(LLMContentHandler):
    content_type: str = "application/json"
    accepts: str = "application/json"
    len_prompt: int = 0

    def transform_input(self, prompt: str, model_kwargs: Dict) -> bytes:
        self.len_prompt = len(prompt)
        input_str = json.dumps(
            {
                "inputs": prompt,
                "parameters": {
                    "max_new_tokens": 100,
                    "do_sample": False,
                    "repetition_penalty": 1.1,
                },
            }
        )
        return input_str.encode("utf-8")

    def transform_output(self, output: bytes) -> str:
        # The response body is a streaming object; read it before decoding.
        response_json = output.read()
        res = json.loads(response_json)
        return res[0]["generated_text"]


# `bucket`, `prefix`, and `my_model` are assumed to be defined earlier,
# e.g. from the SageMaker session and a JumpStart model deployment.
chain = LLMChain(
    llm=SagemakerAsyncEndpoint(
        input_bucket=bucket,
        input_prefix=prefix,
        endpoint_name=my_model.endpoint_name,
        region_name=sagemaker.Session().boto_region_name,
        content_handler=ContentHandler(),
    ),
    prompt=PromptTemplate(
        input_variables=["query"],
        template="{query}",
    ),
)

print(chain.run("What is the purpose of life?"))
```
PR Analysis
- 🎯 Main theme: Adding support for Amazon SageMaker Asynchronous Endpoints
- 📝 Description and title: Yes
- 📌 Type of PR: Enhancement
- 🧪 Relevant tests added: No
- ✨ Minimal and focused: Yes
PR Feedback
- 💡 General PR suggestions: The PR is well-structured and follows good coding practices. However, it lacks tests for the new functionality. It would be beneficial to add tests to ensure the new code works as expected and to prevent regressions in the future. Additionally, consider handling exceptions more gracefully, providing more informative error messages to the user.
- 🤖 Code suggestions:
  - suggestion 1:
    - relevant file: sagemaker_endpoint.py
    - suggestion content: Consider moving the `wait_inference_file` function inside the `SagemakerAsyncEndpoint` class as a static or class method. This would improve the organization of the code and make it more object-oriented. [important]
  - suggestion 2:
    - relevant file: sagemaker_endpoint.py
    - suggestion content: In the `wait_inference_file` function, instead of using a `while True` loop, consider using a `for` loop with a maximum number of retries to avoid potential infinite loops (see the sketch after this list). [important]
  - suggestion 3:
    - relevant file: sagemaker_endpoint.py
    - suggestion content: In the `__init__` method of `SagemakerAsyncEndpoint`, consider validating the input arguments to ensure they are in the correct format and within expected ranges. This can help catch errors early and provide clearer error messages to the user. [medium]
  - suggestion 4:
    - relevant file: sagemaker_endpoint.py
    - suggestion content: In the `_call` method of `SagemakerAsyncEndpoint`, consider handling the exception when the endpoint is not running. Instead of raising a generic `Exception`, provide a more specific error message to the user. [medium]
- 🔒 Security concerns: No
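To illustrate suggestion 2, here is a minimal sketch of a bounded polling loop. The helper name `wait_inference_file` comes from the PR; the retry count, delay, S3 URL parsing, and error handling below are illustrative assumptions, not the PR's actual implementation:

```python
import time

import boto3
from botocore.exceptions import ClientError


def wait_inference_file(
    output_url: str,
    s3_client,
    max_retries: int = 25,
    delay_seconds: float = 5.0,
) -> bytes:
    """Poll S3 for the async inference result, giving up after max_retries."""
    bucket, key = output_url.replace("s3://", "").split("/", 1)
    for attempt in range(max_retries):
        try:
            obj = s3_client.get_object(Bucket=bucket, Key=key)
            return obj["Body"].read()
        except ClientError as e:
            # NoSuchKey means the endpoint has not written the result yet.
            if e.response["Error"]["Code"] not in ("NoSuchKey", "404"):
                raise
            time.sleep(delay_seconds)
    raise TimeoutError(
        f"Async inference result not found at {output_url} "
        f"after {max_retries} attempts."
    )
```

Raising a dedicated `TimeoutError` with the output location also addresses the spirit of suggestion 4: callers get a specific, actionable error instead of a generic `Exception`.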
Add a comment that says 'Please review' to ask for a new review after you update the PR. Add a comment that says 'Please answer <QUESTION...>' to ask a question about this PR.
Please review. Updated PR with commit: 703081c4dee43c5c47801c28f171f61a684038a0
Change notes:
- Reverted changes to `sagemaker_endpoint.py`
- Moved changes to `sagemaker_async_endpoint.py`
Please review
Requesting the LangChain team's review and approval of this PR to add support for SageMaker async endpoints. Let's leverage these cost-effective endpoints for enhanced performance! I am happy to support any testing and validation. Thank you!
cc @3coins
@dgallitelli Thanks for submitting this PR. It seems like async endpoints are specifically geared towards large payloads (up to 1GB) and long processing times. Is that ideal for an LLM application? I am curious which specific use cases this integration will help with.
How does this compare with SageMaker serverless inference which might also have the cost benefits?
If I may add my comment: the integration of AWS SageMaker async endpoints holds promise for LangChain LLM applications. The support for GPU instances in async endpoints addresses the size and complexity of LLM models, unlike SageMaker serverless inference, which is limited to CPU instances. This distinction is significant for achieving optimal performance. Moreover, the cost-effectiveness achieved by scaling down to zero instances during idle periods enhances the feasibility of AWS SageMaker async endpoints.
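As a concrete illustration of the scale-to-zero point above, here is a sketch of the Application Auto Scaling setup an async endpoint can use (the endpoint and variant names are placeholders, and the capacity and threshold values are illustrative):

```python
import boto3

autoscaling = boto3.client("application-autoscaling")
resource_id = "endpoint/my-async-endpoint/variant/AllTraffic"  # placeholder names

# Async endpoints may scale down to zero instances, unlike real-time endpoints.
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=0,
    MaxCapacity=2,
)

# Scale based on the backlog of queued async invocations per instance.
autoscaling.put_scaling_policy(
    PolicyName="async-backlog-scaling",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 5.0,
        "CustomizedMetricSpecification": {
            "MetricName": "ApproximateBacklogSizePerInstance",
            "Namespace": "AWS/SageMaker",
            "Dimensions": [{"Name": "EndpointName", "Value": "my-async-endpoint"}],
            "Statistic": "Average",
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
)
```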
@dgallitelli Hi, could you please resolve the merge conflicts and address the last comments (if needed)? After that, ping me and I'll push this PR for review. Thanks!
Hi @leo-gan, the latest commit has passed all checks, and we have addressed the comments described above. Shall we move forward with the PR review?
@dgallitelli Hi, could you please resolve the branch conflicts? After that, ping me and I'll push this PR for review. Thanks!
It tells me I don't have write access to this repository though?
Hey @dgallitelli! Closing because the PR wouldn't line up with the current directory structure of the library (it would need to be in /libs/langchain/langchain instead of /langchain). Feel free to reopen against the current head if it's still relevant!
You don't need write access on langchain-ai/langchain in order to modify your branch. If you update it and reopen without conflicts, we'd love to review it! Apologies if the GitHub write access message was confusing.