
Added support for Amazon SageMaker Asynchronous Endpoints

Open · dgallitelli opened this issue 2 years ago • 4 comments

Description

Added support for Amazon SageMaker Asynchronous Endpoints.

Issue

#6928

Dependencies

boto3, uuid

Maintainer

@hwchase17

Twitter handle

DGallitelli95

Code to test this

import json
from typing import Dict

import sagemaker
from langchain import PromptTemplate
from langchain.chains import LLMChain
from langchain.llms.sagemaker_endpoint import LLMContentHandler, SagemakerAsyncEndpoint

# This ContentHandler has been tested with Falcon40B Instruct from SageMaker JumpStart
class ContentHandler(LLMContentHandler):
    content_type: str = "application/json"
    accepts: str = "application/json"
    len_prompt: int = 0

    def transform_input(self, prompt: str, model_kwargs: Dict) -> bytes:
        self.len_prompt = len(prompt)
        input_str = json.dumps(
            {
                "inputs": prompt,
                "parameters": {
                    "max_new_tokens": 100,
                    "do_sample": False,
                    "repetition_penalty": 1.1,
                },
            }
        )
        return input_str.encode("utf-8")

    def transform_output(self, output: bytes) -> str:
        response_json = output.read()
        res = json.loads(response_json)
        return res[0]["generated_text"]

# `bucket`, `prefix`, and `my_model` are assumed to be defined earlier:
# an S3 bucket/prefix for the async payloads and a deployed SageMaker model.
chain = LLMChain(
    llm=SagemakerAsyncEndpoint(
        input_bucket=bucket,
        input_prefix=prefix,
        endpoint_name=my_model.endpoint_name,
        region_name=sagemaker.Session().boto_region_name,
        content_handler=ContentHandler(),
    ),
    prompt=PromptTemplate(
        input_variables=["query"],
        template="{query}",
    ),
)

print(chain.run("What is the purpose of life?"))

dgallitelli avatar Jun 29 '23 21:06 dgallitelli


PR Analysis

  • 🎯 Main theme: Adding support for Amazon SageMaker Asynchronous Endpoints
  • 🔍 Description and title: Yes
  • 📌 Type of PR: Enhancement
  • 🧪 Relevant tests added: No
  • ✨ Minimal and focused: Yes

PR Feedback

  • 💡 General PR suggestions: The PR is well-structured and follows good coding practices. However, it lacks tests for the new functionality. It would be beneficial to add tests to ensure the new code works as expected and to prevent potential regressions in the future. Additionally, consider handling exceptions more gracefully, providing more informative error messages to the user.

  • 🤖 Code suggestions:

    • Suggestion 1:
      • Relevant file: sagemaker_endpoint.py
      • Suggestion content: Consider moving the wait_inference_file function inside the SagemakerAsyncEndpoint class as a static or class method. This would improve the organization of the code and make it more object-oriented. [important]

    • Suggestion 2:
      • Relevant file: sagemaker_endpoint.py
      • Suggestion content: In the wait_inference_file function, instead of using a while True loop, consider using a for loop with a maximum number of retries to avoid potential infinite loops. [important]

    • Suggestion 3:
      • Relevant file: sagemaker_endpoint.py
      • Suggestion content: In the __init__ method of SagemakerAsyncEndpoint, consider validating the input arguments to ensure they are in the correct format and within expected ranges. This can help catch errors early and provide clearer error messages to the user. [medium]

    • Suggestion 4:
      • Relevant file: sagemaker_endpoint.py
      • Suggestion content: In the _call method of SagemakerAsyncEndpoint, consider handling the exception when the endpoint is not running. Instead of raising a generic Exception, provide a more specific error message to the user. [medium]

  • 🔒 Security concerns: No
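For context, suggestion 2 (replacing the unbounded while True poll with a bounded loop) might look something like the sketch below. The name wait_inference_file comes from the review comments, but its signature, the S3 key layout, and the retry defaults here are assumptions for illustration, not the PR's actual code.

```python
import time


def wait_inference_file(
    s3_client,
    bucket: str,
    output_key: str,
    max_retries: int = 25,
    poll_interval: float = 5.0,
) -> bytes:
    """Poll S3 for the async inference output, with a bounded number of retries.

    Raises TimeoutError instead of looping forever if the output never appears.
    """
    for _ in range(max_retries):
        try:
            # The async endpoint writes its result to S3 when inference completes
            response = s3_client.get_object(Bucket=bucket, Key=output_key)
            return response["Body"].read()
        except s3_client.exceptions.NoSuchKey:
            # Output not ready yet; wait and poll again
            time.sleep(poll_interval)
    raise TimeoutError(
        f"Async inference output s3://{bucket}/{output_key} did not appear "
        f"after {max_retries} polls ({max_retries * poll_interval:.0f} s)"
    )
```

A for loop with max_retries turns a stalled endpoint into an explicit TimeoutError rather than a hung process, and the retry budget can be tuned per deployment.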

Add a comment that says 'Please review' to ask for a new review after you update the PR. Add a comment that says 'Please answer <QUESTION...>' to ask a question about this PR.

CodiumAI-Agent avatar Jul 06 '23 06:07 CodiumAI-Agent

Please review. Updated PR with commit: 703081c4dee43c5c47801c28f171f61a684038a0

Change notes:

  • Reverted changes to sagemaker_endpoint.py file
  • Moved changes to sagemaker_async_endpoint.py file

dgallitelli avatar Jul 07 '23 08:07 dgallitelli

Please review

dgallitelli avatar Jul 11 '23 11:07 dgallitelli

Requesting the LangChain team's review and approval for this PR to add support for SageMaker async endpoints. Let's leverage these cost-effective endpoints for enhanced performance! I am happy to help with any testing and validation. 🤝 Thank you!

bismillahkani avatar Jul 23 '23 10:07 bismillahkani

cc @3coins

baskaryan avatar Aug 11 '23 00:08 baskaryan

@dgallitelli Thanks for submitting this PR. It seems like the async endpoints are specifically geared towards large payloads (up to 1 GB) and long processing times. Is that ideal for an LLM application? I am curious which specific use cases this integration will help with.

How does this compare with SageMaker Serverless Inference, which might offer similar cost benefits?

3coins avatar Aug 11 '23 19:08 3coins

> @dgallitelli Thanks for submitting this PR. It seems like the async endpoints are specifically geared towards large payloads (1GB) and long processing times. Is this something ideal for an LLM application? I am curious about what specific use cases will this integration help with.
>
> How does this compare with SageMaker serverless inference which might also have the cost benefits?

If I may add my comment: integrating AWS SageMaker async endpoints holds real promise for LangChain LLM applications. Async endpoints support GPU instances, which suits the size and complexity of LLM models, whereas SageMaker Serverless Inference is limited to CPU instances. That distinction is significant for achieving good performance. Moreover, scaling down to zero during idle periods makes async endpoints cost-effective.

bismillahkani avatar Aug 13 '23 04:08 bismillahkani

@dgallitelli Hi, could you please resolve the merge conflicts and address the last comments (if needed)? After that, ping me and I'll push this PR for review. Thanks!

leo-gan avatar Sep 18 '23 23:09 leo-gan

> @dgallitelli Hi , could you, please, resolve the merging issues and address the last comments (if needed)? After that, ping me and I push this PR for the review. Thanks!

Hi @leo-gan, the latest commit has passed all checks, and we have reviewed the previously described comments. Shall we move forward with PR review?

EliaLesyk avatar Oct 11 '23 12:10 EliaLesyk

@dgallitelli Hi, could you please resolve the branch conflicts? After that, ping me and I'll push this PR for review. Thanks!

leo-gan avatar Oct 11 '23 16:10 leo-gan

It tells me I don't have write access to this repository though? (screenshot attached)

dgallitelli avatar Oct 11 '23 16:10 dgallitelli

Hey @dgallitelli ! Closing because the PR wouldn't line up with the current directory structure of the library (would need to be in /libs/langchain/langchain instead of /langchain). Feel free to reopen against the current head if it's still relevant!

You don't need write access on langchain-ai/langchain in order to modify your branch. If you update it and reopen without conflicts, we'd love to review it! Apologies if the Github write access message was confusing.

efriis avatar Nov 07 '23 03:11 efriis