feat: Amazon SageMaker expand inference input format
Related Issues
Thank you to the contributors who worked on #5155. One of the bullet points in the "Notes for the reviewer" section of #5155 is:

> The request format is tied to Hugging Face models hosted on SageMaker (we might in the future consider adding a customization option to allow any other, arbitrary model format)
I would like to discuss and work on how Haystack can support other inference inputs to models hosted on Amazon SageMaker.
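To make the problem concrete, here is a rough sketch of how the request payloads differ between model families. The field names are inferred from the examples later in this PR and from public API docs, so treat them as illustrative rather than authoritative:

```python
# Illustrative request payloads. Exact field names may differ per model
# version; these shapes are assumptions based on the examples in this PR.
hf_payload = {"inputs": "What is Haystack?", "parameters": {"max_new_tokens": 100}}

j2_payload = {"prompt": "Write a product description ...", "maxTokens": 100, "temperature": 0.9}

ca_payload = {"context": "The tower is 330 metres tall ...", "question": "How tall is the tower?"}

# A serializer hard-coded to the Hugging Face shape ("inputs"/"parameters")
# cannot produce the AI21 formats, which is why per-model (or pass-through)
# handling is needed.
for payload in (hf_payload, j2_payload, ca_payload):
    print(sorted(payload))
```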
Proposed Changes:
- Decouple model parameters from inference layer
- Add model parameters for AI21 Jurassic 2 Complete API
- Add model parameters for AI21 Contextual Answers API
- Add inference layer for AI21 Jurassic 2 Complete hosted on Amazon SageMaker
- Add inference layer for AI21 Contextual Answers hosted on Amazon SageMaker
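One way the first bullet (decoupling model parameters from the inference layer) could look is sketched below. All class and method names here are hypothetical, not the PR's actual code:

```python
from dataclasses import dataclass
from typing import Any, Dict

# Hypothetical sketch: model-specific parameters live in their own classes,
# so the SageMaker invocation layer itself stays model-agnostic.
@dataclass
class J2CompleteParams:
    maxTokens: int = 16          # AI21 Jurassic 2 naming (camelCase)
    temperature: float = 0.7
    numResults: int = 1

    def to_payload(self, prompt: str) -> Dict[str, Any]:
        return {"prompt": prompt, **self.__dict__}

@dataclass
class ContextualAnswersParams:
    def to_payload(self, question: str, context: str) -> Dict[str, Any]:
        return {"question": question, "context": context}

class SageMakerInvocationLayer:
    """Model-agnostic layer: it only serializes whatever payload the
    parameter object builds, then would send it to the endpoint."""
    def __init__(self, endpoint_name: str):
        self.endpoint_name = endpoint_name

    def build_request(self, params, **kwargs) -> Dict[str, Any]:
        return params.to_payload(**kwargs)

layer = SageMakerInvocationLayer("j2-mid")
req = layer.build_request(J2CompleteParams(maxTokens=100), prompt="Hello")
print(req["maxTokens"])  # 100
```

The point of the split is that adding a new model API only requires a new parameter class, not changes to the invocation layer.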
How to use it?
- Deploy AI21 Jurassic 2 Complete or AI21 Contextual Answers model with Amazon SageMaker JumpStart
- https://aws.amazon.com/blogs/machine-learning/use-proprietary-foundation-models-from-amazon-sagemaker-jumpstart-in-amazon-sagemaker-studio/
- https://github.com/AI21Labs/SageMaker
- Initialize and run PromptNode for the AI21 model running on Amazon SageMaker
```python
# Initialize the node using AI21 Jurassic 2 Complete with the endpoint name from Amazon SageMaker:
import os

from haystack.nodes import PromptNode

prompt_node = PromptNode(
    model_name_or_path="j2-mid",
    model_kwargs={
        "aws_access_key_id": os.getenv("AWS_ACCESS_KEY_ID"),
        "aws_secret_access_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
        "aws_session_token": os.getenv("AWS_SESSION_TOKEN"),
        "aws_region_name": "us-east-1",
    },
)

prompt = """Write an engaging product description for a clothing eCommerce site. Make sure to include the following features in the description.
Product: Humor Men's Graphic T-Shirt.
Features:
- Soft cotton
- Short sleeve
- Has a print of Einstein's quote: "artificial intelligence is no match for natural stupidity"
Description:
"""

res = prompt_node(prompt, maxTokens=100, temperature=0.9, numResults=3)
```
or for AI21 Contextual Answers:
```python
# Initialize the node using AI21 Contextual Answers with the endpoint name from Amazon SageMaker:
prompt_node = PromptNode(
    model_name_or_path="contextual-answers",
    max_length=256,
    model_kwargs={
        "aws_access_key_id": os.getenv("AWS_ACCESS_KEY_ID"),
        "aws_secret_access_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
        "aws_session_token": os.getenv("AWS_SESSION_TOKEN"),
        "aws_region_name": "us-east-1",
    },
)

context = "The tower is 330 metres (1,083 ft) tall,[6] about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest human-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure in the world to surpass both the 200-metre and 300-metre mark in height. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct."
question = "What is the height of the Eiffel tower?"
res = prompt_node(question, context=context, question=question)
```
How did you test it?
- Applied the same unit and integration tests as #5155
Notes for the reviewer
- Recreating the model parameters for every model is probably not the most scalable way to expand to different models. An alternative could be to not check the input in the inference layer at all, allowing compatibility with any model hosted on Amazon SageMaker. Let's discuss this and other potential designs. 😊
Checklist
- I have read the contributors guidelines and the code of conduct
- I have updated the related issue with new insights and changes
- I added unit tests and updated the docstrings
- I've used one of the conventional commit types for my PR title: `fix:`, `feat:`, `build:`, `chore:`, `ci:`, `docs:`, `style:`, `refactor:`, `perf:`, `test:`
- I documented my code
- I ran pre-commit hooks and fixed any issue
Hello, @malte-aws, and thanks for the contribution!
The SageMaker support has been heavily reworked and expanded in #5205.
As you can see, now there are several conflicts. Could you rebase your PR to the main?
Hi, @anakin87 I replayed the changes from main.
I still have to work on the documentation for this PR. Before we dive into the implementation details, though, could we please discuss the general design direction first?
Do you have thoughts on my "Notes for the reviewer"? Is adding invocation layers for different models the best design direction for models to which #5205 does not apply because they are not standardized HF models?
Pull Request Test Coverage Report for Build 5449022368
- 0 of 0 changed or added relevant lines in 0 files are covered.
- No unchanged relevant lines lost coverage.
- Overall coverage decreased (-0.04%) to 43.876%
| Totals | |
|---|---|
| Change from base Build 5444867374: | -0.04% |
| Covered Lines: | 10149 |
| Relevant Lines: | 23131 |
💛 - Coveralls
@malte-aws, thanks for these contributions. There seem to be several topics for discussion here. We could continue discussing here or arrange a quick sync call where @anakin87, you and I can debate options and agree on the best direction forward. Let us know what you would prefer.
Hi @vblagoje, Sounds good. I am available for a call today any time after 6 PM UTC or tomorrow after 3 PM UTC.
Yes, let's go for 3pm UTC tomorrow @malte-aws . What's the best way to send you the meeting details?
I have sent you an email.
Hi @anakin87,
How are you?
I started to implement the following changes that we discussed in the sync:
- Consolidate the invocation layers for AI21 models on Amazon SageMaker into one class, so that the algorithm that finds the correct invocation layer does not need to iterate through seven separate layers for the different AI21 model APIs.
- I also removed the parameter filter; the invocation layer now sends all the parameters a developer puts into the request straight to the inference endpoint.
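As a rough illustration of the two changes above (one consolidated class, no parameter filter), with all names hypothetical:

```python
# Hypothetical sketch of the consolidation described above: one invocation
# layer class recognises all AI21 endpoints, instead of seven separate
# layers the layer-selection algorithm would have to iterate through.
AI21_MODELS = {"j2-mid", "j2-ultra", "contextual-answers"}  # illustrative subset

class AI21SageMakerInvocationLayer:
    @classmethod
    def supports(cls, model_name_or_path: str) -> bool:
        # Single dispatch point for every AI21 model API.
        return model_name_or_path in AI21_MODELS

    def invoke(self, **kwargs):
        # No parameter filter: everything the caller passes goes to the endpoint.
        return dict(kwargs)

print(AI21SageMakerInvocationLayer.supports("j2-mid"))        # True
print(AI21SageMakerInvocationLayer.supports("gpt-3.5-turbo")) # False
```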
Could you please review briefly whether this implementation is the direction we want to take? If you give me the green light, I will finish and test the implementation for the other available AI21 models.
Kind regards Malte
Hello, @malte-aws,
@vblagoje and I took a look at the recent changes, and they seem to be going in the right direction.
(I'm still not 100% sure about some of the implementation details.)
But I think you can go ahead, add the tests, and then rely on the CI (mypy, existing tests...) and our future reviews to further validate the approach!
Closing for inactivity, please feel free to re-open if you resume this work.