bedrock-claude-chat
[Feature Request / Question] Why send the full input to Cohere for RAG instead of a condensed version?
Describe the solution you'd like
When using a custom bot, the maximum input is effectively reduced to 2048 tokens, because that step is fully managed by Cohere. Why not use Claude 3 to condense the input question to at most 2048 tokens, and then send the condensed question to Cohere? This would avoid client-side errors when the user submits a large input.
I've already discussed this with an AWS SA in France who agreed with this.
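To make the proposal concrete, here is a minimal sketch of the condensing step, not the project's actual code. The names `condense_query`, `approx_token_count`, and the `summarize` callable are hypothetical; `summarize` stands in for a call to a Claude 3 model, and the 2048 limit is the figure quoted above. The word-based token count is only a rough placeholder for a real tokenizer.

```python
from typing import Callable

# Limit quoted in this issue (assumption: not verified against Cohere's docs).
MAX_QUERY_TOKENS = 2048


def approx_token_count(text: str) -> int:
    """Rough token estimate: whitespace-separated words (placeholder for a real tokenizer)."""
    return len(text.split())


def condense_query(
    question: str,
    summarize: Callable[[str, int], str],
    max_tokens: int = MAX_QUERY_TOKENS,
    count_tokens: Callable[[str], int] = approx_token_count,
) -> str:
    """Return a query that fits the budget, calling the LLM only when needed."""
    if count_tokens(question) <= max_tokens:
        # Short enough: skip the extra model call (no added cost or latency).
        return question

    # Hypothetical hook: ask the model to rephrase the question within the budget.
    condensed = summarize(question, max_tokens)

    if count_tokens(condensed) > max_tokens:
        # The model ignored the budget: fall back to word-level truncation.
        # This only guarantees the limit for the word-based counter above.
        condensed = " ".join(condensed.split()[:max_tokens])
    return condensed
```

The condensed string would then be sent to Cohere in place of the raw user input. Short questions pass through unchanged, so the extra model call is only paid for oversized inputs.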
Why the solution is needed
Avoid errors for users who send large inputs when RAG is enabled.
Additional context
Implementation feasibility
Are you willing to discuss the solution with us, decide on the approach, and assist with the implementation?
- [x] Yes
- [ ] No
I think we could use Haiku to limit the cost of rephrasing when using RAG.
I agree, but the extra rephrasing step will delay the response, so this feature should be optional.