bedrock-claude-chat [Feature Request / Question] Why using the full input for RAG with cohere instead of making a condensed response

Open jeremylatorre opened this issue 1 year ago • 2 comments

Describe the solution you'd like

When using a custom bot, the maximum input is effectively reduced to 2048 tokens, because that step is fully managed by Cohere. Why not use Claude 3 to condense the input question to at most 2048 tokens, and then send the condensed question to Cohere? This would avoid client-side errors when the user submits a large input.
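
A minimal sketch of this idea, under stated assumptions: `condense_query`, `approx_token_count`, and the `summarize` callback are hypothetical names and not part of the bedrock-claude-chat codebase. Token counting is approximated by whitespace-separated words, whereas a real implementation would need the tokenizer of the Cohere model. The final truncation step is an added safety net and was not part of the original proposal.

```python
from typing import Callable

MAX_QUERY_TOKENS = 2048  # limit mentioned in this issue


def approx_token_count(text: str) -> int:
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def condense_query(
    query: str,
    summarize: Callable[[str, int], str],
    max_tokens: int = MAX_QUERY_TOKENS,
) -> str:
    """Return a query that fits within max_tokens before it is sent to retrieval.

    `summarize(text, limit)` is a hypothetical callback that would ask an LLM
    (for example Claude 3) to rephrase `text` in at most `limit` tokens.
    """
    if approx_token_count(query) <= max_tokens:
        return query  # short inputs pass through with no extra LLM call

    condensed = summarize(query, max_tokens)

    # Assumed safety net: the LLM may ignore the limit, so truncate by words.
    if approx_token_count(condensed) > max_tokens:
        condensed = " ".join(condensed.split()[:max_tokens])
    return condensed
```

In this sketch only oversized inputs trigger the extra model call, so short questions incur no added latency or cost. The `summarize` callback would wrap the actual Bedrock invocation, which is deliberately left out here.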

I've already discussed this with an AWS SA in France who agreed with this.

Why the solution needed

Avoid errors in RAG when users send inputs larger than the 2048-token limit.

Additional context

Implementation feasibility

Are you willing to discuss the solution with us, decide on the approach, and assist with the implementation?

[x] Yes
[ ] No

Jun 04 '24 14:06 jeremylatorre

I think we could use Haiku to limit the cost of rephrasing when using RAG.

Jun 05 '24 08:06 jeremylatorre

I agree, but the response will be delayed. This feature should be optional.

Jun 06 '24 01:06 statefb
