
refactor: Async batch processing, limits, and configuration

Open FinnConnor opened this issue 1 year ago • 1 comment

Added async API calls to embed (process) and add chunks (documents) to vector storage.

Added env variables MAX_CHUNKS, EMBEDDING_TIMEOUT, BATCH_SIZE, and CONCURRENT_LIMIT to configure/limit the batching API class. Descriptions are in the README.

Tested with PGVector using Bedrock and amazon.titan-embed-text-v2:0. Also rebuilt successfully in Docker.

  1. Verified that BATCH_SIZE and CONCURRENT_LIMIT correctly affect process calls to embed
  2. Verified that MAX_CHUNKS works as a limit and throws an exception when exceeded, with the correct default behavior of ignoring the limit when unset
  3. Verified that EMBEDDING_TIMEOUT works within the limit and throws an exception when exceeded
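For reviewers, here is a minimal sketch of how the four env variables interact; the names mirror the variables described above, but `embed_batch` and the default values are illustrative placeholders, not the actual rag_api implementation:

```python
import asyncio
import os

# Illustrative defaults; the real project may differ.
MAX_CHUNKS = int(os.getenv("MAX_CHUNKS", "0"))            # 0 = unset, limit ignored
EMBEDDING_TIMEOUT = float(os.getenv("EMBEDDING_TIMEOUT", "30"))
BATCH_SIZE = int(os.getenv("BATCH_SIZE", "75"))
CONCURRENT_LIMIT = int(os.getenv("CONCURRENT_LIMIT", "5"))

async def embed_batch(batch):
    # Placeholder for a real provider call (OpenAI, Bedrock, ...).
    await asyncio.sleep(0)
    return [[float(len(text))] for text in batch]

async def embed_all(chunks):
    # MAX_CHUNKS: hard limit on total chunks, exception when exceeded.
    if MAX_CHUNKS and len(chunks) > MAX_CHUNKS:
        raise ValueError(f"chunk count {len(chunks)} exceeds MAX_CHUNKS={MAX_CHUNKS}")

    # CONCURRENT_LIMIT: cap simultaneous in-flight embed calls.
    sem = asyncio.Semaphore(CONCURRENT_LIMIT)

    async def run(batch):
        async with sem:
            # EMBEDDING_TIMEOUT: per-batch timeout, exception when exceeded.
            return await asyncio.wait_for(embed_batch(batch), timeout=EMBEDDING_TIMEOUT)

    # BATCH_SIZE: split chunks into fixed-size batches.
    batches = [chunks[i:i + BATCH_SIZE] for i in range(0, len(chunks), BATCH_SIZE)]
    results = await asyncio.gather(*(run(b) for b in batches))
    return [vec for batch in results for vec in batch]

vectors = asyncio.run(embed_all([f"chunk {i}" for i in range(200)]))
print(len(vectors))  # 200
```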

FinnConnor avatar Sep 30 '24 17:09 FinnConnor

The ideal BATCH_SIZE varies by embeddings provider, and likely by model and file size as well. Provider-specific defaults should be added in the future. Right now the default is BATCH_SIZE=75, which is ideal for OpenAI but not for Bedrock.
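Until per-provider defaults land, the values can be overridden per deployment in the env file; the numbers below are illustrative, not recommended defaults:

```shell
# Example .env overrides for a Bedrock deployment (illustrative values)
BATCH_SIZE=25          # smaller batches than the OpenAI-oriented default of 75
CONCURRENT_LIMIT=3     # fewer simultaneous embed calls
EMBEDDING_TIMEOUT=60   # seconds per batch before an exception is thrown
MAX_CHUNKS=5000        # reject oversized documents up front
```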

FinnConnor avatar Sep 30 '24 17:09 FinnConnor