rag_api
refactor: Async batch processing, limits, and configuration
Added async API calls to embed (process) and add chunks (documents) to vector storage.
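For reference, a minimal sketch of the batching pattern this enables, assuming `asyncio.gather` plus a semaphore; names like `embed_in_batches` and `embed_batch` are illustrative, not the actual code:

```python
import asyncio
from typing import Callable, Sequence

async def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable,   # async fn: list[str] -> list[list[float]]
    batch_size: int = 75,
    concurrent_limit: int = 5,
) -> list[list[float]]:
    sem = asyncio.Semaphore(concurrent_limit)

    async def run(batch: list[str]) -> list[list[float]]:
        # CONCURRENT_LIMIT caps how many embed calls are in flight at once.
        async with sem:
            return await embed_batch(batch)

    # BATCH_SIZE controls how many chunks go into each embed call.
    batches = [list(texts[i:i + batch_size]) for i in range(0, len(texts), batch_size)]
    results = await asyncio.gather(*(run(b) for b in batches))
    # Flatten per-batch results back into document order.
    return [vec for batch in results for vec in batch]
```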
Added env variables `MAX_CHUNKS`, `EMBEDDING_TIMEOUT`, `BATCH_SIZE`, and `CONCURRENT_LIMIT` to configure/limit the batching API class. Descriptions are in the README.
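For context, the variables can be read with plain environment lookups. Except for `BATCH_SIZE=75`, the defaults shown here are placeholders for illustration, not the shipped values:

```python
import os

BATCH_SIZE = int(os.getenv("BATCH_SIZE", "75"))             # chunks per embed call
CONCURRENT_LIMIT = int(os.getenv("CONCURRENT_LIMIT", "5"))  # parallel embed calls (placeholder)
MAX_CHUNKS = int(os.getenv("MAX_CHUNKS", "0"))              # 0/unset = ignore the limit
EMBEDDING_TIMEOUT = float(os.getenv("EMBEDDING_TIMEOUT", "300"))  # seconds (placeholder)
```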
Tested with PGVector using Bedrock and amazon.titan-embed-text-v2:0. Also rebuilt successfully in Docker.
- Verified `BATCH_SIZE` and `CONCURRENT_LIMIT` affect process calls to embed correctly
- Verified `MAX_CHUNKS` works as a limit and throws an exception when exceeded, with the correct default behavior of ignoring the limit when unset
- Verified `EMBEDDING_TIMEOUT` works within the limit and throws an exception when exceeded (see the sketch after this list)
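A minimal sketch of how these two guards can be enforced; the function name, exception types, and defaults here are assumptions, not the actual implementation:

```python
import asyncio
import os

MAX_CHUNKS = int(os.getenv("MAX_CHUNKS", "0"))                    # 0/unset = no limit
EMBEDDING_TIMEOUT = float(os.getenv("EMBEDDING_TIMEOUT", "300"))  # seconds (placeholder)

async def embed_with_limits(chunks, embed_all):
    # MAX_CHUNKS: reject oversized uploads up front; skipped when unset.
    if MAX_CHUNKS and len(chunks) > MAX_CHUNKS:
        raise ValueError(f"{len(chunks)} chunks exceeds MAX_CHUNKS={MAX_CHUNKS}")
    # EMBEDDING_TIMEOUT: bound the whole embedding run;
    # asyncio.wait_for raises TimeoutError when the deadline passes.
    return await asyncio.wait_for(embed_all(chunks), timeout=EMBEDDING_TIMEOUT)
```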
The ideal `BATCH_SIZE` varies by embeddings provider, and likely by model and file size as well. Provider-specific default values should be added in the future. Right now the default is `BATCH_SIZE=75`, which is ideal for OpenAI but not Bedrock.
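One possible shape for future per-provider defaults; the numbers other than the current `75` are purely illustrative guesses, not benchmarks:

```python
# Illustrative per-provider defaults; the Bedrock value is a guess, not a measurement.
PROVIDER_BATCH_SIZES = {
    "openai": 75,   # current global default
    "bedrock": 25,  # assumed smaller batch for amazon.titan-embed-text-v2:0
}

def default_batch_size(provider: str) -> int:
    # Fall back to the current global default for unknown providers.
    return PROVIDER_BATCH_SIZES.get(provider, 75)
```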