rig
feat(embeddings-builder): Add max token limit checking when creating embeddings
- [ ] I have looked for existing issues (including closed) about this
See branch: `fix(embeddings)/tiktoken`
Feature Request
Check the number of tokens in a payload before making a request to a provider to create embeddings.
Motivation
The rig embeddings builder errors out when the number of tokens in a request exceeds the provider's limit. Currently, rig checks that the number of documents doesn't exceed the limit, but not the number of tokens.
Proposal
Use the tiktoken-rs library to validate the number of tokens. If the number of tokens exceeds the limit, split the request into multiple requests.
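
A rough sketch of the splitting step, to make the proposal concrete. This is not rig's existing code: the function name `split_by_token_limit`, the `max_tokens` parameter, and the `count_tokens` hook are all hypothetical. The counter is passed in as a closure so the sketch stays independent of any tokenizer; with tiktoken-rs it could presumably be something like `|s| bpe.encode_with_special_tokens(s).len()`, where `bpe` comes from `tiktoken_rs::cl100k_base()`.

```rust
/// Groups documents into batches whose combined token count stays within
/// `max_tokens`, preserving document order.
///
/// `count_tokens` is a hypothetical hook for whatever tokenizer is used
/// (for example a tiktoken-rs encoder); it is an assumption of this sketch.
///
/// Returns an error if a single document alone exceeds the limit, since
/// splitting the request cannot help in that case.
fn split_by_token_limit<F>(
    docs: &[String],
    max_tokens: usize,
    count_tokens: F,
) -> Result<Vec<Vec<String>>, String>
where
    F: Fn(&str) -> usize,
{
    let mut batches: Vec<Vec<String>> = Vec::new();
    let mut current: Vec<String> = Vec::new();
    let mut current_tokens = 0usize;

    for (i, doc) in docs.iter().enumerate() {
        let n = count_tokens(doc);
        if n > max_tokens {
            return Err(format!(
                "document {i} has {n} tokens, which exceeds the limit of {max_tokens}"
            ));
        }
        // Start a new batch when adding this document would overflow the current one.
        if current_tokens + n > max_tokens && !current.is_empty() {
            batches.push(std::mem::take(&mut current));
            current_tokens = 0;
        }
        current_tokens += n;
        current.push(doc.clone());
    }

    // Flush the last, partially filled batch.
    if !current.is_empty() {
        batches.push(current);
    }
    Ok(batches)
}
```

Each resulting batch would then be sent as its own embeddings request. The existing check on the number of documents per request would still need to be applied alongside this one.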