feat(embeddings-builder): Add max token limit checking when creating embeddings

Open marieaurore123 opened this issue 3 months ago • 0 comments

  • [ ] I have looked for existing issues (including closed) about this

See branch: `fix(embeddings)/tiktoken`

Feature Request

Check the number of tokens in a payload before making a request to a provider to create embeddings.

Motivation

The rig embeddings builder errors out when the number of tokens in a request exceeds the provider's limit. Currently, rig checks that the number of documents doesn't exceed the limit, but not the number of tokens.

Proposal

Use the tiktoken-rs library to count the tokens in a request. If the number of tokens exceeds the limit, split the request into multiple requests.
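
A rough sketch of what the splitting could look like, not the code on the branch. The function name `batch_by_tokens`, its parameters, and the error type are all hypothetical. The token counter is passed in as a closure so the example runs without dependencies; in rig it would presumably be backed by tiktoken-rs (for example, the length of the token vector returned by a BPE encoder), which is an assumption here.

```rust
/// Split `docs` into batches so that each batch stays within `max_tokens`
/// total tokens and `max_docs` documents. `count_tokens` is a stand-in for
/// a real tokenizer (hypothetical; e.g. one backed by tiktoken-rs).
fn batch_by_tokens<F>(
    docs: &[String],
    max_tokens: usize,
    max_docs: usize,
    count_tokens: F,
) -> Result<Vec<Vec<String>>, String>
where
    F: Fn(&str) -> usize,
{
    let mut batches = Vec::new();
    let mut current: Vec<String> = Vec::new();
    let mut current_tokens = 0usize;

    for doc in docs {
        let n = count_tokens(doc);

        // A single document over the limit cannot be fixed by batching.
        if n > max_tokens {
            return Err(format!(
                "document has {n} tokens, which exceeds the limit of {max_tokens}"
            ));
        }

        // Start a new batch if adding this document would break either limit.
        let over_tokens = current_tokens + n > max_tokens;
        let over_docs = current.len() >= max_docs;
        if !current.is_empty() && (over_tokens || over_docs) {
            batches.push(std::mem::take(&mut current));
            current_tokens = 0;
        }

        current.push(doc.clone());
        current_tokens += n;
    }

    // Flush the last, partially filled batch.
    if !current.is_empty() {
        batches.push(current);
    }
    Ok(batches)
}

fn main() {
    // Whitespace word count as a crude placeholder for real tokenization.
    let count = |s: &str| s.split_whitespace().count();

    let docs = vec![
        "a b c".to_string(),
        "d e".to_string(),
        "f g h i".to_string(),
    ];

    match batch_by_tokens(&docs, 5, 10, count) {
        Ok(batches) => println!("{} request(s): {:?}", batches.len(), batches),
        Err(e) => eprintln!("cannot batch: {e}"),
    }
}
```

Each resulting batch would then be sent as its own embeddings request. The sketch returns an error for a single document that is over the limit by itself; whether to error, truncate, or chunk in that case is an open design choice.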

Alternatives

marieaurore123 · Oct 30 '24 15:10