Roland Haller
After doing some quick tests on [Tiktoken](https://platform.openai.com/tokenizer), I get the following for OpenAI:

1. Alphabetical languages (English, French, Russian…): token count ≈ text length × ~24%
2. Japanese and Chinese: Text...
To reduce the guesswork and limit the performance hit, we could process the first chunk of each document through Tiktoken and use that result as an estimate that would...
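As a rough sketch of that idea (not rig's API): measure the tokens-per-character ratio on the first chunk, then reuse it for the rest of the document. I'm assuming "text length" means character count here. `calibrate_ratio`, `estimate_tokens` and the `count_tokens` closure are hypothetical names, and the closure stands in for a real Tiktoken call.

```rust
/// Fallback tokens-per-character ratio, taken from the ~24% figure quoted
/// above for alphabetical languages (assumes length is counted in characters).
const DEFAULT_RATIO: f64 = 0.24;

/// Measure tokens per character on a sample chunk.
/// `count_tokens` is a hypothetical stand-in for a real tokenizer call.
fn calibrate_ratio<F>(sample: &str, count_tokens: F) -> f64
where
    F: Fn(&str) -> usize,
{
    let chars = sample.chars().count();
    if chars == 0 {
        // Nothing to measure, so fall back to the rule of thumb.
        return DEFAULT_RATIO;
    }
    count_tokens(sample) as f64 / chars as f64
}

/// Estimate the token count of `text` from a previously measured ratio,
/// rounding up so the estimate errs on the safe side.
fn estimate_tokens(text: &str, ratio: f64) -> usize {
    (text.chars().count() as f64 * ratio).ceil() as usize
}
```

The tokenizer then runs once per document instead of once per chunk, at the cost of a worse estimate when a document mixes languages.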
Would it be a good idea to have a *split_and_retry* on the [build method](https://github.com/0xPlaygrounds/rig/blob/cd7e7097d393bf46ae8d2bcfc796e3222250d365/rig-core/src/embeddings/builder.rs#L111)? I tried the following (with OpenAI) and it works rather well. That would allow for a...
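For discussion, here is a minimal, provider-agnostic sketch of what a *split_and_retry* could look like. It is not the code I tried and not rig's actual builder: `EmbedError`, the `embed` closure and the function signature are all hypothetical. When a batch is rejected as too large, it is split in half and each half is retried.

```rust
/// Hypothetical error type; a real provider error would carry more detail.
#[derive(Debug, PartialEq)]
enum EmbedError {
    /// The request exceeded the provider's token limit.
    TooLarge,
    /// Any other failure, which splitting would not fix.
    Other(String),
}

/// Try to embed `docs` in one request. If the provider rejects the batch
/// as too large, split it in half and retry each half recursively.
/// `embed` is a hypothetical stand-in for the actual provider call.
fn split_and_retry<T, F>(docs: &[String], embed: &mut F) -> Result<Vec<T>, EmbedError>
where
    F: FnMut(&[String]) -> Result<Vec<T>, EmbedError>,
{
    if docs.is_empty() {
        return Ok(Vec::new());
    }
    match embed(docs) {
        // Only split when there is more than one document left;
        // a single oversized document cannot be fixed this way.
        Err(EmbedError::TooLarge) if docs.len() > 1 => {
            let (left, right) = docs.split_at(docs.len() / 2);
            let mut out = split_and_retry(left, &mut *embed)?;
            out.extend(split_and_retry(right, &mut *embed)?);
            Ok(out)
        }
        // Success, or an error that splitting would not help with.
        other => other,
    }
}
```

Output order matches input order because the left half is always processed first. A single document that is still too large is returned as an error, so the caller can decide whether to re-chunk it.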
Seems open to me.