Embeddings need to handle truncation automatically (maybe via extra options)
A frustrating thing about embeddings right now is this:
```
cd llm
llm embed-multi --files docs/ '*.md' -d /tmp/docs.db docs
```
This fails with a messy error:
```
...
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.10/site-packages/openai/resources/embeddings.py", line 128, in create
    return self._post(
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.10/site-packages/openai/_base_client.py", line 1242, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.10/site-packages/openai/_base_client.py", line 919, in request
    return self._request(
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.10/site-packages/openai/_base_client.py", line 1023, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens, however you requested 17143 tokens (17143 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
```
That error reformatted:
```json
{
  "error": {
    "message": "This model's maximum context length is 8192 tokens, however you requested 17143 tokens (17143 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.",
    "type": "invalid_request_error",
    "param": null,
    "code": null
  }
}
```
NO embeddings are stored in the /tmp/docs.db database. The entire operation just fails, with no obvious way to recover.
Options:
- Ignore the strings that are too long, embed everything else, show a warning about the ones that were skipped
- Truncate the strings that are too long and embed them anyway
- Have an option that controls whether too-long strings are truncated, ignored or cause an error (see the sketch below)
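For illustration, here's a rough sketch of how that third option might look at the point where strings are handed to the embedding model. None of this is llm's actual code: `embed`, `is_too_long` and `truncate` are placeholders for whatever the model or plugin would have to provide.

```python
import warnings

def embed_with_policy(texts, embed, is_too_long, truncate, policy="error"):
    # policy is the hypothetical option: "error", "skip" or "truncate"
    embeddings = []
    for i, text in enumerate(texts):
        if is_too_long(text):
            if policy == "error":
                raise ValueError(f"Item {i} exceeds the model's context length")
            if policy == "skip":
                warnings.warn(f"Skipping item {i}: too long to embed")
                continue
            if policy == "truncate":
                text = truncate(text)
        embeddings.append(embed(text))
    return embeddings
```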
One challenge here is that I don't know how to truncate strings, and the right way to do it will vary from model to model.
In this case the OpenAI embedding model has a maximum context length of 8192 tokens - does that mean we need to depend on https://github.com/openai/tiktoken in order to implement truncation?
If we DO need to include that library I'd prefer to have it be a dependency of https://github.com/simonw/llm-openai-plugin/ so that llm itself can be installed without needing tiktoken as well.
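If tiktoken is available, truncating for the OpenAI models looks fairly simple - something like this sketch, which uses the 8192 token limit from the error above. The helper name and the hard-coded model name are mine, not part of llm:

```python
import tiktoken

MAX_TOKENS = 8192  # the limit reported in the 400 error above

def truncate_for_openai(text, model="text-embedding-ada-002"):
    # Encode to tokens, drop anything past the limit, decode back to a string
    encoding = tiktoken.encoding_for_model(model)
    tokens = encoding.encode(text)
    if len(tokens) <= MAX_TOKENS:
        return text
    return encoding.decode(tokens[:MAX_TOKENS])
```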
This is a nasty situation, since an embedding failure could raise potentially any type of exception and, as you mention, there's no great way to know what truncation would entail.
You could imagine the embedding interface requiring some sort of truncation method to be implemented, which would be viable (if annoying) for the common API-provided models, but basically impossible for something like ollama/sentence-transformers, where you have no good way to dynamically figure out how to truncate for an arbitrary model.
Maybe the embedding model base class could support an optional truncation method that embedding model plugins can choose to implement. When needed, it would be invoked automatically if it exists; otherwise, skip the too-large item and warn the user?
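A hypothetical sketch of that opt-in hook, assuming the existing `llm.EmbeddingModel` / `embed_batch()` plugin interface - the `truncate()` method name and the caller-side check are invented for illustration:

```python
import llm
import tiktoken


class TruncatingEmbeddingModel(llm.EmbeddingModel):
    model_id = "my-embedding-model"
    max_tokens = 8192

    def truncate(self, text):
        # Hypothetical optional hook: return a version of text that fits
        # within this model's context window.
        encoding = tiktoken.get_encoding("cl100k_base")
        return encoding.decode(encoding.encode(text)[:self.max_tokens])

    def embed_batch(self, texts):
        # Real plugins implement embed_batch(); body elided in this sketch.
        raise NotImplementedError


def maybe_truncate(model, text):
    # How embed-multi could use the hook: call truncate() if the plugin
    # provides it, otherwise leave the text alone and fall back to
    # skip-and-warn when the embedding call fails.
    truncate = getattr(model, "truncate", None)
    return truncate(text) if callable(truncate) else text
```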