Embeddings need to handle truncation automatically (maybe via extra options)
A frustrating thing about embeddings right now is this:
```
cd llm
llm embed-multi --files docs/ '*.md' -d /tmp/docs.db docs
```
This fails with a messy error:
```
...
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.10/site-packages/openai/resources/embeddings.py", line 128, in create
    return self._post(
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.10/site-packages/openai/_base_client.py", line 1242, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.10/site-packages/openai/_base_client.py", line 919, in request
    return self._request(
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.10/site-packages/openai/_base_client.py", line 1023, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens, however you requested 17143 tokens (17143 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
```
That error reformatted:
```json
{
  "error": {
    "message": "This model's maximum context length is 8192 tokens, however you requested 17143 tokens (17143 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.",
    "type": "invalid_request_error",
    "param": null,
    "code": null
  }
}
```
NO embeddings are stored in the /tmp/docs.db database. The entire operation just fails, with no obvious way to recover.
Options:
- Ignore the strings that are too long, embed everything else, show a warning about the ones that were skipped
- Truncate the strings that are too long and embed them anyway
- Have an option that controls whether too-long strings are truncated, ignored or cause an error (see the sketch below)
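For illustration, here's a rough sketch of how that third option might look at the point where strings are handed to the embedding model. None of this is llm's actual code: `embed`, `is_too_long` and `truncate` are placeholders for whatever the model or plugin would have to provide.

```python
import warnings

def embed_with_policy(texts, embed, is_too_long, truncate, policy="error"):
    # policy is the hypothetical option: "error", "skip" or "truncate"
    embeddings = []
    for i, text in enumerate(texts):
        if is_too_long(text):
            if policy == "error":
                raise ValueError(f"Item {i} exceeds the model's context length")
            if policy == "skip":
                warnings.warn(f"Skipping item {i}: too long to embed")
                continue
            if policy == "truncate":
                text = truncate(text)
        embeddings.append(embed(text))
    return embeddings
```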
One challenge here is that I don't know how to truncate strings, and the right way to do it will vary from model to model.
In this case the OpenAI embedding model has a maximum context length of 8192 tokens - does that mean we need to depend on https://github.com/openai/tiktoken in order to implement truncation?
If we DO need to include that library I'd prefer to have it be a dependency of https://github.com/simonw/llm-openai-plugin/ so that llm itself can be installed without needing tiktoken as well.
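If tiktoken is available, truncating for the OpenAI models looks fairly simple - something like this sketch, which uses the 8192 token limit from the error above. The helper name and the hard-coded model name are mine, not part of llm:

```python
import tiktoken

MAX_TOKENS = 8192  # the limit reported in the 400 error above

def truncate_for_openai(text, model="text-embedding-ada-002"):
    # Encode to tokens, drop anything past the limit, decode back to a string
    encoding = tiktoken.encoding_for_model(model)
    tokens = encoding.encode(text)
    if len(tokens) <= MAX_TOKENS:
        return text
    return encoding.decode(tokens[:MAX_TOKENS])
```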
This is a nasty situation, since an embedding failure could raise potentially any type of exception and, as you mention, there's no great way to know what truncation would entail.
You could imagine the embedding interface requiring some sort of truncation method to be implemented, which would be viable (if annoying) for the common API-provided models, but basically impossible for something like ollama/sentence-transformers, where you have no good way to dynamically figure out how to truncate for an arbitrary model.
Maybe the embedding model base class could support an optional truncation method that embedding model plugins can choose to implement. When needed, it would be invoked automatically if it exists; otherwise, skip the too-large item and warn the user?
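A hypothetical sketch of that opt-in hook, assuming the existing `llm.EmbeddingModel` / `embed_batch()` plugin interface - the `truncate()` method name and the caller-side check are invented for illustration:

```python
import llm
import tiktoken


class TruncatingEmbeddingModel(llm.EmbeddingModel):
    model_id = "my-embedding-model"
    max_tokens = 8192

    def truncate(self, text):
        # Hypothetical optional hook: return a version of text that fits
        # within this model's context window.
        encoding = tiktoken.get_encoding("cl100k_base")
        return encoding.decode(encoding.encode(text)[:self.max_tokens])

    def embed_batch(self, texts):
        # Real plugins implement embed_batch(); body elided in this sketch.
        raise NotImplementedError


def maybe_truncate(model, text):
    # How embed-multi could use the hook: call truncate() if the plugin
    # provides it, otherwise leave the text alone and fall back to
    # skip-and-warn when the embedding call fails.
    truncate = getattr(model, "truncate", None)
    return truncate(text) if callable(truncate) else text
```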