bug: Document Content Dropped by Some Providers When Using UserContent::Document
- [X] I have looked for existing issues (including closed) about this
Bug Report
There's an issue with how document content is handled across different LLM providers. Currently, when using UserContent::Document for document content, the behavior varies by provider:
- Ollama: Completely drops document content from messages sent to the model
- Anthropic: Always treats the Document as a PDF, so non-PDF content errors out
- OpenAI: Correctly converts it to text content and works as expected
This bug was introduced in commit 2d45ad52f61dc8e21ab1e6fc08fe0096f3167ebf
Reproduction
- Run the rag_ollama.rs example
- Observe that document content is missing from the chat payload sent to the model
Expected behavior
The LLM should receive the document content as context so it can properly answer questions based on that context. Currently, with Ollama, this doesn't happen - the documents are processed but never included in the payload.
Proposed fix
Ensure all providers properly handle UserContent::Document by updating their TryFrom implementations to check for document content and convert it appropriately based on the DocumentMediaType.
For example, in Ollama's implementation, document content is currently dropped here:
```rust
match uc {
    crate::message::UserContent::Text(t) => texts.push(t.text),
    crate::message::UserContent::Image(img) => images.push(img.data),
    _ => {} // Document content is dropped here
}
```
And Anthropic always processes the Document as a PDF regardless of the actual content type:
```rust
message::UserContent::Document(message::Document { data, format, .. }) => {
    let source = DocumentSource {
        data,
        media_type: DocumentFormat::PDF,
        r#type: match format {
            Some(format) => format.try_into()?,
            None => SourceType::BASE64,
        },
    };
    Ok(Content::Document { source })
}
```
Temporary workaround: Our current fix is to use UserContent::Text instead of UserContent::Document in the normalized_documents() method, but the right approach would be for all providers to handle document content correctly.
Hi @hollygrimm, thanks for opening this PR!
re: ollama - do you have any examples of the text you're sending through (or the types of documents you're sending)? The reason we disabled documents for Ollama is that its chat format doesn't support them as a native content type, so some pre-processing would need to be done.
re: Anthropic - yes, this looks totally wrong. Will be getting on top of this.
Amendment: it would seem that Anthropic's API only supports four image types (JPEG, PNG, GIF and WebP), plus PDF for documents. I'll make some changes to enforce this so that it's obvious which types are and aren't supported by Claude for now. WRT Ollama, my previous point still stands.
The ollama provider also drops ToolResults. This is pretty obvious if you look at the code, but wasn't mentioned in the issue description.
See #477 - hoping to get this merged before next release as it's a pretty big usability bug that was missed
I'm using rig-core with Ollama and spent a good long while trying to figure out why the rag_ollama.rs code, copied verbatim, wasn't working. Debugging shows the embeddings being processed, but the actual prompt just doesn't contain the RAG text from the documents.
I'm still newish to Rust and LLMs. @hollygrimm, can you share a code snippet for how I might get the rag_ollama.rs example to work, apart from injecting the content into my prompt manually? It sounds like you might have found a workaround.