No relevant chunks when attempting to chat with uploaded PDFs

Open mike-mathieu opened this issue 6 months ago • 0 comments

Description

I am unable to retrieve many (any?) relevant chunks when attempting to chat with uploaded PDFs.

I have imported about 20 PDFs, each roughly 20 pages of text. The importing goes well and I can see the text has been extracted properly. However, when I prompt the chat it retrieves chunks from a few different documents, but the context pulled from those documents is not very relevant to the prompt. Note that I believe these are fairly basic prompts that I would expect it to handle easily (but maybe I am naive). Is there anything that should be tweaked in the codebase, especially with regard to the Embedder or Retriever?

If this is simply a matter of trying different settings/configs, please let me know, as I am new to RAG 🙏, but it feels like it should work at least a little more reliably than this for some basic prompts. Thank you in advance.

Things I have tried:

  • changing chunk size to 512 and overlap to 100
  • changing chunk size to 250 and overlap to 50
  • hardcoding the OllamaGenerator context_window size in the repo from 10000 -> 100000

Is this a bug or a feature?

  • [ ] Bug
  • [ ] Feature

Steps to Reproduce

Basic setup using pip install or repo clone.

.env ->
  OLLAMA_URL=http://localhost:11434
  OLLAMA_MODEL=llama3.1:latest
  OLLAMA_EMBED_MODEL=mxbai-embed-large:latest

(^ Note that these all appear correctly imported in the OVERVIEW)

Import ~20 PDFs that contain ~20 pages of text each.

Ask a question in chat about the documents.

Additional context

Screenshot of chat example: (Note there are probably a dozen references throughout the PDFs to "310 Second Street")

[Screenshot of chat example, 2024-08-14 11:25 PM]
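Since the phrase "310 Second Street" appears verbatim in the source PDFs, one quick diagnostic (my own suggestion, assuming you can dump the chunk texts after import) is a literal substring scan over the chunks. If the phrase is present in the chunks but the retriever still misses them, the problem is in embedding/ranking rather than in PDF extraction or chunking:

```python
def find_literal(chunks, phrase):
    """Return indices of chunks containing the phrase (case-insensitive)."""
    phrase = phrase.lower()
    return [i for i, chunk in enumerate(chunks) if phrase in chunk.lower()]

# Toy chunk texts standing in for the real imported chunks.
chunks = [
    "The property at 310 Second Street was appraised in 2023.",
    "Zoning rules for commercial lots are described in section 4.",
]
hits = find_literal(chunks, "310 Second Street")
```

Exact-phrase queries like an address are a known weak spot for purely dense retrieval; if the scan confirms the chunks exist, a keyword or hybrid search mode (if available in your setup) may rank them better.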

mike-mathieu · Aug 15 '24 03:08