
fix: [PDF document with 81 pages being indexed into 1 node in Qdrant and Postgres, missing 99% of the document after "successfully indexed"]

Open · Seth-Peters opened this issue 1 year ago · 3 comments

Describe the bug

While trying the community version, I successfully connected an Azure LLM, a Qdrant vector DB, and LlamaParse, then tested by uploading a single document and clicking "Index". It reports that the document was successfully indexed, but with only "1 node". On further investigation, the Qdrant vector DB contains a single indexed node holding only the text of the document's title page. No other part of the document is indexed.

To reproduce

Using an Azure LLM, LlamaParse, and Qdrant, upload a PDF with chunk_size = 1024 and overlap = 128, then press Index.
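For a sense of scale, a plain sliding-window chunker with these settings advances 1024 - 128 = 896 tokens per chunk. This is a minimal sketch, not Unstract's splitter: the real one may count tokens with a model tokenizer and respect sentence boundaries, so its counts will differ somewhat. The function names and the 500-tokens-per-page figure are hypothetical assumptions for illustration.

```python
def chunk_tokens(tokens, chunk_size=1024, overlap=128):
    """Split a token list into overlapping fixed-size windows."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # each new chunk advances by 896 tokens by default
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


def expected_chunk_count(num_tokens, chunk_size=1024, overlap=128):
    """Closed-form number of windows chunk_tokens() returns for num_tokens tokens."""
    if num_tokens <= 0:
        return 0
    if num_tokens <= chunk_size:
        return 1
    step = chunk_size - overlap
    # one window for the first chunk_size tokens, then ceil(remaining / step) more
    return 1 + -(-(num_tokens - chunk_size) // step)


# Hypothetical figures: 81 pages at roughly 500 tokens per page.
print(expected_chunk_count(81 * 500))  # 46
```

Even under these rough assumptions the document should produce dozens of nodes, so a single node suggests that only the first page's text ever reached the splitter.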

Expected behavior

I would expect the successfully parsed and split document to produce many nodes in my Qdrant vector DB, not just one.

Environment details

  • Version: v0.101.6

Screenshots

Full log: (screenshot)

Parsing nodes: 100% 1/1 (screenshot)

Qdrant collection with 1 point (screenshot)

EDIT: I signed up for the free version of Unstract Cloud and see the same issue there. It only indexes the first few characters of my document. I have confirmed that the LlamaParse API works fine with my document.

Screenshot of Unstract Cloud, "Chunks used" button (screenshot)
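One way to quantify "only the first few characters" is to compare the chunk texts stored in Qdrant against the full text LlamaParse returns in its playground. This is a minimal sketch, assuming both have already been exported as plain strings and that each chunk is a verbatim substring of the parsed text (normalization by the pipeline would break that assumption); the function name `coverage` is hypothetical.

```python
from typing import List


def coverage(document: str, chunks: List[str]) -> float:
    """Fraction of document characters that fall inside at least one chunk.

    Chunks that cannot be located verbatim in the document are ignored.
    """
    if not document:
        return 0.0
    covered = [False] * len(document)
    for chunk in chunks:
        if not chunk:
            continue
        start = document.find(chunk)  # first occurrence only
        if start == -1:
            continue
        for i in range(start, start + len(chunk)):
            covered[i] = True
    return sum(covered) / len(document)
```

A value close to 1.0 means the indexed chunks span the whole parsed document; a single title-page chunk from an 81-page PDF would give a value close to 0.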

Seth-Peters · Dec 30 '24 09:12

@Seth-Peters, could you try the free version of LLMWhisperer once to confirm whether this issue happens only with LlamaParse?

ritwik-g · Dec 31 '24 06:12

@ritwik-g, it works with LLMWhisperer. I'm not sure what is happening, since I did check that the document itself works in my LlamaParse playground (with my own account and API key).

Seth-Peters · Dec 31 '24 06:12

I would love more feedback on this specific issue; I'm currently running into the same problem. It looks like something goes wrong in how the page separators in LlamaParse's output (i.e. "---") are processed.
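If that guess is right, the symptom would match code that splits LlamaParse's markdown on its `---` page separators and then keeps only the first piece. The sketch below illustrates that failure mode next to a safer alternative. It is a debugging hypothesis, not Unstract's actual code; both function names and the exact separator pattern are assumptions. Note that `---` on its own line is also a valid markdown horizontal rule, so any separator-based split is a heuristic.

```python
import re

# Assumed separator: a line containing only "---" (optionally padded with spaces/tabs).
# Table delimiter rows such as "|---|" do not match because they contain pipes.
PAGE_SEPARATOR = re.compile(r"^[ \t]*---[ \t]*$", re.MULTILINE)


def first_page_only(parsed: str) -> str:
    """Hypothetical buggy handling: keeps only the text before the first separator."""
    return PAGE_SEPARATOR.split(parsed)[0]


def all_pages(parsed: str) -> str:
    """Rejoin every page so the splitter sees the whole document."""
    pages = [page.strip() for page in PAGE_SEPARATOR.split(parsed)]
    return "\n\n".join(page for page in pages if page)


parsed = "Title page\n---\nPage two text\n---\nPage three text"
print(repr(first_page_only(parsed).strip()))  # 'Title page'
print(repr(all_pages(parsed)))  # 'Title page\n\nPage two text\n\nPage three text'
```

Feeding the output of `first_page_only` to a chunker yields exactly one small node, which is consistent with the "1 node" symptom reported above.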

ghopkins-lurin · Jul 10 '25 15:07