llm-graph-builder
Issue: LLM Processing Stops Early on Large PDFs – Misses Values from Later Pages
Description:
I've noticed that when uploading large PDF documents (e.g., annual reports), the system does not consistently analyze all relevant pages when extracting structured values (e.g., EBITDA, revenue, project-level financials).
It appears that:

- The LLM processing often focuses only on the first few chunks or pages.
- Chunks generated from later pages (even if they contain the most accurate values) are ignored or under-prioritized.

This results in missing or incorrect values, especially when the most reliable figures appear in footnotes or appendix tables.
Hi @nkolonne, we process only a limited number of chunks, and we show this information in the UI.
You can experiment with the chunking strategy to get more valuable results.
Otherwise, you can try it locally and remove the chunk limit by modifying the code, as in the sketch below.
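For illustration only, here is a minimal sketch of the kind of change involved. The constant and function names below are hypothetical placeholders, not the actual identifiers in this repo:

```python
# Hypothetical sketch -- names are placeholders, not the actual
# identifiers used in llm-graph-builder.
from typing import List, Optional

# The hosted builder caps how many chunks per document are sent to the
# LLM for cost reasons; chunks from later pages fall outside the cap
# and are never processed.
MAX_CHUNKS_ALLOWED: Optional[int] = 50  # set to None to remove the cap

def select_chunks_for_llm(chunks: List[str]) -> List[str]:
    """Return the chunks that will actually be sent to the LLM."""
    if MAX_CHUNKS_ALLOWED is None:
        return chunks                     # no cap: whole document processed
    return chunks[:MAX_CHUNKS_ALLOWED]    # capped: later pages are dropped
```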
@nkolonne can you provide more details, and perhaps example documents and questions?
Also, what types of information are you missing, and what/how should they be passed to the LLM in the chat?
The KG builder by default only looks at the first 50 chunks (of 200 characters each) for cost reasons.
For unlimited processing, you'd need a local deployment, or look at the neo4j-graphrag package for a processing pipeline; see the sketch below.
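For reference, here is a minimal sketch of such a pipeline using the neo4j-graphrag Python package. The connection URI, credentials, model choice, and chunk sizes are placeholder assumptions:

```python
# Sketch of an uncapped KG-building pipeline with the neo4j-graphrag
# package; connection details, credentials, and model are placeholders.
import asyncio

import neo4j
from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.experimental.components.text_splitters.fixed_size_splitter import (
    FixedSizeSplitter,
)
from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline
from neo4j_graphrag.llm import OpenAILLM

driver = neo4j.GraphDatabase.driver(
    "neo4j://localhost:7687", auth=("neo4j", "password")
)

kg_builder = SimpleKGPipeline(
    llm=OpenAILLM(model_name="gpt-4o", model_params={"temperature": 0}),
    driver=driver,
    embedder=OpenAIEmbeddings(),
    # Chunking is fully under your control here: there is no 50-chunk cap,
    # so a table on page 78 is processed the same way as page 1.
    text_splitter=FixedSizeSplitter(chunk_size=1000, chunk_overlap=100),
    from_pdf=True,
)

asyncio.run(kg_builder.run_async(file_path="Annual report 2022 (1) (1).pdf"))
```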
Hi Jeff
Thank you for reaching out to me.
Please see the attached PDF. In this document we are looking for S$339.18 million (EBITDA across all business segments), but most of the time processing stops at pages 3–4 and gives the answer $278.. million rather than the correct answer on page 78.
Thank you
Nish Kolonne

[Annual report 2022 (1) (1).pdf](url)
I have the same question.