llm-graph-builder
Issue: LLM Processing Stops Early on Large PDFs – Misses Values from Later Pages
Description:
I've noticed that when uploading large PDF documents (e.g., annual reports), the system does not consistently analyze all relevant pages when extracting structured values (e.g., EBITDA, revenue, project-level financials).
It appears that:

- The LLM processing often focuses only on the first few chunks or pages.
- Chunks generated from later pages (even if they contain the most accurate values) are ignored or under-prioritized.

This results in missing or incorrect values, especially when the most reliable figures appear in footnotes or appendix tables.
Hi @nkolonne, we process only a limited number of chunks, and we show this information in the UI.
You can experiment with the chunking strategy to get more valuable results.
Otherwise, you can try it locally and remove the chunk limit by modifying the code, as in the sketch below.
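For illustration only, here is a minimal sketch of the kind of change involved. The constant and function names below are hypothetical placeholders, not the actual identifiers in this repo:

```python
# Hypothetical sketch -- names are placeholders, not the actual
# identifiers used in llm-graph-builder.
from typing import List, Optional

# The hosted builder caps how many chunks per document are sent to the
# LLM for cost reasons; chunks from later pages fall outside the cap
# and are never processed.
MAX_CHUNKS_ALLOWED: Optional[int] = 50  # set to None to remove the cap

def select_chunks_for_llm(chunks: List[str]) -> List[str]:
    """Return the chunks that will actually be sent to the LLM."""
    if MAX_CHUNKS_ALLOWED is None:
        return chunks                     # no cap: whole document processed
    return chunks[:MAX_CHUNKS_ALLOWED]    # capped: later pages are dropped
```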
@nkolonne can you provide more details, and perhaps example documents and questions?
Also, what types of information are you missing, and what/how should they be passed to the LLM in the chat?
The KG builder by default only looks at the first 50 chunks (of 200 characters each) for cost reasons.
For unlimited processing, you'd need a local deployment, or look at the neo4j-graphrag package for a processing pipeline; see the sketch below.
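For reference, here is a minimal sketch of such a pipeline using the neo4j-graphrag Python package. The connection URI, credentials, model choice, and chunk sizes are placeholder assumptions:

```python
# Sketch of an uncapped KG-building pipeline with the neo4j-graphrag
# package; connection details, credentials, and model are placeholders.
import asyncio

import neo4j
from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.experimental.components.text_splitters.fixed_size_splitter import (
    FixedSizeSplitter,
)
from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline
from neo4j_graphrag.llm import OpenAILLM

driver = neo4j.GraphDatabase.driver(
    "neo4j://localhost:7687", auth=("neo4j", "password")
)

kg_builder = SimpleKGPipeline(
    llm=OpenAILLM(model_name="gpt-4o", model_params={"temperature": 0}),
    driver=driver,
    embedder=OpenAIEmbeddings(),
    # Chunking is fully under your control here: there is no 50-chunk cap,
    # so a table on page 78 is processed the same way as page 1.
    text_splitter=FixedSizeSplitter(chunk_size=1000, chunk_overlap=100),
    from_pdf=True,
)

asyncio.run(kg_builder.run_async(file_path="Annual report 2022 (1) (1).pdf"))
```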
Hi Jeff
Thank you for reaching out to me.
Please see the attached PDF. In this document we are looking for S$339.18 million (EBITDA across all business segments), but most of the time processing stops at pages 3–4 and gives the answer $278.. million rather than the correct answer on page 78.
Thank you
Nish Kolonne

[Annual report 2022 (1) (1).pdf](url)
I have the same question.