
Guidance on Optimal Chunking Configuration for LLM-Based Processing of Financial PDFs

igelfenbeyn opened this issue 3 months ago

Hello,

I’m working on processing a large number of loosely related PDF files—primarily financial statements such as balance sheets, income statements, and similar documents. In this project, I’m not defining a fixed ontology upfront; instead, I’m relying on the LLM to determine how to interpret and extract information from each document.

Given this use case, I’d like to know: what chunking configurations work best for this kind of unstructured, heterogeneous input?
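For concreteness, here is the kind of setup I’ve been experimenting with. This is a minimal word-based sketch of overlapping chunking (a real pipeline would count model tokens rather than whitespace-separated words), and the `chunk_size` and `overlap` values are placeholders I picked, not recommendations:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into overlapping chunks of roughly chunk_size words.

    A simplified stand-in for a token-based splitter: consecutive
    chunks share `overlap` words so entities mentioned near a chunk
    boundary appear in both chunks.
    """
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks


# Example: a 500-word document split into 200-word chunks with 20-word overlap
sample = " ".join(f"w{i}" for i in range(500))
chunks = chunk_text(sample, chunk_size=200, overlap=20)
print(len(chunks))  # 3 chunks: words 0-199, 180-379, 360-499
```

My open question is essentially how to pick those two numbers for heterogeneous financial documents, where a balance sheet row and a paragraph of notes carry very different information density.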

Additionally, is there any documentation or best-practice guide that explains the trade-offs between using larger vs. smaller chunk sizes? I’m particularly interested in how chunk size impacts context retention, accuracy of entity/relation extraction, and overall performance when using LLMs for knowledge graph construction.

Any advice or references would be greatly appreciated!

Thanks in advance.

igelfenbeyn • Aug 18 '25 21:08