Question: Does LangExtract work better on full documents or chunks?

Open voycey opened this issue 4 months ago • 2 comments

I have quite a lot of legal-type documents (enterprise / union agreements, for example). In other systems I chunk the files to make them more searchable.

I appreciate that LangExtract is a different beast, so my question is whether it works better on chunks of documents or on the entire document as a whole?

voycey avatar Aug 28 '25 11:08 voycey

I think LangExtract (LX) already chunks the document in its pre-processing steps, so I guess it won't matter.

AliHaider20 avatar Aug 30 '25 06:08 AliHaider20

From my current tests, it seems to work better when I use Docling to chunk the document first (at least in terms of building a knowledge graph).

I guess it's due to contextual / visual chunking vs. naive chunking.

voycey avatar Aug 30 '25 06:08 voycey
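
To illustrate the distinction drawn in the last comment, here is a minimal sketch contrasting naive fixed-width chunking with a simple structure-aware split. It is not how LangExtract or Docling chunk internally. The function names, the sample text, and the assumption that clauses start with an `N. Title` heading are all hypothetical and only meant to show why the two approaches can yield different inputs for extraction.

```python
import re


def naive_chunks(text: str, size: int) -> list[str]:
    """Cut the text into fixed-width character windows.

    A window boundary can fall in the middle of a clause or sentence.
    """
    return [text[i:i + size] for i in range(0, len(text), size)]


def section_chunks(text: str) -> list[str]:
    """Split before each numbered clause heading.

    Assumes (hypothetically) that every clause starts on its own line
    with a heading such as "1. Wages".
    """
    parts = re.split(r"(?m)^(?=\d+\.\s)", text)
    return [p.strip() for p in parts if p.strip()]


# Hypothetical two-clause agreement used only for illustration.
SAMPLE = (
    "1. Wages\n"
    "Employees are paid weekly.\n"
    "2. Leave\n"
    "Employees accrue four weeks of leave per year.\n"
)

if __name__ == "__main__":
    print("naive:")
    for chunk in naive_chunks(SAMPLE, 40):
        print(repr(chunk))
    print("by section:")
    for chunk in section_chunks(SAMPLE):
        print(repr(chunk))
```

With a 40-character window the first naive chunk runs past the end of the first clause and into the heading of the second, while the section-based split keeps each clause whole. That kind of boundary effect is one plausible reason a structure-aware chunker could help when building a knowledge graph.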