langextract
Question: Does LangExtract work better on full documents or chunks?
I have quite a lot of legal-type documents (enterprise / union agreements, for example). In other systems I chunk the files to make them more searchable.
I appreciate that LangExtract is a different beast, so my question is whether it works better on chunks of a document or on the entire document as a whole.
I think LangExtract (LX) already chunks the document in its pre-processing steps, so I'd guess it won't matter much.
From my current tests, it seems to work better when I use Docling to chunk the document first (at least in terms of building a knowledge graph).
I guess it's due to contextual / visual chunking vs. naive chunking.
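To make the naive vs. contextual distinction concrete, here is a minimal sketch in plain Python. It is not how Docling or LangExtract chunk internally; `naive_chunks`, `clause_chunks`, and the sample agreement text are hypothetical names and data made up for illustration, and the heading pattern assumes clauses start with a number followed by a period.

```python
import re

# Hypothetical sample text standing in for an agreement (not real data).
SAMPLE = (
    "1. Definitions\n"
    "Employee means a person covered by this agreement.\n"
    "2. Hours of Work\n"
    "Ordinary hours are 38 per week, averaged over four weeks.\n"
    "3. Overtime\n"
    "Overtime is paid at 150 percent of the ordinary rate.\n"
)


def naive_chunks(text: str, size: int) -> list[str]:
    """Fixed-width chunks; a boundary can land in the middle of a clause."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def clause_chunks(text: str) -> list[str]:
    """Split before each numbered heading so every clause stays whole.

    Assumes headings look like '1. Title' at the start of a line.
    """
    parts = re.split(r"(?m)^(?=\d+\.\s)", text)
    return [p.strip() for p in parts if p.strip()]


if __name__ == "__main__":
    print("naive:")
    for chunk in naive_chunks(SAMPLE, 60):
        print(repr(chunk))
    print("clause-aware:")
    for chunk in clause_chunks(SAMPLE):
        print(repr(chunk))
```

With the fixed-width split, the second chunk begins partway through the definition sentence, so whatever consumes it sees a fragment without its heading. The clause-aware split keeps each heading together with its body, which may be part of why structure-aware chunking gives cleaner entities and relations downstream.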