
[FEATURE REQUEST] Document ingestion/chunking settings

Open dukemagus opened this issue 11 months ago • 4 comments

Please add an "advanced settings" section on the document upload/ingestion page exposing the chunking options before tokenizing.

The most important ones are:

Chunk size (in characters): New models with better indexing capabilities are appearing, and it's very possible we'll get an upgrade to ada 002 with a higher token limit.

Chunk overlap: Having a small amount of overlapping text between the end of one chunk and the start of the next improves vector DB search results.

Metadata edit: Columns per chunk: chunk number, document page, document name and, if possible, an optional field for additional info (an alternative document address, for example).
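To make the request concrete, here is a minimal sketch of how the three settings could fit together. It is not taken from the danswer codebase: `Chunk`, `chunk_text`, and the parameter names `chunk_size`, `chunk_overlap`, and `extra` are hypothetical, and the document page is left empty because plain text carries no page information.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Chunk:
    """Hypothetical per-chunk record; field names mirror the metadata asked for above."""
    text: str
    chunk_number: int
    document_name: str
    document_page: Optional[int] = None  # unknown for plain text input
    extra: Dict[str, str] = field(default_factory=dict)  # optional free-form info


def chunk_text(
    text: str,
    document_name: str,
    chunk_size: int = 1000,    # assumed default, in characters
    chunk_overlap: int = 100,  # assumed default, in characters
    extra: Optional[Dict[str, str]] = None,
) -> List[Chunk]:
    """Split text into fixed-size character chunks that overlap their neighbours."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks: List[Chunk] = []
    for number, start in enumerate(range(0, len(text), step)):
        chunks.append(
            Chunk(
                text=text[start:start + chunk_size],
                chunk_number=number,
                document_name=document_name,
                extra=dict(extra or {}),
            )
        )
        # Stop once the current window already reaches the end of the text,
        # so no trailing chunk made only of overlap is produced.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Called as `chunk_text(body, "manual.pdf", chunk_size=500, chunk_overlap=50, extra={"source_url": "..."})`, each chunk would carry its number, the document name, and the optional extra field. Exposing `chunk_size` and `chunk_overlap` in an advanced settings panel would then amount to passing the user's values through to a function like this.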

dukemagus, Jul 12 '23 18:07