
Add chunking support to `vectorize.table()`

Open ChuckHend opened this issue 1 year ago • 0 comments

Provide the ability to automatically chunk text in the input columns passed to `vectorize.table()`, or provide a utility function (`vectorize.chunk_table()`?) that takes an input table, splits the data in each row into multiple rows, and writes the output to a new table. I suppose `vectorize.table()` could call `vectorize.chunk_table()` under the hood as a convenience.
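To make the row-expansion idea concrete, here is a rough Python sketch of what a chunk-table step might do. It is only an illustration: `chunk_text` and `chunk_rows` are hypothetical names, the fixed-size character windows with overlap are an assumed strategy, and none of this is existing pg_vectorize API.

```python
from typing import Iterable, Iterator


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap`.

    Hypothetical helper: the window size and overlap are assumptions,
    not pg_vectorize defaults.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


def chunk_rows(
    rows: Iterable[tuple[int, str]], chunk_size: int = 200, overlap: int = 20
) -> Iterator[tuple[int, int, str]]:
    """Expand each (row_id, text) row into (row_id, chunk_index, chunk) rows."""
    for row_id, text in rows:
        for index, chunk in enumerate(chunk_text(text, chunk_size, overlap)):
            yield (row_id, index, chunk)


rows = [(1, "abcdefghij"), (2, "xy")]
expanded = list(chunk_rows(rows, chunk_size=4, overlap=1))
```

Keeping the source row id and a chunk index on every output row would let a search result be traced back to the original document.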

The use case is giant documents, where a user might want to retrieve just a subset of a document. A retrieved chunk would hopefully be more relevant and specific than the entire document.

See LangChain's recursive_text_splitter for an example of this: https://python.langchain.com/docs/how_to/recursive_text_splitter/
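For readers unfamiliar with the approach, the general idea of recursive splitting can be sketched in a few lines of plain Python. This is a simplified illustration, not LangChain's implementation: `recursive_split` is a hypothetical name, the separator order is an assumption, and there is no chunk overlap.

```python
def recursive_split(
    text: str,
    chunk_size: int = 100,
    separators: tuple[str, ...] = ("\n\n", "\n", " ", ""),
) -> list[str]:
    """Split `text` into pieces of at most `chunk_size` characters.

    Tries the coarsest separator first (paragraphs), and only falls back
    to finer ones (lines, words, raw characters) for oversized pieces.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut at fixed character offsets.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for part in text.split(sep):
        candidate = part if not current else current + sep + part
        if len(candidate) <= chunk_size:
            # Keep merging neighbouring parts while they still fit.
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(part) <= chunk_size:
            current = part
        else:
            # The part alone is too big: recurse with finer separators.
            chunks.extend(recursive_split(part, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


pieces = recursive_split("aaa bbb ccc\n\nddd eee", chunk_size=7)
```

Splitting on paragraph and line boundaries first tends to keep related sentences together, which is the property that makes chunks more useful for retrieval than arbitrary fixed-width cuts.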

ChuckHend, Oct 11 '24 01:10