langchain Add top K as Self Query Retriever Get Documents Input

Add Top K as Optional Parameter for Self Query Retriever

Adding k as a parameter to Self-Retriever's Get Relevant Documents method. This allows users to specify how many documents they want it to retrieve during a search. This is a fixed PR from previous PR: https://github.com/hwchase17/langchain/pull/4297

Who can review?

Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested:

May 09 '23 01:05 blc16

@hwchase17 @dev2049

Two questions to figure out here:

We don't have benchmarks anywhere. Adding another parameter makes the task more complex which means that performance will likely decrease. It might be by a tiny amount, but we don't know.
(Less important) the existing parser can be generalized right now to yield a single "program" rather than be broken into a query, k, and a program for filtering. It'll clean up some of the code, but same bench marking problem. This is less of an issue since the implementation can be refactored at later time.

May 09 '23 02:05 eyurtsev

@hwchase17 @dev2049

Two questions to figure out here:

We don't have benchmarks anywhere. Adding another parameter makes the task more complex which means that performance will likely decrease. It might be by a tiny amount, but we don't know.

(Less important) the existing parser can be generalized right now to yield a single "program" rather than be broken into a query, k, and a program for filtering. It'll clean up some of the code, but same bench marking problem. This is less of an issue since the implementation can be refactored at later time.

Are there existing benchmarks for other retrievers? Do you want me to look into adding one?

May 09 '23 21:05 blc16

@eyurtsev @hwchase17 any feedback for this? It would be really helpful to add this parameter so I can use Self-Retriever in my application

May 15 '23 02:05 blc16

@hwchase17 @dev2049

Two questions to figure out here:

We don't have benchmarks anywhere. Adding another parameter makes the task more complex which means that performance will likely decrease. It might be by a tiny amount, but we don't know.

(Less important) the existing parser can be generalized right now to yield a single "program" rather than be broken into a query, k, and a program for filtering. It'll clean up some of the code, but same bench marking problem. This is less of an issue since the implementation can be refactored at later time.

i think its actually beneficial to have k separate - otherwise it will get bundled up with other parameters. unless we want to do some complicated decoding....

May 15 '23 03:05 hwchase17

langchain langchain copied to clipboard

Add top K as Self Query Retriever Get Documents Input

Who can review?

langchain
langchain copied to clipboard