langchainrb

Ability to specify the distance threshold when calling similarity_search

Open · andreibondarev opened this issue 1 year ago · 1 comment

Description

On Discord, someone asked whether a distance threshold can be specified when calling the Vectorsearch#ask method. The goal is to return all records whose relevance score passes that threshold, rather than a fixed number (k:) of records.

Tasks

  • Explore whether the vector search DBs support a distance threshold parameter natively. If they do, we should implement it. If they don't, we should not, since the filtering can then be done on the client side.
  • Modify the vectorsearch#ask(), vectorsearch#similarity_search_by_vector() and vectorsearch#similarity_search() methods to accept a distance_gte: ("distance greater than or equal to") parameter that sets this threshold.
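For the client-side case, the filtering itself is small. The sketch below is hypothetical and not langchainrb's API: the Hit struct and the filter_by_threshold helper are illustrative names, and it assumes each hit carries a score already oriented so that larger means more relevant (see the normalization note below). Because it only sees the hits the backend returned, the backend call would still need a generous k for "return all matching records" to hold.

```ruby
# Hypothetical hit shape: an id plus a score, assumed to be normalized
# so that a larger score means a more relevant record.
Hit = Struct.new(:id, :score, keyword_init: true)

# Keep every hit whose score is >= distance_gte, most relevant first.
# Client-side only: it can filter just the hits the backend already returned.
def filter_by_threshold(hits, distance_gte:)
  hits.select { |hit| hit.score >= distance_gte }
      .sort_by { |hit| -hit.score }
end

hits = [
  Hit.new(id: 1, score: 0.91),
  Hit.new(id: 2, score: 0.42),
  Hit.new(id: 3, score: 0.77)
]

filtered = filter_by_threshold(hits, distance_gte: 0.75)
puts filtered.map(&:id).inspect # => [1, 3]
```

The number of results now depends on the threshold rather than on k, which is the behavior requested on Discord.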

Note: We might need to normalize/standardize the distance scores returned by the various vector search engines, since they do not all use the same metric or scale.
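One way to standardize scores is to map each metric onto a shared "similarity in [0, 1], higher is better" scale before applying a threshold. This is a minimal sketch under stated assumptions: normalize_score and the metric symbols are hypothetical, the 1/(1 + d) mapping for Euclidean distance is just one common heuristic, and which metric each engine actually returns would have to be checked against that engine's documentation.

```ruby
# Hypothetical helper: map a raw backend score onto a common
# "similarity in [0, 1], higher = more relevant" scale.
def normalize_score(raw, metric:)
  case metric
  when :cosine_similarity # raw in [-1, 1], higher = closer
    (raw + 1.0) / 2.0
  when :cosine_distance   # raw = 1 - cosine_similarity, in [0, 2], lower = closer
    1.0 - raw / 2.0
  when :euclidean         # raw in [0, inf), lower = closer; 1/(1 + d) is a heuristic
    1.0 / (1.0 + raw)
  else
    raise ArgumentError, "unknown metric: #{metric.inspect}"
  end
end

# The same pair of vectors reported as cosine similarity 0.8 or as
# cosine distance 0.2 maps to the same normalized score (about 0.9).
puts normalize_score(0.8, metric: :cosine_similarity).round(6)
puts normalize_score(0.2, metric: :cosine_distance).round(6)
puts normalize_score(1.0, metric: :euclidean) # => 0.5
```

With scores on one scale, a single distance_gte: value would mean the same thing regardless of which backend produced the results.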

andreibondarev, Apr 25 '24 13:04

I believe this is the same issue as https://github.com/patterns-ai-core/langchainrb/issues/249. I will probably close the earlier one, since this one has a better description.

sergiobayona, Apr 14 '25 20:04