
Audrey/add recency decay

Open aulorbe opened this issue 1 year ago • 0 comments

Problem

This is a quality-of-life (QoL) feature I have proactively built into our upcoming RC, on the client side. We currently have no utilities that let customers customize their /rerank results downstream of the API, which I anticipate will cause friction for heavy users of /rerank.

Solution

Reranking results by recency is a popular and useful feature for all of our clients to have. This PR can serve as the POC for integrating this feature into our other clients.

Overview

  • A developer hits the /rerank endpoint and gets results back
  • This same dev wants to tweak the endpoint's results before sending them to their application's user -- this dev wants the results to have some notion of "recency"

Use cases

  • Let's say you're an eCommerce developer and you want to boost the ranking of results based on seasonality. You can make sure that a search for "best football tshirts" returns not only the most relevant results, but also the NEWEST results in your shop (the ones you just uploaded because it's Fall). You can crank up the recency decay so that older items are pushed farther down the results list than newer items, even if both are equally relevant to the query.

Toggles available to the user (passed in via options):

  • decay (true/false): Turns recency decay on or off.

  • decayWeight (default 0.5): The magnitude of the decay's impact on document scores.

    • Increasing this value:
      • Effect: Decay has a stronger impact on document scores; older docs are heavily penalized.
      • Use case: You want to more strongly prioritize recency.
    • Decreasing this value:
      • Effect: Decay has a weaker impact on document scores; older documents have a better chance at retaining their original score/ranking.
      • Use case: You want to prioritize recency less.
  • decayThreshold (default 30 days): Time period (in days) after which the decay starts significantly affecting document scores. If a document is within the threshold, the decay scales with the document's age. If it is older than the threshold, the document is treated as fully decayed (normalized decay of 1).

    • Increasing this value:
      • Effect: Recency decay is more gradual; documents remain relevant for a longer time.
      • Use case: When freshness/recency is less important (e.g. product reviews)
    • Decreasing this value:
      • Effect: Recency decay is more abrupt; documents lose relevance faster.
      • Use case: When freshness/recency is more important (e.g. news articles).
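
To make the interaction between these toggles concrete, here is a minimal sketch of additive decay. It is not the code in this PR: the linear scaling inside the threshold, the `decay` default of `false`, and every name other than `decay`, `decayWeight`, and `decayThreshold` (e.g. `ScoredDoc`, `applyRecencyDecay`, `timestampMs`) are assumptions made for illustration.

```typescript
// Hypothetical option shape; only the three field names come from this PR.
interface RecencyOptions {
  decay?: boolean; // assumed to default to false
  decayWeight?: number; // default 0.5
  decayThreshold?: number; // in days, default 30
}

// Hypothetical document shape: a rerank score plus a parsed timestamp.
interface ScoredDoc {
  id: string;
  score: number;
  timestampMs: number; // document timestamp as epoch milliseconds
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Maps a document's age to [0, 1]: 0 for brand new, 1 at or past the threshold.
// Linear scaling inside the threshold is an assumption.
function normalizedDecay(ageDays: number, thresholdDays: number): number {
  if (ageDays <= 0) return 0;
  return Math.min(ageDays / thresholdDays, 1);
}

// Subtracts decayWeight * normalizedDecay from each score (additive decay),
// then re-sorts by the adjusted score, highest first.
function applyRecencyDecay(
  docs: ScoredDoc[],
  options: RecencyOptions = {},
  nowMs: number = Date.now()
): ScoredDoc[] {
  const { decay = false, decayWeight = 0.5, decayThreshold = 30 } = options;
  if (!decay) return docs.slice();
  return docs
    .map((doc) => {
      const ageDays = (nowMs - doc.timestampMs) / MS_PER_DAY;
      const penalty = decayWeight * normalizedDecay(ageDays, decayThreshold);
      return { ...doc, score: doc.score - penalty };
    })
    .sort((a, b) => b.score - a.score);
}
```

Under these assumptions and the default settings, two documents that both score 0.9 end up at roughly 0.85 (3 days old) and 0.4 (60 days old), so the newer one ranks first. Raising `decayWeight` widens that gap; raising `decayThreshold` shrinks the penalty for documents younger than the threshold.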

Future toggles

It'd be ideal in the future to allow the user to choose between common recency decay functions other than additive, e.g. multiplicative, exponential, and log.
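
As a rough illustration of what such a choice could look like, the sketch below applies each of the four decay shapes to a score. The formulas and the names `DecayKind` and `decayScore` are hypothetical, picked only to show how the shapes differ; nothing here is implemented in this PR.

```typescript
// Hypothetical set of selectable decay functions.
type DecayKind = "additive" | "multiplicative" | "exponential" | "log";

// score: original rerank score
// d: normalized decay in [0, 1] (0 = brand new, 1 = fully decayed)
// w: decayWeight
function decayScore(kind: DecayKind, score: number, d: number, w: number): number {
  switch (kind) {
    case "additive":
      // Fixed penalty, independent of the original score.
      return score - w * d;
    case "multiplicative":
      // Penalty proportional to the original score.
      return score * (1 - w * d);
    case "exponential":
      // Smooth proportional falloff that never reaches zero.
      return score * Math.exp(-w * d);
    case "log":
      // log2(1 + d) rises quickly for young documents, then flattens out.
      return score - (w * Math.log1p(d)) / Math.LN2;
  }
}
```

With `d = 0` every variant returns the original score, so brand-new documents are never penalized regardless of which shape is chosen.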

User requirements

The user must pass in documents that contain a timestamp field that is a stringified timestamp, to the millisecond, e.g. "2010-08-10 00:03:21"
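
A string in this shape is not ISO 8601, so `new Date(...)` handles it in an implementation-dependent way (engines that accept it generally read it as local time). One way to sidestep that is to parse the parts explicitly. The sketch below is not the parsing used in this PR: the helper name `parseTimestampUtc`, the choice to interpret the value as UTC, and the optional fractional-seconds suffix are all assumptions.

```typescript
// Parses "YYYY-MM-DD HH:mm:ss" (optionally followed by ".S" to ".SSS")
// into epoch milliseconds. Treating the value as UTC is an assumption.
function parseTimestampUtc(value: string): number {
  const pattern =
    /^(\d{4})-(\d{2})-(\d{2})[ T](\d{2}):(\d{2}):(\d{2})(?:\.(\d{1,3}))?$/;
  const match = pattern.exec(value.trim());
  if (!match) {
    throw new Error(`Unrecognized timestamp: ${value}`);
  }
  const [, year, month, day, hour, minute, second, fraction = "0"] = match;
  // Right-pad the fraction so ".5" means 500 ms rather than 5 ms.
  const millis = Number((fraction + "00").slice(0, 3));
  return Date.UTC(
    Number(year),
    Number(month) - 1, // Date.UTC months are zero-based
    Number(day),
    Number(hour),
    Number(minute),
    Number(second),
    millis
  );
}
```

Throwing on an unrecognized string, rather than returning `NaN`, means a document with a malformed timestamp fails loudly instead of silently receiving a meaningless decay.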

Type of Change

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] This change requires a documentation update
  • [ ] Infrastructure change (CI configs, etc)
  • [ ] Non-code change (docs, etc)
  • [ ] None of the above: (explain here)

Test Plan

New integration and unit tests pass.

Still need to do:

  • README updates
  • Example in /rerank docstring of how to pass in options containing recency stuff

Further reading:

  • https://opensearch.org/docs/latest/query-dsl/compound/function-score/#decay-functions

  • To see the specific tasks where the Asana app for GitHub is being used, see below:
    • https://app.asana.com/0/0/1208439494339482

aulorbe, Sep 30 '24 21:09