
[FEATURE] Add K-NN vector and query support to SQL Plugin

Open acarbonetto opened this issue 1 year ago • 7 comments

Is your feature request related to a problem?

The k-NN (k-nearest neighbours) plugin enables users to define knn_vector objects and perform nearest-neighbour queries on those objects in OpenSearch. The SQL plugin does not yet expose this functionality.

What solution would you like?

We want to add the following features:

  • Add the knn_vector type to the list of supported OS-SQL types.
  • Add Approximate k-NN to the list of supported OpenSearch query functions.
  • Add Script Score k-NN to the list of supported OpenSearch scripted functions, using the script field.
  • Add k-NN’s Painless extensions as part of Script Score k-NN.
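For orientation, the OpenSearch request bodies that these features would ultimately need to produce can be sketched in Python as plain dictionaries. This is a minimal sketch based on the k-NN plugin's documented DSL, not on any existing SQL plugin code: the helper names (`knn_index_body`, `approximate_knn_query`, `script_score_knn_query`, `painless_knn_query`) are hypothetical and exist only for illustration.

```python
def knn_index_body(field, dimension):
    """Index body that enables k-NN and maps one knn_vector field."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {field: {"type": "knn_vector", "dimension": dimension}}
        },
    }


def approximate_knn_query(field, vector, k):
    """Approximate k-NN: the `knn` query clause."""
    return {"size": k, "query": {"knn": {field: {"vector": list(vector), "k": k}}}}


def script_score_knn_query(field, vector, size, space_type="l2"):
    """Script Score k-NN (exact): the `knn_score` script in the script field."""
    return {
        "size": size,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": "knn_score",
                    "lang": "knn",
                    "params": {
                        "field": field,
                        "query_value": list(vector),
                        "space_type": space_type,
                    },
                },
            }
        },
    }


def painless_knn_query(field, vector, size):
    """Script score using one of the k-NN Painless extensions (cosineSimilarity)."""
    return {
        "size": size,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": "1.0 + cosineSimilarity(params.query_value, doc[params.field])",
                    "params": {"field": field, "query_value": list(vector)},
                },
            }
        },
    }
```

Each helper returns a dictionary that could be serialised with `json.dumps` and sent to the index-creation or `_search` endpoint. How the SQL plugin would actually build these requests is one of the open questions in this issue.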

Questions:

  • Determine whether we can use the ScriptScoreFunction provided by OpenSearch.
  • Determine which syntax makes sense for each language. In PPL, it seems simpler to chain multiple where commands to denote pre- and post-filter processing around the knn function. This makes less sense in SQL, where we may consider using window functions or relying on a deliberate ordering of conditions within the WHERE clause.

Proposed PPL syntax

  • knn and knn_score fall under the where command. We can chain multiple where commands for pre-filter and post-filter processing:
where <pre-filter expression> | where knn_score(field, vector, k) | where <post-filter expression> | top size
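One possible reading of the chained where pipeline is sketched below in Python: the pre-filter becomes the inner query of a script_score clause, knn_score becomes the script, the post-filter becomes a top-level post_filter, and the trailing size becomes the request size. This mapping is an assumption made for illustration, not a decided design. The function name `ppl_knn_score_to_dsl` is hypothetical, and the sketch leaves out the k argument because the script-score request body has no direct counterpart for it.

```python
def ppl_knn_score_to_dsl(field, vector, size, pre_filter=None,
                         post_filter=None, space_type="l2"):
    """Map `where <pre> | where knn_score(...) | where <post> | top size`
    to a script_score search body (illustrative mapping only)."""
    body = {
        "size": size,
        "query": {
            "script_score": {
                # Pre-filter: only documents matching this query get scored.
                "query": pre_filter if pre_filter is not None else {"match_all": {}},
                "script": {
                    "source": "knn_score",
                    "lang": "knn",
                    "params": {
                        "field": field,
                        "query_value": list(vector),
                        "space_type": space_type,
                    },
                },
            }
        },
    }
    if post_filter is not None:
        # Post-filter: applied to the scored hits.
        body["post_filter"] = post_filter
    return body


# Example with hypothetical field names `color` and `price`.
example = ppl_knn_score_to_dsl(
    "my_vector",
    [1.0, 2.0],
    size=5,
    pre_filter={"term": {"color": "red"}},
    post_filter={"range": {"price": {"lte": 100}}},
)
```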

Proposed SQL Syntax:

Option 1) Use OpenSearch functions to call knn in the ORDER BY clause.

WHERE <pre-filter expression> ORDER BY knn(field, vector, k, <post-filter expression>) LIMIT size
  • Approximate knn: ORDER BY knn(field, vector, k)
  • Approximate knn with pre-filter query: WHERE expression ORDER BY knn(field, vector, k)
  • Approximate knn with post-filter query: ORDER BY knn(field, vector, k, expression)
WHERE <pre-filter expression> AND knn_score(field, vector, k) AND <post-filter expression> LIMIT size
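As a rough illustration of how the ORDER BY knn(...) forms in Option 1 might translate to OpenSearch DSL, the Python sketch below builds an approximate k-NN request. It assumes the WHERE pre-filter maps to the `filter` field inside the knn clause (which depends on the engine and OpenSearch version in use), the fourth knn argument maps to a top-level post_filter, and LIMIT maps to size. The function name `sql_knn_to_dsl` is hypothetical and none of this mapping is settled in this issue.

```python
def sql_knn_to_dsl(field, vector, k, size, pre_filter=None, post_filter=None):
    """Map `WHERE <pre> ORDER BY knn(field, vector, k, <post>) LIMIT size`
    to an approximate k-NN search body (illustrative mapping only)."""
    knn_clause = {"vector": list(vector), "k": k}
    if pre_filter is not None:
        # Assumption: the engine supports filtering inside the knn clause.
        knn_clause["filter"] = pre_filter
    body = {"size": size, "query": {"knn": {field: knn_clause}}}
    if post_filter is not None:
        body["post_filter"] = post_filter
    return body


# ORDER BY knn(field, vector, k)
basic = sql_knn_to_dsl("my_vector", [2, 3, 5, 6], k=2, size=2)

# ORDER BY knn(field, vector, k, expression), with a hypothetical `price` field
with_post = sql_knn_to_dsl(
    "my_vector", [2, 3, 5, 6], k=10, size=5,
    post_filter={"range": {"price": {"lte": 100}}},
)
```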

Option 2) Use pgvector syntax to map to OpenSearch DSL functions

see: pgvector
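For context, pgvector expresses nearest-neighbour search with distance operators in ORDER BY, for example `ORDER BY embedding <-> '[3,1,2]' LIMIT 5`, where `<->` is L2 distance, `<#>` is negative inner product and `<=>` is cosine distance. The Python sketch below shows one way such an expression could be mapped to a Script Score k-NN request. The operator-to-space_type table and the function name `pgvector_order_by_to_dsl` are assumptions for illustration only. pgvector orders by ascending distance while OpenSearch ranks by descending score, so a real mapping would need to reconcile the two.

```python
import json
import re

# Assumed mapping from pgvector operators to k-NN space types.
PGVECTOR_TO_SPACE_TYPE = {"<->": "l2", "<#>": "innerproduct", "<=>": "cosinesimil"}

# Matches expressions such as: embedding <-> '[3,1,2]'
_ORDER_BY = re.compile(r"^\s*(\w+)\s*(<->|<#>|<=>)\s*'(\[[^\]]*\])'\s*$")


def pgvector_order_by_to_dsl(order_by, limit):
    """Translate a simple pgvector ORDER BY expression plus LIMIT
    into a script_score search body (illustrative mapping only)."""
    match = _ORDER_BY.match(order_by)
    if match is None:
        raise ValueError(f"unsupported ORDER BY expression: {order_by!r}")
    field, operator, literal = match.groups()
    return {
        "size": limit,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": "knn_score",
                    "lang": "knn",
                    "params": {
                        "field": field,
                        "query_value": json.loads(literal),
                        "space_type": PGVECTOR_TO_SPACE_TYPE[operator],
                    },
                },
            }
        },
    }


body = pgvector_order_by_to_dsl("embedding <-> '[3,1,2]'", 5)
```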

Do you have any additional context?

References:

  • https://opensearch.org/docs/latest/search-plugins/knn/index/
  • https://github.com/opensearch-project/sql/issues/1412

Proposed Schedule

  • [ ] Add knn_vector datatype to OS-SQL and OS-PPL
  • [ ] Add exact knn_score to PPL language
  • [ ] Add approximate knn to PPL language with filtering
  • [ ] Port syntax to SQL
  • [ ] Add pgvector syntax to SQL

acarbonetto • May 08 '23 21:05