[FEATURE] Add K-NN vector and query support to SQL Plugin
Is your feature request related to a problem?
The k-NN, or k-nearest neighbours, plugin enables users to define knn_vector objects and perform nearest-neighbour queries on those objects in OpenSearch.
What solution would you like?
We want to add the following features:
- Add the knn_vector type to the list of supported OS-SQL types.
- Add Approximate k-NN to the list of supported OpenSearch query functions.
- Add Script Score k-NN to the list of supported OpenSearch scripted functions, using the `script` field.
- Add k-NN's Painless extensions as part of Script Score k-NN.
Questions:
- TODO: determine if we can use the `ScriptScoreFunction` provided by OpenSearch.
- Determine which syntax 'makes sense' for each language. It seems simpler to use the `where` clause in PPL (with multiple `where` clauses) to denote pre- and post-filter processing of the knn function. This makes less sense in SQL, where we may consider using window functions or deliberately ordering the conditions in the WHERE clause.
Proposed PPL syntax
- `knn` and `knn_score` fall into the `where` command. We can chain multiple `where` commands for pre-filter and post-filter processing.
where <pre-filter expression> | where knn_score(field, vector, k) | where <post-filter expression> | top size
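As a rough illustration of where the chained `where` pipeline above could land, the sketch below builds an exact k-NN `script_score` request body in Python. The `knn_score` script with `lang: knn` and the `field`, `query_value`, and `space_type` parameters follow the k-NN plugin's scoring script; everything else is an assumption for illustration: the helper name `ppl_knn_score_to_dsl`, the default `l2` space type, mapping the pre-filter to the inner query, and mapping the post-filter to a top-level `post_filter`. Because the scoring script has no `k` parameter, the sketch uses only the result size.

```python
import json


def ppl_knn_score_to_dsl(field, vector, size, pre_filter=None,
                         post_filter=None, space_type="l2"):
    """Hypothetical translation target for:
    where <pre> | where knn_score(field, vector, k) | where <post> | top size
    """
    body = {
        "size": size,
        "query": {
            "script_score": {
                # The pre-filter narrows the documents that get scored.
                "query": pre_filter if pre_filter is not None else {"match_all": {}},
                "script": {
                    "source": "knn_score",
                    "lang": "knn",
                    "params": {
                        "field": field,
                        "query_value": list(vector),
                        "space_type": space_type,  # assumed default
                    },
                },
            }
        },
    }
    if post_filter is not None:
        # Assumption: the post-filter is applied after scoring.
        body["post_filter"] = post_filter
    return body


if __name__ == "__main__":
    example = ppl_knn_score_to_dsl(
        field="my_vector",
        vector=[1.0, 2.0],
        size=3,
        pre_filter={"term": {"color": "red"}},
        post_filter={"range": {"price": {"lte": 10}}},
    )
    print(json.dumps(example, indent=2))
```

Whether the real implementation should emit this DSL directly or go through `ScriptScoreFunction` is exactly the open question listed above.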
Proposed SQL Syntax:
Option 1) use OpenSearch functions to call knn in the ORDER BY clause.
WHERE <pre-filter expression> ORDER BY knn(field, vector, k, <post-filter expression>) LIMIT size
- Approximate knn:
ORDER BY knn(field, vector, k)
- Approximate knn with pre-filter query:
WHERE expression ORDER BY knn(field, vector, k)
- Approximate knn with post-filter query:
ORDER BY knn(field, vector, k, expression)
WHERE <pre-filter expression> AND knn_score(field, vector, k) AND <post-filter expression> LIMIT size
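To make the Option 1 variants concrete, here is a minimal Python sketch of the approximate k-NN query DSL that each `ORDER BY knn(...)` form might translate to. The helper name `approximate_knn_to_dsl` is hypothetical. Mapping the pre-filter to the `filter` field inside the `knn` clause is an assumption, and that field is only available for some engines and versions; the post-filter is assumed to map to a top-level `post_filter`.

```python
import json


def approximate_knn_to_dsl(field, vector, k, size,
                           pre_filter=None, post_filter=None):
    """Hypothetical translation target for:
    WHERE <pre> ORDER BY knn(field, vector, k, <post>) LIMIT size
    """
    knn_clause = {"vector": list(vector), "k": k}
    if pre_filter is not None:
        # Assumption: the engine supports filtering inside the knn clause.
        knn_clause["filter"] = pre_filter
    body = {"size": size, "query": {"knn": {field: knn_clause}}}
    if post_filter is not None:
        # Applied to the k nearest neighbours after they are found.
        body["post_filter"] = post_filter
    return body


if __name__ == "__main__":
    # ORDER BY knn(my_vector, [2, 3, 5, 6], 2) LIMIT 2
    print(json.dumps(approximate_knn_to_dsl("my_vector", [2, 3, 5, 6], 2, 2), indent=2))
```

Note that a post-filter can return fewer than `size` rows, since it runs after the k neighbours have been selected.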
Option 2) Use pgvector syntax to map to OpenSearch DSL functions
see: pgvector
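As a sketch of what Option 2 could involve, pgvector expresses nearest-neighbour search with distance operators in the ORDER BY clause: `<->` for L2 distance, `<#>` for negative inner product, and `<=>` for cosine distance. The Python below parses one such clause and maps the operator to an OpenSearch k-NN space type. The mapping table, the helper name `parse_pgvector_order_by`, the simplified regular expression, and the use of LIMIT as `k` are all assumptions for illustration, not a proposed grammar.

```python
import json
import re

# Assumed mapping from pgvector operators to k-NN space types.
PGVECTOR_TO_SPACE_TYPE = {
    "<->": "l2",
    "<#>": "innerproduct",
    "<=>": "cosinesimil",
}

# Simplified pattern: ORDER BY <field> <op> '<vector literal>' LIMIT <n>
_ORDER_BY = re.compile(
    r"ORDER\s+BY\s+(?P<field>\w+)\s*(?P<op><->|<#>|<=>)\s*"
    r"'(?P<vec>\[[^\]]*\])'\s+LIMIT\s+(?P<limit>\d+)",
    re.IGNORECASE,
)


def parse_pgvector_order_by(sql):
    """Extract field, space type, query vector, and k from a pgvector-style query."""
    match = _ORDER_BY.search(sql)
    if match is None:
        raise ValueError("no pgvector-style ORDER BY ... LIMIT clause found")
    return {
        "field": match.group("field"),
        "space_type": PGVECTOR_TO_SPACE_TYPE[match.group("op")],
        "vector": json.loads(match.group("vec")),
        "k": int(match.group("limit")),
    }


if __name__ == "__main__":
    print(parse_pgvector_order_by(
        "SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5"
    ))
```

One design point to settle: for approximate k-NN the space type is fixed in the index mapping, so the operator could only be validated against it, whereas Script Score k-NN accepts the space type per query.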
Do you have any additional context?
reference:
- https://opensearch.org/docs/latest/search-plugins/knn/index/
- https://github.com/opensearch-project/sql/issues/1412
Proposed Schedule
- [ ] Add knn_vector datatype to OS-SQL and OS-PPL
- [ ] Add exact knn_score to PPL language
- [ ] Add approximate knn to PPL language with filtering
- [ ] Port syntax to SQL
- [ ] Add pgvector syntax to SQL