[RFC] Create a new k-NN API or redesign the existing one
Is your feature request related to a problem?
I've tried using various databases (Pinecone, Vespa, Weaviate, Qdrant, and of course OpenSearch) for rudimentary vector search with a pure Python HTTP client (see the blog post and accompanying code).
I found the OpenSearch API to be one of the easiest to use, though not the easiest, and felt that k-NN is not a first-class citizen in OpenSearch (after all, it's a plugin). Some things I disliked as a new user:
- needing to enable k-NN on the index (the index.knn setting set to true) before it can hold vectors;
- having to nest k-NN queries under knn;
- inconsistencies with the rest of OpenSearch for features such as post_filter;
- needing keywords such as bool and must, which make queries extra verbose;
- the mental overhead of tracking which features work and which don't depending on the algorithm or k-NN engine chosen (Lucene, nmslib, faiss).
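To make these complaints concrete, here is a minimal sketch that builds the two request bodies a new user needs for a filtered vector search: one to create the index and one to query it. It assumes the k-NN plugin's documented body format (`index.knn` index setting, `knn_vector` field type, `method` block, `knn` query clause); the field names (`my_vector`, `color`), dimension, and method parameters are illustrative only, and the helper functions are hypothetical. Nothing is sent to a cluster; the bodies are just printed.

```python
import json

# Illustrative names and sizes; not taken from the original post.
VECTOR_FIELD = "my_vector"
DIMENSION = 3


def create_index_body(dimension: int) -> dict:
    """Body for PUT /<index>: k-NN is switched on per index via the
    index.knn setting, and the vector field is mapped as knn_vector."""
    return {
        "settings": {"index.knn": True},
        "mappings": {
            "properties": {
                VECTOR_FIELD: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    # The engine choice (nmslib, faiss, lucene) affects which
                    # other features are available.
                    "method": {"name": "hnsw", "space_type": "l2", "engine": "nmslib"},
                },
                "color": {"type": "keyword"},
            }
        },
    }


def filtered_knn_query(vector: list, k: int, color: str) -> dict:
    """Body for POST /<index>/_search: the vector clause is nested under
    knn, and adding a filter wraps it in bool/must."""
    return {
        "size": k,
        "query": {
            "bool": {
                "must": [{"knn": {VECTOR_FIELD: {"vector": vector, "k": k}}}],
                "filter": [{"term": {"color": color}}],
            }
        },
    }


def depth(node) -> int:
    """Count the nesting levels of dicts and lists in a JSON-like value."""
    if isinstance(node, dict):
        return 1 + max((depth(v) for v in node.values()), default=0)
    if isinstance(node, list):
        return 1 + max((depth(v) for v in node), default=0)
    return 0


if __name__ == "__main__":
    query = filtered_knn_query([0.1, 0.2, 0.3], k=5, color="red")
    print(json.dumps(create_index_body(DIMENSION), indent=2))
    print(json.dumps(query, indent=2))
    print("nesting depth:", depth(query))
```

Even this small filtered search requires several levels of nesting (bool, must, knn, then the field name) before the vector itself appears. With the nmslib and faiss engines, a filter placed in bool like this is, as I understand it, applied to the approximate results rather than during the search. That is one example of the engine-dependent behavior described above; check the k-NN plugin documentation for the version in use.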
Are these real problems? Do you know of other problems? Looking for your feedback!
What solution would you like?
Design a brand-new API that is not backward compatible with the current k-NN API.
What alternatives have you considered?
https://github.com/opensearch-project/k-NN/issues/969: Simplify developer onboarding to k-nn (vector search)
Do you have any additional context?
A new API has some pros and cons.
Pros:
- Free to differ from existing OpenSearch conventions.
- Vector-first, i.e., support for traditional search is secondary or not needed at all.
- Easier to get started in vector-only scenarios.
Cons:
- Not enough evidence yet that the problems described above are significant for users.
- Maintaining two APIs does not align with a single, unified OpenSearch product in the long term.
- Duplicates functionality of the existing, stable, and proven k-NN API.
- Not backward or forward compatible with the existing k-NN API.
- Forces users to migrate from the simple API to the advanced one to use advanced features.
- Requires adding support to all existing clients.