SQL support for HNSW index
ArcadeDB's HNSW index is pretty powerful, but the lack of SQL support makes it hard to use via API.
We need some new function/method to expose the following methods from the index:
- findNeighborsFromVector(TVector vector, int k): find max K neighbors from a vector of embeddings
- findNeighborsFromId(TID id, int k): find max K neighbors starting from an id (indexed with the underlying LSMTree)
- findNeighborsFromVertex(Vertex start, int k): find max K neighbors starting from a vertex
The easiest way is to create three new SQL functions. Example:
select findNeighborsFromVector( "Word[name,vector]", [1,2,3,4,5,6], 10 )
The Java API returns a List<Pair<Identifiable, ? extends Number>>, where the first element of each pair is the vertex RID and the second is the proximity as a number (float, double, or whatever type you picked at index creation). The list is ordered by proximity, closest first.
With SQL it must be wrapped in a Result with "vertex" and "proximity" properties:
+--------+-----------+
| VERTEX | PROXIMITY |
+--------+-----------+
| #13:4  | 0.12      |
| #19:10 | 0.19      |
+--------+-----------+
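The wrapping step could look roughly like the sketch below. It is only an illustration: `NeighborRows` and `toRows` are hypothetical names, `Map.Entry<String, ? extends Number>` stands in for the `Pair<Identifiable, ? extends Number>` returned by the Java API, and a plain `Map` stands in for ArcadeDB's `Result`, so that the code runs without ArcadeDB on the classpath.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of ArcadeDB.
final class NeighborRows {

  // Wraps each (rid, proximity) pair in a row with "vertex" and "proximity"
  // properties. Input order (closest first) is preserved.
  static List<Map<String, Object>> toRows(
      List<? extends Map.Entry<String, ? extends Number>> neighbors) {
    List<Map<String, Object>> rows = new ArrayList<>();
    for (Map.Entry<String, ? extends Number> neighbor : neighbors) {
      Map<String, Object> row = new LinkedHashMap<>();
      row.put("vertex", neighbor.getKey());       // the RID, kept as a String here
      row.put("proximity", neighbor.getValue());  // Float, Double, ... as chosen for the index
      rows.add(row);
    }
    return rows;
  }

  public static void main(String[] args) {
    // Sample values taken from the result table above.
    List<Map.Entry<String, Float>> neighbors =
        List.of(Map.entry("#13:4", 0.12f), Map.entry("#19:10", 0.19f));
    for (Map<String, Object> row : toRows(neighbors)) {
      System.out.println(row.get("vertex") + " | " + row.get("proximity"));
    }
  }
}
```

In the real function each row would presumably be a `Result` and the "vertex" property would hold the `Identifiable` itself rather than its string form.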
So you can also traverse the graph starting from embeddings:
select expand( vertex ) from (
select findNeighborsFromVector( "Word[name,vector]", [1,2,3,4,5,6], 10 )
) where proximity < 0.5
This returns the neighbors of the vector (at most 10) with proximity less than 0.5.
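For comparison, here is a rough Java equivalent of what the outer query does with the rows produced by the inner function. The names `ProximityFilter` and `verticesCloserThan` are hypothetical, rows are modelled as plain maps, and the third sample row (#21:7) is made up for illustration. Unlike `expand( vertex )`, which would load the vertex records, the sketch only returns the RIDs.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of ArcadeDB.
final class ProximityFilter {

  // Keeps the rows whose "proximity" is below the threshold and returns their
  // "vertex" values, in the same order as the input rows.
  static List<Object> verticesCloserThan(List<Map<String, Object>> rows, double threshold) {
    return rows.stream()
        .filter(row -> ((Number) row.get("proximity")).doubleValue() < threshold)
        .map(row -> row.get("vertex"))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Map<String, Object>> rows = List.of(
        Map.of("vertex", "#13:4", "proximity", 0.12),
        Map.of("vertex", "#19:10", "proximity", 0.19),
        Map.of("vertex", "#21:7", "proximity", 0.73)); // made-up row above the threshold
    System.out.println(verticesCloserThan(rows, 0.5));
  }
}
```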
@gramian pointed out that we already have the SQL vectorNeighbors() function: https://docs.arcadedb.com/#_vectorneighbors. However, it only uses the index when you have an ID, not when you have a vector of embeddings.