Creating a vector index after nodes already exist doesn't make vectorNeighbors return anything

Open ExtReMLapin opened this issue 1 week ago • 6 comments

Hello,

ArcadeDB version: yesterday's release (HEAD, actually)

Following this issue (the data was created as described there), https://github.com/ArcadeData/arcadedb/issues/2908, I ended up creating a vector index AFTER the nodes were created:

Image

As you can see, there is some data; I created the index after the nodes were created.

Running SELECT vectorNeighbors('EmbeddingNode2[vector]', [0.0, 0.0, 0.0, 0.0], 10)

returns nothing

ArcadeDB - The Next Generation Multi-Model DBMS

vectorNeighbors('EmbeddingNode2[vector]', [0.0, 0.0, 0.0, 0.0], 10)
[]


ExtReMLapin avatar Dec 11 '25 09:12 ExtReMLapin

@ExtReMLapin please get the latest main branch: I've just pushed some fixes to the LSM vector index, plus an optimization (60% less space on disk).

I've also just added a test case similar to yours and it passes:

  @Test
  void testVectorNeighborsViaSQL() {
    database.transaction(() -> {
      // Create vertex type with vector property
      database.command("sql", "CREATE VERTEX TYPE Product IF NOT EXISTS");
      database.command("sql", "CREATE PROPERTY Product.name IF NOT EXISTS STRING");
      database.command("sql", "CREATE PROPERTY Product.embedding IF NOT EXISTS ARRAY_OF_FLOATS");

      // Create LSM vector index on embedding property
      database.command("sql", """
          CREATE INDEX IF NOT EXISTS ON Product (embedding) LSM_VECTOR
          METADATA {
            "dimensions": 128,
            "similarity": "COSINE",
            "maxConnections": 16,
            "beamWidth": 100
          }""");
    });

    // Insert test data with 128-dimensional vectors
    final int numDocs = 50;
    final List<String> productNames = new ArrayList<>();

    database.transaction(() -> {
      for (int i = 0; i < numDocs; i++) {
        final float[] embedding = new float[128];
        // Create vectors with patterns:
        // - First 10 vectors cluster around [1.0, 0.0, 0.0, ...]
        // - Next 10 vectors cluster around [0.0, 1.0, 0.0, ...]
        // - Next 10 vectors cluster around [0.0, 0.0, 1.0, ...]
        // - Remaining vectors are more random
        if (i < 10) {
          embedding[0] = 1.0f + (i * 0.1f);
          embedding[1] = 0.1f * i;
        } else if (i < 20) {
          embedding[0] = 0.1f * (i - 10);
          embedding[1] = 1.0f + ((i - 10) * 0.1f);
        } else if (i < 30) {
          embedding[0] = 0.1f * (i - 20);
          embedding[1] = 0.1f * (i - 20);
          embedding[2] = 1.0f + ((i - 20) * 0.1f);
        } else {
          // Random-ish vectors for the rest
          for (int j = 0; j < 128; j++) {
            embedding[j] = (float) Math.sin(i * j * 0.01);
          }
        }

        final String name = "Product_" + i;
        productNames.add(name);

        database.command("sql",
            "INSERT INTO Product SET name = ?, embedding = ?",
            name, embedding);
      }
    });

    System.out.println("Inserted " + numDocs + " products with 128-dimensional vectors");

    // Test 1: Find neighbors of first product (should find products 1-9 as nearest neighbors)
    database.transaction(() -> {
      final var result = database.query("sql",
          "SELECT name, vectorNeighbors('Product[embedding]', embedding, 5) as neighbors FROM Product WHERE name = 'Product_0'");

      assertThat(result.hasNext()).as("Query should return results").isTrue();
      final var doc = result.next();
      final String name = doc.getProperty("name");
      assertThat(name).as("Should get Product_0").isEqualTo("Product_0");

      // The neighbors should include other products from cluster 0-9
      System.out.println("Neighbors of Product_0: " + doc.toJSON());
    });

    // Test 2: Query using vectorNeighbors with arbitrary query vector
    database.transaction(() -> {
      // Create a query vector similar to cluster 1 (second cluster)
      final float[] queryVector = new float[128];
      queryVector[1] = 1.0f; // Similar to products 10-19

      // Use vectorNeighbors to find nearest neighbors
      final var result = database.query("sql",
          "SELECT name, vectorNeighbors('Product[embedding]', ?, 5) as neighbors FROM Product LIMIT 1",
          queryVector);

      assertThat(result.hasNext()).as("Query should return results").isTrue();
      System.out.println("VectorNeighbors result for cluster 1 query: " + result.next().toJSON());
    });

    // Test 3: Test vectorNeighbors function with different k value
    database.transaction(() -> {
      final float[] queryVector = new float[128];
      queryVector[2] = 1.0f; // Similar to products 20-29

      final var result = database.query("sql",
          "SELECT name, vectorNeighbors('Product[embedding]', ?, 10) as neighbors FROM Product LIMIT 1",
          queryVector);

      assertThat(result.hasNext()).as("Query should return results").isTrue();

      final var doc = result.next();
      System.out.println("VectorNeighbors result for cluster 2 query (k=10): " + doc.toJSON());
    });

    // Test 4: Query with specific product and find its nearest neighbors
    database.transaction(() -> {
      final var result = database.query("sql",
          "SELECT name, vectorNeighbors('Product[embedding]', embedding, 3) as neighbors " +
              "FROM Product WHERE name = 'Product_15'");

      assertThat(result.hasNext()).as("Query should return results").isTrue();
      final var doc = result.next();
      System.out.println("Neighbors of Product_15: " + doc.toJSON());

      // Product_15 should be similar to other products in the 10-19 range
      final String productName = doc.getProperty("name");
      assertThat(productName).isEqualTo("Product_15");
    });

    // Test 5: Verify multiple queries work correctly
    database.transaction(() -> {
      final float[] queryVector1 = new float[128];
      queryVector1[0] = 1.0f;

      final var result1 = database.query("sql",
          "SELECT name, vectorNeighbors('Product[embedding]', ?, 3) as neighbors FROM Product LIMIT 1",
          queryVector1);

      assertThat(result1.hasNext()).as("First query should return results").isTrue();
      System.out.println("Query 1 result: " + result1.next().toJSON());

      final float[] queryVector2 = new float[128];
      queryVector2[1] = 1.0f;

      final var result2 = database.query("sql",
          "SELECT name, vectorNeighbors('Product[embedding]', ?, 3) as neighbors FROM Product LIMIT 1",
          queryVector2);

      assertThat(result2.hasNext()).as("Second query should return results").isTrue();
      System.out.println("Query 2 result: " + result2.next().toJSON());
    });

    System.out.println("✓ All SQL vectorNeighbors tests passed!");
  }
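
Note that the test above only asserts that each query returns a row; the neighbors themselves are printed, not checked. If you want to compare the printed output against an exact answer, a brute-force reference can be sketched as follows. This is a minimal sketch, not part of ArcadeDB: the class and method names are hypothetical, it assumes cosine similarity (as in the index metadata above), and it excludes the query vector itself, which vectorNeighbors may or may not do. The index is approximate, so small differences from the exact ranking are possible.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.IntStream;

// Hypothetical helper, not part of ArcadeDB: exact (brute-force) cosine
// neighbors over the same synthetic vectors the test inserts.
final class BruteForceNeighbors {

  // Rebuilds the embeddings exactly as the insert loop in the test does.
  static float[][] buildEmbeddings(final int numDocs, final int dims) {
    final float[][] all = new float[numDocs][dims];
    for (int i = 0; i < numDocs; i++) {
      final float[] e = all[i];
      if (i < 10) {
        e[0] = 1.0f + (i * 0.1f);
        e[1] = 0.1f * i;
      } else if (i < 20) {
        e[0] = 0.1f * (i - 10);
        e[1] = 1.0f + ((i - 10) * 0.1f);
      } else if (i < 30) {
        e[0] = 0.1f * (i - 20);
        e[1] = 0.1f * (i - 20);
        e[2] = 1.0f + ((i - 20) * 0.1f);
      } else {
        for (int j = 0; j < dims; j++)
          e[j] = (float) Math.sin(i * j * 0.01);
      }
    }
    return all;
  }

  // Plain cosine similarity, accumulated in double precision.
  static double cosine(final float[] a, final float[] b) {
    double dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.length; i++) {
      dot += (double) a[i] * b[i];
      na += (double) a[i] * a[i];
      nb += (double) b[i] * b[i];
    }
    // Cosine is undefined for an all-zero vector; this sketch treats it as 0.
    if (na == 0 || nb == 0)
      return 0.0;
    return dot / Math.sqrt(na * nb);
  }

  // Indexes of the k vectors most similar to all[queryIdx], excluding the query itself.
  static int[] topK(final float[][] all, final int queryIdx, final int k) {
    return IntStream.range(0, all.length)
        .filter(i -> i != queryIdx)
        .boxed()
        .sorted(Comparator.comparingDouble((Integer i) -> -cosine(all[queryIdx], all[i])))
        .limit(k)
        .mapToInt(Integer::intValue)
        .toArray();
  }

  public static void main(final String[] args) {
    final float[][] all = buildEmbeddings(50, 128);
    System.out.println("Exact neighbors of Product_0: " + Arrays.toString(topK(all, 0, 5)));
    System.out.println("Exact neighbors of Product_15: " + Arrays.toString(topK(all, 15, 3)));
  }
}
```

For Product_0 with k = 5, the exact neighbors are Product_1 through Product_5, which matches the expectation in the Test 1 comment that the result comes from the first cluster.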

lvca avatar Dec 11 '25 16:12 lvca

I left the office 10 mins ago, will test tomorrow !

Maybe it's related to my build ?

Every time I update my ArcadeDB, I just wipe ./lib/ and replace it with the new one.

ExtReMLapin avatar Dec 11 '25 16:12 ExtReMLapin

Well shit @lvca, I just tested on Windows: fresh build, literally HEAD, freshly built, right from the build folder, no /lib/ copy-paste.

Still doesn't work. Maybe it's because the nodes are created using Cypher and/or before the index is created?

ExtReMLapin avatar Dec 11 '25 17:12 ExtReMLapin

Image

ExtReMLapin avatar Dec 11 '25 17:12 ExtReMLapin

Is there a way you can upload this database as a zip? Or a test case to reproduce the same content?

lvca avatar Dec 11 '25 17:12 lvca

Sure !

databases.zip

ExtReMLapin avatar Dec 11 '25 18:12 ExtReMLapin