[BUG] Approximate k-NN doesn't always return all results when there are multiple segments

Open SkyTrix opened this issue 1 year ago • 17 comments

What is the bug? When indexing a number of documents and executing an approximate k-NN search on them, not all matches are always returned. Oftentimes the response switches between two sets of results, one of them often having the correct number of results (perhaps depending on whether the request hits the primary or the replica?). The issue disappears after force merging to one segment.

How can one reproduce the bug? Have an index with a knn_vector field and index some documents, leaving about a minute between each index operation. I usually reproduced it after indexing about 8-16 documents; this time I already saw it go wrong with 4 documents.

"ec_knn512@fingerPrint" : {
   "type" : "knn_vector",
    "dimension" : 512
 }

"settings" : {
  "index" : {
    "number_of_shards" : "1",
    "knn" : "true",
    "number_of_replicas" : "1"
  }
}
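
For reference, these two snippets combine into a single index-creation request along these lines (the index name here is just a placeholder, not the real one):

PUT /my-knn-index
{
  "settings" : {
    "index" : {
      "number_of_shards" : "1",
      "knn" : "true",
      "number_of_replicas" : "1"
    }
  },
  "mappings" : {
    "properties" : {
      "ec_knn512@fingerPrint" : {
        "type" : "knn_vector",
        "dimension" : 512
      }
    }
  }
}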

Given the large knn vector I need to provide the data in gists as the issue body gets too large. The 4 indexed documents and search query: https://gist.github.com/SkyTrix/392e127a46f255392bd57f96bfb72bfe

First (wrong) result option
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 0.046935834,
    "hits": [
      {
        "_index": "ecc_nodes_a-t-1100-5920-8100-0807_43017401d87b7860b52c",
        "_id": "4mLnEciUPVzqk7-3aUUUKjK12",
        "_score": 0.046935834,
        "_source": {
          "name": "Fourth FIle.cf2"
        }
      },
      {
        "_index": "ecc_nodes_a-t-1100-5920-8100-0807_43017401d87b7860b52c",
        "_id": "4mLnEciUPVzqk7-3aUUPXBQyx",
        "_score": 0.041627396,
        "_source": {
          "name": "First File.MFG"
        }
      }
    ]
  }
}
Second (wrong) result option
{
  "took": 12,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 0.05267875,
    "hits": [
      {
        "_index": "ecc_nodes_a-t-1100-5920-8100-0807_43017401d87b7860b52c",
        "_id": "4mLnEciUPVzqk7-3aUU8Qb4nA",
        "_score": 0.05267875,
        "_source": {
          "name": "Eighth File.ds2"
        }
      },
      {
        "_index": "ecc_nodes_a-t-1100-5920-8100-0807_43017401d87b7860b52c",
        "_id": "4mLnEciUPVzqk7-3aUUUKjK12",
        "_score": 0.046935834,
        "_source": {
          "name": "Fourth FIle.cf2"
        }
      },
      {
        "_index": "ecc_nodes_a-t-1100-5920-8100-0807_43017401d87b7860b52c",
        "_id": "4mLnEciUPVzqk7-3aUUPXBQyx",
        "_score": 0.041627396,
        "_source": {
          "name": "First File.MFG"
        }
      }
    ]
  }
}
_segments output for the index
{
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "indices": {
    "ecc_nodes_a-t-1100-5920-8100-0807_43017401d87b7860b52c": {
      "shards": {
        "0": [
          {
            "routing": {
              "state": "STARTED",
              "primary": true,
              "node": "t1sWlB6tR7OVKu18Z0-D7Q"
            },
            "num_committed_segments": 1,
            "num_search_segments": 9,
            "segments": {
              "_9d": {
                "generation": 337,
                "num_docs": 64,
                "deleted_docs": 0,
                "size_in_bytes": 322787,
                "memory_in_bytes": 0,
                "committed": true,
                "search": false,
                "version": "9.5.0",
                "compound": false,
                "attributes": {
                  "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                }
              },
              "_9o": {
                "generation": 348,
                "num_docs": 54,
                "deleted_docs": 11,
                "size_in_bytes": 346570,
                "memory_in_bytes": 0,
                "committed": false,
                "search": true,
                "version": "9.5.0",
                "compound": false,
                "attributes": {
                  "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                }
              },
              "_as": {
                "generation": 388,
                "num_docs": 3,
                "deleted_docs": 67,
                "size_in_bytes": 345019,
                "memory_in_bytes": 0,
                "committed": false,
                "search": true,
                "version": "9.5.0",
                "compound": false,
                "attributes": {
                  "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                }
              },
              "_at": {
                "generation": 389,
                "num_docs": 0,
                "deleted_docs": 1,
                "size_in_bytes": 34046,
                "memory_in_bytes": 0,
                "committed": false,
                "search": true,
                "version": "9.5.0",
                "compound": true,
                "attributes": {
                  "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                }
              },
              "_au": {
                "generation": 390,
                "num_docs": 1,
                "deleted_docs": 0,
                "size_in_bytes": 55172,
                "memory_in_bytes": 0,
                "committed": false,
                "search": true,
                "version": "9.5.0",
                "compound": true,
                "attributes": {
                  "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                }
              },
              "_av": {
                "generation": 391,
                "num_docs": 1,
                "deleted_docs": 1,
                "size_in_bytes": 32585,
                "memory_in_bytes": 0,
                "committed": false,
                "search": true,
                "version": "9.5.0",
                "compound": true,
                "attributes": {
                  "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                }
              },
              "_aw": {
                "generation": 392,
                "num_docs": 0,
                "deleted_docs": 1,
                "size_in_bytes": 31846,
                "memory_in_bytes": 0,
                "committed": false,
                "search": true,
                "version": "9.5.0",
                "compound": true,
                "attributes": {
                  "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                }
              },
              "_ax": {
                "generation": 393,
                "num_docs": 0,
                "deleted_docs": 1,
                "size_in_bytes": 32004,
                "memory_in_bytes": 0,
                "committed": false,
                "search": true,
                "version": "9.5.0",
                "compound": true,
                "attributes": {
                  "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                }
              },
              "_ay": {
                "generation": 394,
                "num_docs": 0,
                "deleted_docs": 1,
                "size_in_bytes": 57060,
                "memory_in_bytes": 0,
                "committed": false,
                "search": true,
                "version": "9.5.0",
                "compound": true,
                "attributes": {
                  "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                }
              },
              "_az": {
                "generation": 395,
                "num_docs": 1,
                "deleted_docs": 0,
                "size_in_bytes": 49338,
                "memory_in_bytes": 0,
                "committed": false,
                "search": true,
                "version": "9.5.0",
                "compound": true,
                "attributes": {
                  "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                }
              }
            }
          },
          {
            "routing": {
              "state": "STARTED",
              "primary": false,
              "node": "PHzv1yuSRnuzFUiSKkt_cg"
            },
            "num_committed_segments": 1,
            "num_search_segments": 10,
            "segments": {
              "_9b": {
                "generation": 335,
                "num_docs": 64,
                "deleted_docs": 0,
                "size_in_bytes": 323673,
                "memory_in_bytes": 0,
                "committed": true,
                "search": false,
                "version": "9.5.0",
                "compound": false,
                "attributes": {
                  "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                }
              },
              "_9m": {
                "generation": 346,
                "num_docs": 54,
                "deleted_docs": 20,
                "size_in_bytes": 408172,
                "memory_in_bytes": 0,
                "committed": false,
                "search": true,
                "version": "9.5.0",
                "compound": false,
                "attributes": {
                  "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                }
              },
              "_ag": {
                "generation": 376,
                "num_docs": 3,
                "deleted_docs": 57,
                "size_in_bytes": 278440,
                "memory_in_bytes": 0,
                "committed": false,
                "search": true,
                "version": "9.5.0",
                "compound": false,
                "attributes": {
                  "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                }
              },
              "_ah": {
                "generation": 377,
                "num_docs": 0,
                "deleted_docs": 1,
                "size_in_bytes": 33885,
                "memory_in_bytes": 0,
                "committed": false,
                "search": true,
                "version": "9.5.0",
                "compound": true,
                "attributes": {
                  "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                }
              },
              "_ai": {
                "generation": 378,
                "num_docs": 0,
                "deleted_docs": 1,
                "size_in_bytes": 34042,
                "memory_in_bytes": 0,
                "committed": false,
                "search": true,
                "version": "9.5.0",
                "compound": true,
                "attributes": {
                  "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                }
              },
              "_aj": {
                "generation": 379,
                "num_docs": 1,
                "deleted_docs": 0,
                "size_in_bytes": 55173,
                "memory_in_bytes": 0,
                "committed": false,
                "search": true,
                "version": "9.5.0",
                "compound": true,
                "attributes": {
                  "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                }
              },
              "_ak": {
                "generation": 380,
                "num_docs": 1,
                "deleted_docs": 1,
                "size_in_bytes": 32588,
                "memory_in_bytes": 0,
                "committed": false,
                "search": true,
                "version": "9.5.0",
                "compound": true,
                "attributes": {
                  "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                }
              },
              "_al": {
                "generation": 381,
                "num_docs": 0,
                "deleted_docs": 1,
                "size_in_bytes": 31849,
                "memory_in_bytes": 0,
                "committed": false,
                "search": true,
                "version": "9.5.0",
                "compound": true,
                "attributes": {
                  "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                }
              },
              "_am": {
                "generation": 382,
                "num_docs": 0,
                "deleted_docs": 1,
                "size_in_bytes": 32006,
                "memory_in_bytes": 0,
                "committed": false,
                "search": true,
                "version": "9.5.0",
                "compound": true,
                "attributes": {
                  "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                }
              },
              "_an": {
                "generation": 383,
                "num_docs": 0,
                "deleted_docs": 1,
                "size_in_bytes": 57065,
                "memory_in_bytes": 0,
                "committed": false,
                "search": true,
                "version": "9.5.0",
                "compound": true,
                "attributes": {
                  "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                }
              },
              "_ao": {
                "generation": 384,
                "num_docs": 1,
                "deleted_docs": 0,
                "size_in_bytes": 49340,
                "memory_in_bytes": 0,
                "committed": false,
                "search": true,
                "version": "9.5.0",
                "compound": true,
                "attributes": {
                  "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                }
              }
            }
          }
        ]
      }
    }
  }
}

What is the expected behavior? I expect to always get the correct number of hits; in the example above, it should always return the 4 documents.

What is your host/environment?

  • Amazon OpenSearch Service
  • Reproduced on ES 7.9 & 7.10 and OpenSearch 1.3 and 2.7. For corresponding k-NN plugin versions see https://docs.aws.amazon.com/opensearch-service/latest/developerguide/knn.html

Do you have any additional context? What is also remarkable when it goes wrong is that hits.total mostly returns the wrong number of total hits. When increasing k and size in the query, hits.total goes up. I'm not sure whether hits.total (as opposed to the number of returned hits) is always expected to be the full total in the index for k-NN? After force merging to 1 segment, hits.total is always correct, no matter the k and size values. Likewise, after force merging the number of actual hits is also always correct, but it should also be correct when there are more/many segments.
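
For context, the search query in the gist follows the standard approximate k-NN shape, roughly like this (vector truncated; k and size are the values being varied):

GET /<index>/_search
{
  "size": 4,
  "query": {
    "knn": {
      "ec_knn512@fingerPrint": {
        "vector": [0.12, -0.53, ...],
        "k": 4
      }
    }
  }
}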

SkyTrix avatar Jul 25 '23 09:07 SkyTrix

Thanks for reporting this @SkyTrix . I will take a look.

jmazanec15 avatar Aug 01 '23 21:08 jmazanec15

Just curious, @SkyTrix were there any deleted documents when you were doing the query?

navneet1v avatar Aug 09 '23 00:08 navneet1v

@jmazanec15 do we have any RCA on why the problem was happening?

navneet1v avatar Aug 14 '23 23:08 navneet1v

@SkyTrix I think this has something to do with having few documents in segments. What was the k/size you tried for 16 docs? I haven't been able to repro the issue quite yet.

jmazanec15 avatar Aug 16 '23 22:08 jmazanec15

Just curious, @SkyTrix were there any deleted documents when you were doing the query?

I can't say offhand, but I'm pretty sure there were. We use OpenSearch as a read replica of a db, so every change to a certain record there results in a new document with the same _id in the index, and thus also a deleted document.

SkyTrix avatar Aug 21 '23 11:08 SkyTrix

@SkyTrix I think this has something to do with having few documents in segments. What was the k/size you tried for 16 docs? I haven't been able to repro the issue quite yet.

@jmazanec15 k and size were always matching. I believe when setting them to 16 it also didn't return all 16 results, but when setting them to 30, for example, it would find all 16 docs. I'm returning to work early next week, so I can try to reproduce again then. Is there anything in particular you want me to pay attention to/provide here? I'm also happy to reproduce live over a Teams call if you prefer that.

SkyTrix avatar Aug 21 '23 11:08 SkyTrix

Thanks @SkyTrix.

Could you provide a script to reproduce the issue? I tried with the following but wasn't able to reproduce it locally:

Repro python attempt
import json
from opensearchpy import OpenSearch, RequestsHttpConnection
import time
import random

index_name = "test-index"
field_name = "test-field"
dim = 512
num_docs = 16
k = num_docs
size = num_docs


def action(doc_id):
    return {'index': {'_index': index_name, '_id': doc_id}}

def bulk_transform(vec, field_name: str, action, doc_id: int):
    # Pair the bulk action metadata with the document body for one vector
    return [action(doc_id), {field_name: vec}]


# Named "client" rather than "os" to avoid confusion with the stdlib os module
client = OpenSearch(
    hosts=[{'host': "localhost", 'port': 9200}],
    use_ssl=False,
    verify_certs=False,
    connection_class=RequestsHttpConnection,
    pool_maxsize=20,
    timeout=100
)

ind_body = {
    "settings" : {
        "index" : {
            "number_of_shards" : "1",
            "knn" : "true",
            "number_of_replicas" : "1"
        }
    },
    "mappings": {
        "properties": {
            field_name : {
                "type" : "knn_vector",
                "dimension" : dim
            }
        }
    }
}

#random.seed(10)

# client.indices.delete(index=index_name, ignore=[400, 404])
# client.indices.create(index=index_name, body=ind_body)


vecs = [[random.random() for _ in range(dim)] for _ in range(num_docs)]

# Index one document at a time, sleeping ~60s between operations so the docs
# end up in separate segments (mirrors the manual reproduction steps above)
# for i, vec in enumerate(vecs):
#     body = bulk_transform(vec, field_name, action, i)
#     client.bulk(index=index_name, body=body)
#     print("Sleeping")
#     if i < len(vecs) - 1:
#         time.sleep(60)
#     else:
#         time.sleep(1)

q_body = {
    'size': size,
    'query': {
        'knn': {
            field_name: {
                'vector': [random.random() for _ in range(dim)],
                'k': k
            }
        }
    }
}

query_response = client.search(index=index_name, body=q_body, _source_excludes=[field_name])
print(json.dumps(query_response, indent=4))

Additionally, could you provide the segment count when you are able to reproduce it via /_cat/segments?

jmazanec15 avatar Aug 22 '23 20:08 jmazanec15

I managed to reproduce again with only 4 documents. The k-NN search with k and size set to 4 only returns 3 documents. When setting them to 5, it returns all 4. If it matters, the cluster uses dedicated master nodes and has 3 data nodes.

This is the output of _cat/segments for the specific index:

ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _hy 646  0  71 282.7kb     0 true false 8.6.2 false
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _ii 666 52 108   314kb 29764 true true  8.6.2 false
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _ij 667  0   1  30.9kb     0 true false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _ik 668  1   0  42.6kb 10388 true true  8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _il 669  1   1  29.5kb  8684 true true  8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _im 670  0   1  29.2kb     0 true false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _in 671  0   1  29.4kb     0 true false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _io 672  1   0    37kb  8756 true true  8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x  _i6 654 49  58 285.3kb 30204 true true  8.6.2 false
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x  _iq 674  3 122 316.8kb 16340 true true  8.6.2 false
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x  _ir 675  1   0  42.6kb 10388 true true  8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x  _is 676  1   1  29.5kb  8684 true true  8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x  _it 677  0   1  29.2kb     0 true false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x  _iu 678  0   1  29.4kb     0 true false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x  _iv 679  1   0    37kb  8756 true true  8.6.2 true

And here is a second time, again with 4 documents, where it only returns 2 results with k & size = 4 or 5, 3 results with k & size = 6 or 7, and eventually all 4 results with k & size = 8:

ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _hy 646  0  71 282.7kb     0 true  false 8.6.2 false
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _ii 666 52 108   314kb     0 true  false 8.6.2 false
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _ij 667  0   1  30.9kb     0 true  false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _ik 668  1   0  42.6kb     0 true  false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _il 669  1   1  29.5kb     0 true  false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _im 670  0   1  29.2kb     0 true  false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _in 671  0   1  29.4kb     0 true  false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _io 672  1   0    37kb     0 true  false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _iq 674 49   6 203.6kb 26620 false true  8.6.2 false
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _jv 715  6  57 172.5kb 12444 false true  8.6.2 false
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x  _i6 654 49  58 285.3kb     0 true  false 8.6.2 false
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x  _iq 674  3 122 316.8kb     0 true  false 8.6.2 false
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x  _ir 675  1   0  42.6kb     0 true  false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x  _is 676  1   1  29.5kb     0 true  false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x  _it 677  0   1  29.2kb     0 true  false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x  _iu 678  0   1  29.4kb     0 true  false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x  _iv 679  1   0    37kb     0 true  false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x  _ix 681 49   6 206.5kb 26772 false true  8.6.2 false
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x  _k2 722  6  57 172.5kb 12444 false true  8.6.2 false

I'll have a look at trying to reproduce with a script.

SkyTrix avatar Sep 04 '23 09:09 SkyTrix

@jmazanec15 I had a lot of trouble trying to reproduce it with a simple script. It did, however, reproduce easily through some of our application code. What I ended up doing is capturing the calls to _bulk and writing them to a file together with a timestamp. I then created a script to replay those calls at the same rate as the original calls.

The provided script is Scala instead of Python, but it should be very straightforward to port if needed. It simply reads the reproducer.txt file, finds the bulk payloads between BATCH_START_TIME <timestamp> lines, and executes them with the original timing in between.
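
For reference, a rough Python sketch of the same replay logic might look like the following; the file format details and the local endpoint are assumptions based on the description above, not taken from the actual Scala script.

import time
import requests

ENDPOINT = "http://localhost:9200/_bulk"  # assumed endpoint

def replay(path="reproducer.txt"):
    # Collect (start_time_millis, bulk_payload) pairs separated by
    # "BATCH_START_TIME <epoch_millis>" marker lines
    batches = []
    current_time, current_lines = None, []
    with open(path) as f:
        for line in f:
            if line.startswith("BATCH_START_TIME"):
                if current_time is not None:
                    batches.append((current_time, "".join(current_lines)))
                current_time = int(line.split()[1])
                current_lines = []
            else:
                current_lines.append(line)
    if current_time is not None:
        batches.append((current_time, "".join(current_lines)))

    for i, (start, payload) in enumerate(batches):
        if i > 0:
            # Sleep for the same gap as between the original batches
            time.sleep((start - batches[i - 1][0]) / 1000.0)
        requests.post(ENDPOINT, data=payload,
                      headers={"Content-Type": "application/x-ndjson"})

if __name__ == "__main__":
    replay()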

I've run the reproducer script multiple times on both ES 7.9 and OpenSearch 2.11 and every time the problem was reproduced.

Steps to reproduce: https://gist.github.com/SkyTrix/4abe38b9c7d5dee51e44d951c6bded03

  1. Apply the mapping template from mapping_template.txt
  2. Configure ES endpoint in IndexingReplay.scala and run it (it runs for 5 minutes)
  3. Run the query from query.txt (multiple times)
  4. Notice that fewer than 14 results are returned, sometimes jumping between two different values
  5. Update k and size to 30 and rerun query, see that now 14 results are returned
  6. Force merge the index and rerun the original query; all 14 results are returned (see the example after this list)
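
For step 6, the force merge can be issued like this (index name is a placeholder):

POST /<index>/_forcemerge?max_num_segments=1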

I hope this will allow you to reproduce the problem on your system.

SkyTrix avatar Dec 04 '23 15:12 SkyTrix

@jmazanec15 @navneet1v Would you happen to have a chance to try out the reproducer? Thanks!

SkyTrix avatar Feb 06 '24 14:02 SkyTrix

@SkyTrix sure, will take look this week and try

jmazanec15 avatar Feb 06 '24 23:02 jmazanec15

@SkyTrix From

ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _hy 646  0  71 282.7kb     0 true false 8.6.2 false
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _ii 666 52 108   314kb 29764 true true  8.6.2 false
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _ij 667  0   1  30.9kb     0 true false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _ik 668  1   0  42.6kb 10388 true true  8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _il 669  1   1  29.5kb  8684 true true  8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _im 670  0   1  29.2kb     0 true false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _in 671  0   1  29.4kb     0 true false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _io 672  1   0    37kb  8756 true true  8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x  _i6 654 49  58 285.3kb 30204 true true  8.6.2 false
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x  _iq 674  3 122 316.8kb 16340 true true  8.6.2 false
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x  _ir 675  1   0  42.6kb 10388 true true  8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x  _is 676  1   1  29.5kb  8684 true true  8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x  _it 677  0   1  29.2kb     0 true false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x  _iu 678  0   1  29.4kb     0 true false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x  _iv 679  1   0    37kb  8756 true true  8.6.2 true

I see a very high number of deleted documents. I believe the headers correspond to: index shard prirep ip segment generation docs.count docs.deleted size size.memory committed searchable version compound.

For instance, in the replica's segment _iq of ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 I see that there are 3 live docs and 122 deleted docs. The way native index search works is that those 122 documents are still in the graph, and on search they may get returned from the graph and then filtered out because they are deleted. So, if one of the expected results is in this segment, it may not get returned because the k results coming out of the graph may all correspond to deleted documents. Increasing k may lead to one of those 3 getting returned.

Further, in the example above, there are 110 live docs to 366 deleted docs. So roughly 77% of the vectors in the graphs correspond to deleted documents.
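
As a toy illustration of that effect (this is not the plugin's actual code, just a model of "the graph returns k candidates, deleted docs get filtered out afterwards"):

import random

random.seed(0)

# Toy model: a segment with 3 live docs and 122 deleted docs, as in _iq above
live = set(range(3))
deleted = set(range(3, 3 + 122))
all_ids = list(live | deleted)

k = 4
# Pretend the HNSW graph returns the k "nearest" ids without knowing about deletes
graph_results = random.sample(all_ids, k)
# Deleted docs are filtered out afterwards, so fewer than k live hits may remain
returned = [doc_id for doc_id in graph_results if doc_id in live]
print(f"k={k}, live hits returned from this segment: {len(returned)}")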

For this reason, I recommend ensuring that the ratio of deleted docs in the index is no more than about 10%. The force merge API has an only_expunge_deletes option you can run to purge deleted documents from the segments.
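
For example (index name again a placeholder), the following merges away deleted docs without necessarily collapsing everything into one segment:

POST /<index>/_forcemerge?only_expunge_deletes=true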

Lastly, can you run _cat/indices?v and paste the result to see how many deleted documents you have in your index?

jmazanec15 avatar Feb 07 '24 20:02 jmazanec15

@jmazanec15 Thanks for looking into this.

We use OpenSearch as a read replica of a database, so every change to a database record results in a new version of the document and thus an extra deleted document. So in our case, having a lot of deleted documents is "normal". Note that these new versions always have the same _id as the previous version of the document, so I'm a bit surprised the old versions are still "in play". Is there any way to have only the latest version per _id considered in the search?

I ran the reproducer again on an empty index. It eventually produces 16 documents of which 14 should be found by k-NN.

GET _cat/indices/ecc_nodes_a-dev-1006-2000-0009-4611_43036465760b9a8a8fe502-2?v

health status index                                                        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   ecc_nodes_a-dev-1006-2000-0009-4611_43036465760b9a8a8fe502-2 acTvS6-4QhiLKWSscL2DgA   1   1         16           87    613.5kb        613.5kb
GET _cat/segments/ecc_nodes_a-dev-1006-2000-0009-4611_43036465760b9a8a8fe502-2?v

index                                                        shard prirep ip            segment generation docs.count docs.deleted    size size.memory committed searchable version compound
ecc_nodes_a-dev-1006-2000-0009-4611_43036465760b9a8a8fe502-2 0     p      x.x.x.x _a              10         15           76 555.2kb           0 false     true       9.7.0   true
ecc_nodes_a-dev-1006-2000-0009-4611_43036465760b9a8a8fe502-2 0     p      x.x.x.x _b              11          1           11    58kb           0 false     true       9.7.0   true

The query returned only 10 documents with k and size = 14.

Checking again 30 minutes later, the segments had been automatically merged, the deleted docs were gone, and all 14 results were returned.

GET _cat/indices/ecc_nodes_a-dev-1006-2000-0009-4611_43036465760b9a8a8fe502-2?v

health status index                                                        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   ecc_nodes_a-dev-1006-2000-0009-4611_43036465760b9a8a8fe502-2 acTvS6-4QhiLKWSscL2DgA   1   1         16            0    345.3kb        345.3kb
GET _cat/segments/ecc_nodes_a-dev-1006-2000-0009-4611_43036465760b9a8a8fe502-2?v

index                                                        shard prirep ip            segment generation docs.count docs.deleted  size size.memory committed searchable version compound
ecc_nodes_a-dev-1006-2000-0009-4611_43036465760b9a8a8fe502-2 0     p      x.x.x.x _c              12         16            0 345kb           0 true      true       9.7.0   false

I thought that the more segments there are, the higher the chance of running into the issue, but here's a run that does always return the 14 documents:

index                                                        shard prirep ip            segment generation docs.count docs.deleted    size size.memory committed searchable version compound
ecc_nodes_a-dev-1006-2000-0009-4611_43036465760b9a8a8fe502-2 0     p      x.x.x.x _i              18          7           39 274.7kb           0 false     true       9.7.0   true
ecc_nodes_a-dev-1006-2000-0009-4611_43036465760b9a8a8fe502-2 0     p      x.x.x.x _v              31          4           29 181.6kb           0 false     true       9.7.0   false
ecc_nodes_a-dev-1006-2000-0009-4611_43036465760b9a8a8fe502-2 0     p      x.x.x.x _w              32          0            1  31.3kb           0 false     true       9.7.0   true
ecc_nodes_a-dev-1006-2000-0009-4611_43036465760b9a8a8fe502-2 0     p      x.x.x.x _x              33          1            0  55.4kb           0 false     true       9.7.0   true
ecc_nodes_a-dev-1006-2000-0009-4611_43036465760b9a8a8fe502-2 0     p      x.x.x.x _y              34          0            2  28.1kb           0 false     true       9.7.0   true
ecc_nodes_a-dev-1006-2000-0009-4611_43036465760b9a8a8fe502-2 0     p      x.x.x.x _z              35          0            1  28.9kb           0 false     true       9.7.0   true
ecc_nodes_a-dev-1006-2000-0009-4611_43036465760b9a8a8fe502-2 0     p      x.x.x.x _10             36          0            1  29.6kb           0 false     true       9.7.0   true
ecc_nodes_a-dev-1006-2000-0009-4611_43036465760b9a8a8fe502-2 0     p      x.x.x.x _11             37          4           14   192kb           0 false     true       9.7.0   true

SkyTrix avatar Feb 09 '24 11:02 SkyTrix

So in our case, having a lot of deleted documents is "normal".

Yes, typically with ANN search this will be a problem. For the HNSW graphs in old segments, the docs get marked as deleted, but they are not removed from the graph, the main reason being that it is pretty difficult to efficiently remove elements from the graphs.

One thing you could do is run expunge deletes more aggressively so that segments typically have a strong majority of live docs.

Alternatively, Lucene passes the non-live docs as a filter to the HNSW search so they get skipped (https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java#L236-L238), so switching to the Lucene engine may yield better results for your case as well.
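
Switching engines is a mapping change along these lines (a sketch of the plugin's method syntax; the space_type here is an assumption and would need to match your distance function):

"ec_knn512@fingerPrint" : {
  "type" : "knn_vector",
  "dimension" : 512,
  "method" : {
    "name" : "hnsw",
    "engine" : "lucene",
    "space_type" : "l2"
  }
}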

jmazanec15 avatar Feb 09 '24 20:02 jmazanec15

One thing you could do is run expunge deletes more aggressively so that segments typically have a strong majority of live docs.

The documentation warns that _forcemerge should not be used on indices that are still actively receiving writes, so I'm not sure I'd want to go there? https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-forcemerge.html#forcemerge-api-desc

Thanks for the tip, I'll have to see what the impact is of switching to the Lucene engine. Out of interest: given that both nmslib and Lucene use HNSW graphs, can't nmslib then also pass the non-live docs as a filter?

Edit: Sadly the limit of 1024 dimensions is too small for us, so it doesn't look like we can use Lucene as the engine.

SkyTrix avatar Feb 12 '24 13:02 SkyTrix

We are increasing the max dimension for Lucene in 2.12 (https://github.com/opensearch-project/k-NN/blob/main/release-notes/opensearch-knn.release-notes-2.12.0.0.md), which will be released soon. Alternatively, you could use script scoring if the document count you're searching over is on the order of 100s or 1000s of docs.
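
Script scoring here means exact (brute-force) k-NN via a script_score query; a rough sketch with the field from this issue (vector truncated, space_type is an assumption):

GET /<index>/_search
{
  "size": 4,
  "query": {
    "script_score": {
      "query": { "match_all": {} },
      "script": {
        "source": "knn_score",
        "lang": "knn",
        "params": {
          "field": "ec_knn512@fingerPrint",
          "query_value": [0.12, -0.53, ...],
          "space_type": "l2"
        }
      }
    }
  }
}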

jmazanec15 avatar Feb 12 '24 15:02 jmazanec15

Good to know, thanks. Once that becomes available on AWS we might start looking into upgrading.

Do you see the current behavior as expected or as a bug to be fixed? If it's expected behavior, could this be added to the documentation somewhere?

SkyTrix avatar Feb 12 '24 21:02 SkyTrix