k-NN
[BUG] Approximate k-NN doesn't always return all results when there are multiple segments
What is the bug? When indexing a number of documents and executing an approximate k-NN search on them, not all matches are always returned. Often the response switches between two sets of results, one of them often having the correct number of results (depending on whether the request hits the primary or the replica?). The issue disappears after force merging to one segment.
How can one reproduce the bug?
Have an index with a knn_vector field and index some documents, leaving about a minute between each index operation. I usually reproduced the issue after indexing about 8-16 documents, but this time it already went wrong with 4 documents.
"ec_knn512@fingerPrint" : {
"type" : "knn_vector",
"dimension" : 512
}
"settings" : {
"index" : {
"number_of_shards" : "1",
"knn" : "true",
"number_of_replicas" : "1"
}
}
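For reference, a minimal sketch of creating such an index with opensearch-py, combining the mapping and settings above (the endpoint and index name are illustrative, not from this issue):

from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{'host': 'localhost', 'port': 9200}])  # adjust to your endpoint

# Create an index with the knn setting enabled and a 512-dimensional knn_vector field.
client.indices.create(
    index='knn-repro',  # illustrative index name
    body={
        'settings': {
            'index': {
                'number_of_shards': '1',
                'knn': 'true',
                'number_of_replicas': '1',
            }
        },
        'mappings': {
            'properties': {
                'ec_knn512@fingerPrint': {
                    'type': 'knn_vector',
                    'dimension': 512,
                }
            }
        },
    },
)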
Given the large knn vectors, I need to provide the data in gists as the issue body would otherwise get too large. The 4 indexed documents and the search query: https://gist.github.com/SkyTrix/392e127a46f255392bd57f96bfb72bfe
First (wrong) result option
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 0.046935834,
"hits": [
{
"_index": "ecc_nodes_a-t-1100-5920-8100-0807_43017401d87b7860b52c",
"_id": "4mLnEciUPVzqk7-3aUUUKjK12",
"_score": 0.046935834,
"_source": {
"name": "Fourth FIle.cf2"
}
},
{
"_index": "ecc_nodes_a-t-1100-5920-8100-0807_43017401d87b7860b52c",
"_id": "4mLnEciUPVzqk7-3aUUPXBQyx",
"_score": 0.041627396,
"_source": {
"name": "First File.MFG"
}
}
]
}
}
Second (wrong) result option
{
"took": 12,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": 0.05267875,
"hits": [
{
"_index": "ecc_nodes_a-t-1100-5920-8100-0807_43017401d87b7860b52c",
"_id": "4mLnEciUPVzqk7-3aUU8Qb4nA",
"_score": 0.05267875,
"_source": {
"name": "Eighth File.ds2"
}
},
{
"_index": "ecc_nodes_a-t-1100-5920-8100-0807_43017401d87b7860b52c",
"_id": "4mLnEciUPVzqk7-3aUUUKjK12",
"_score": 0.046935834,
"_source": {
"name": "Fourth FIle.cf2"
}
},
{
"_index": "ecc_nodes_a-t-1100-5920-8100-0807_43017401d87b7860b52c",
"_id": "4mLnEciUPVzqk7-3aUUPXBQyx",
"_score": 0.041627396,
"_source": {
"name": "First File.MFG"
}
}
]
}
}
_segments output for the index
{
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
},
"indices": {
"ecc_nodes_a-t-1100-5920-8100-0807_43017401d87b7860b52c": {
"shards": {
"0": [
{
"routing": {
"state": "STARTED",
"primary": true,
"node": "t1sWlB6tR7OVKu18Z0-D7Q"
},
"num_committed_segments": 1,
"num_search_segments": 9,
"segments": {
"_9d": {
"generation": 337,
"num_docs": 64,
"deleted_docs": 0,
"size_in_bytes": 322787,
"memory_in_bytes": 0,
"committed": true,
"search": false,
"version": "9.5.0",
"compound": false,
"attributes": {
"Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
}
},
"_9o": {
"generation": 348,
"num_docs": 54,
"deleted_docs": 11,
"size_in_bytes": 346570,
"memory_in_bytes": 0,
"committed": false,
"search": true,
"version": "9.5.0",
"compound": false,
"attributes": {
"Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
}
},
"_as": {
"generation": 388,
"num_docs": 3,
"deleted_docs": 67,
"size_in_bytes": 345019,
"memory_in_bytes": 0,
"committed": false,
"search": true,
"version": "9.5.0",
"compound": false,
"attributes": {
"Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
}
},
"_at": {
"generation": 389,
"num_docs": 0,
"deleted_docs": 1,
"size_in_bytes": 34046,
"memory_in_bytes": 0,
"committed": false,
"search": true,
"version": "9.5.0",
"compound": true,
"attributes": {
"Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
}
},
"_au": {
"generation": 390,
"num_docs": 1,
"deleted_docs": 0,
"size_in_bytes": 55172,
"memory_in_bytes": 0,
"committed": false,
"search": true,
"version": "9.5.0",
"compound": true,
"attributes": {
"Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
}
},
"_av": {
"generation": 391,
"num_docs": 1,
"deleted_docs": 1,
"size_in_bytes": 32585,
"memory_in_bytes": 0,
"committed": false,
"search": true,
"version": "9.5.0",
"compound": true,
"attributes": {
"Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
}
},
"_aw": {
"generation": 392,
"num_docs": 0,
"deleted_docs": 1,
"size_in_bytes": 31846,
"memory_in_bytes": 0,
"committed": false,
"search": true,
"version": "9.5.0",
"compound": true,
"attributes": {
"Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
}
},
"_ax": {
"generation": 393,
"num_docs": 0,
"deleted_docs": 1,
"size_in_bytes": 32004,
"memory_in_bytes": 0,
"committed": false,
"search": true,
"version": "9.5.0",
"compound": true,
"attributes": {
"Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
}
},
"_ay": {
"generation": 394,
"num_docs": 0,
"deleted_docs": 1,
"size_in_bytes": 57060,
"memory_in_bytes": 0,
"committed": false,
"search": true,
"version": "9.5.0",
"compound": true,
"attributes": {
"Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
}
},
"_az": {
"generation": 395,
"num_docs": 1,
"deleted_docs": 0,
"size_in_bytes": 49338,
"memory_in_bytes": 0,
"committed": false,
"search": true,
"version": "9.5.0",
"compound": true,
"attributes": {
"Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
}
}
}
},
{
"routing": {
"state": "STARTED",
"primary": false,
"node": "PHzv1yuSRnuzFUiSKkt_cg"
},
"num_committed_segments": 1,
"num_search_segments": 10,
"segments": {
"_9b": {
"generation": 335,
"num_docs": 64,
"deleted_docs": 0,
"size_in_bytes": 323673,
"memory_in_bytes": 0,
"committed": true,
"search": false,
"version": "9.5.0",
"compound": false,
"attributes": {
"Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
}
},
"_9m": {
"generation": 346,
"num_docs": 54,
"deleted_docs": 20,
"size_in_bytes": 408172,
"memory_in_bytes": 0,
"committed": false,
"search": true,
"version": "9.5.0",
"compound": false,
"attributes": {
"Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
}
},
"_ag": {
"generation": 376,
"num_docs": 3,
"deleted_docs": 57,
"size_in_bytes": 278440,
"memory_in_bytes": 0,
"committed": false,
"search": true,
"version": "9.5.0",
"compound": false,
"attributes": {
"Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
}
},
"_ah": {
"generation": 377,
"num_docs": 0,
"deleted_docs": 1,
"size_in_bytes": 33885,
"memory_in_bytes": 0,
"committed": false,
"search": true,
"version": "9.5.0",
"compound": true,
"attributes": {
"Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
}
},
"_ai": {
"generation": 378,
"num_docs": 0,
"deleted_docs": 1,
"size_in_bytes": 34042,
"memory_in_bytes": 0,
"committed": false,
"search": true,
"version": "9.5.0",
"compound": true,
"attributes": {
"Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
}
},
"_aj": {
"generation": 379,
"num_docs": 1,
"deleted_docs": 0,
"size_in_bytes": 55173,
"memory_in_bytes": 0,
"committed": false,
"search": true,
"version": "9.5.0",
"compound": true,
"attributes": {
"Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
}
},
"_ak": {
"generation": 380,
"num_docs": 1,
"deleted_docs": 1,
"size_in_bytes": 32588,
"memory_in_bytes": 0,
"committed": false,
"search": true,
"version": "9.5.0",
"compound": true,
"attributes": {
"Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
}
},
"_al": {
"generation": 381,
"num_docs": 0,
"deleted_docs": 1,
"size_in_bytes": 31849,
"memory_in_bytes": 0,
"committed": false,
"search": true,
"version": "9.5.0",
"compound": true,
"attributes": {
"Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
}
},
"_am": {
"generation": 382,
"num_docs": 0,
"deleted_docs": 1,
"size_in_bytes": 32006,
"memory_in_bytes": 0,
"committed": false,
"search": true,
"version": "9.5.0",
"compound": true,
"attributes": {
"Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
}
},
"_an": {
"generation": 383,
"num_docs": 0,
"deleted_docs": 1,
"size_in_bytes": 57065,
"memory_in_bytes": 0,
"committed": false,
"search": true,
"version": "9.5.0",
"compound": true,
"attributes": {
"Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
}
},
"_ao": {
"generation": 384,
"num_docs": 1,
"deleted_docs": 0,
"size_in_bytes": 49340,
"memory_in_bytes": 0,
"committed": false,
"search": true,
"version": "9.5.0",
"compound": true,
"attributes": {
"Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
}
}
}
}
]
}
}
}
}
What is the expected behavior? I expect to always get the correct number of hits; in the example above, all 4 documents should always be returned.
What is your host/environment?
- Amazon OpenSearch Service
- Reproduced on ES 7.9 & 7.10 and OpenSearch 1.3 and 2.7. For corresponding k-NN plugin versions see https://docs.aws.amazon.com/opensearch-service/latest/developerguide/knn.html
Do you have any additional context? What is also remarkable when it goes wrong is that hits.total usually reports the wrong number of total hits. When increasing k and size in the query, hits.total goes up. I'm not sure whether hits.total (as opposed to the number of returned hits) is always expected to equal the full total in the index for k-NN? After force merging to 1 segment, hits.total is always correct, no matter the k and size values. Likewise, after force merging, the number of actual hits is also always correct, but it should also be correct when there are more/many segments.
Thanks for reporting this @SkyTrix . I will take a look.
Just curious, @SkyTrix were there any deleted documents when you were doing the query?
@jmazanec15 do we have any RCA on why the problem was happening?
@SkyTrix I think this has something to do with having few documents in segments. What k/size did you try for 16 docs? I haven't been able to repro the issue quite yet.
Just curious, @SkyTrix were there any deleted documents when you were doing the query?
I can't say offhand, but I'm pretty sure there were. We use OpenSearch as a read replica of a db, so every change to a record there results in a new document with the same _id in the index, and therefore also in a deleted document.
@SkyTrix I think this has something to do with having few documents in segments. What k/size did you try for 16 docs? I haven't been able to repro the issue quite yet.
@jmazanec15 k and size were always matching. I believe that with both set to 16 it also didn't return all 16 results, but with 30, for example, it would find all 16 docs. I'm returning to work early next week, so I can try to reproduce it again then. Anything in particular you want me to pay attention to or provide here? I'm also happy to reproduce it live over a Teams call if you prefer that.
Thanks @SkyTrix.
Could you provide a script to reproduce the issue? I tried with the following but wasn't able to reproduce it locally:
Repro Python attempt
import json
import random
import time

from opensearchpy import OpenSearch, RequestsHttpConnection

index_name = "test-index"
field_name = "test-field"
dim = 512
num_docs = 16
k = num_docs
size = num_docs


def action(doc_id):
    # Bulk action line for a single index operation.
    return {'index': {'_index': index_name, '_id': doc_id}}


def bulk_transform(vec, field_name: str, action, id: int):
    # Pair the action line with its document body.
    return [action(id), {field_name: vec}]


os = OpenSearch(
    hosts=[{'host': "localhost", 'port': 9200}],
    use_ssl=False,
    verify_certs=False,
    connection_class=RequestsHttpConnection,
    pool_maxsize=20,
    timeout=100
)

ind_body = {
    "settings": {
        "index": {
            "number_of_shards": "1",
            "knn": "true",
            "number_of_replicas": "1"
        }
    },
    "mappings": {
        "properties": {
            field_name: {
                "type": "knn_vector",
                "dimension": dim
            }
        }
    }
}

# Index setup and ingestion, commented out after the initial run:
# random.seed(10)
# os.indices.delete(index=index_name, ignore=[400, 404])
# os.indices.create(index=index_name, body=ind_body)

vecs = [[random.random() for _ in range(dim)] for _ in range(num_docs)]

# for i, vec in enumerate(vecs):
#     body = bulk_transform(vec, field_name, action, i)
#     os.bulk(index=index_name, body=body)
#     print("Sleeping")
#     if i < len(vecs) - 1:
#         time.sleep(60)
#     else:
#         time.sleep(1)

# k-NN query with k and size both equal to the number of indexed documents.
q_body = {
    'size': size,
    'query': {
        'knn': {
            field_name: {
                'vector': [random.random() for _ in range(dim)],
                'k': k
            }
        }
    }
}

query_response = os.search(index=index_name, body=q_body, _source_excludes=[field_name])
print(json.dumps(query_response, indent=4))
Additionally, when you are able to reproduce it, could you provide the segment count via /_cat/segments?
I managed to reproduce it again with only 4 documents. The k-NN search with k and size set to 4 only returns 3 documents; when set to 5, it returns all 4 of them. If it matters, the cluster uses dedicated master nodes and has 3 data nodes.
This is the output of _cat/segments for the specific index:
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _hy 646 0 71 282.7kb 0 true false 8.6.2 false
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _ii 666 52 108 314kb 29764 true true 8.6.2 false
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _ij 667 0 1 30.9kb 0 true false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _ik 668 1 0 42.6kb 10388 true true 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _il 669 1 1 29.5kb 8684 true true 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _im 670 0 1 29.2kb 0 true false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _in 671 0 1 29.4kb 0 true false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _io 672 1 0 37kb 8756 true true 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x _i6 654 49 58 285.3kb 30204 true true 8.6.2 false
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x _iq 674 3 122 316.8kb 16340 true true 8.6.2 false
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x _ir 675 1 0 42.6kb 10388 true true 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x _is 676 1 1 29.5kb 8684 true true 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x _it 677 0 1 29.2kb 0 true false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x _iu 678 0 1 29.4kb 0 true false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x _iv 679 1 0 37kb 8756 true true 8.6.2 true
And a second time, again with 4 documents, where it only returns 2 results with k & size = 4 or 5, 3 results with k & size = 6 or 7, and eventually all 4 results with k & size = 8:
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _hy 646 0 71 282.7kb 0 true false 8.6.2 false
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _ii 666 52 108 314kb 0 true false 8.6.2 false
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _ij 667 0 1 30.9kb 0 true false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _ik 668 1 0 42.6kb 0 true false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _il 669 1 1 29.5kb 0 true false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _im 670 0 1 29.2kb 0 true false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _in 671 0 1 29.4kb 0 true false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _io 672 1 0 37kb 0 true false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _iq 674 49 6 203.6kb 26620 false true 8.6.2 false
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _jv 715 6 57 172.5kb 12444 false true 8.6.2 false
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x _i6 654 49 58 285.3kb 0 true false 8.6.2 false
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x _iq 674 3 122 316.8kb 0 true false 8.6.2 false
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x _ir 675 1 0 42.6kb 0 true false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x _is 676 1 1 29.5kb 0 true false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x _it 677 0 1 29.2kb 0 true false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x _iu 678 0 1 29.4kb 0 true false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x _iv 679 1 0 37kb 0 true false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x _ix 681 49 6 206.5kb 26772 false true 8.6.2 false
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x _k2 722 6 57 172.5kb 12444 false true 8.6.2 false
I'll have a look at trying to reproduce with a script.
@jmazanec15 I had a lot of trouble trying to reproduce it with a simple script. It did, however, reproduce easily through some of our application code. What I ended up doing is capturing the calls to _bulk and writing them to a file together with a timestamp. I then created a script to replay those calls at the same rate as the original calls.
The provided script is Scala instead of Python, but should be very straightforward to port if needed. It simply reads the reproducer.txt file, finds the bulk payloads between BATCH_START_TIME <timestamp> lines, and executes them with the appropriate time in between.
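In case a Python port is handier, here is a rough, untested sketch of that replay logic (the file name and BATCH_START_TIME marker come from the description above; the endpoint, the timestamp unit being epoch milliseconds, and the use of requests are assumptions):

import time
import requests  # assumption: plain HTTP client; any OpenSearch client would also work

OPENSEARCH_URL = "http://localhost:9200"  # assumption: adjust to your endpoint


def parse_batches(path="reproducer.txt"):
    """Split the capture file into (timestamp, bulk_payload) pairs.

    Assumes lines of the form 'BATCH_START_TIME <epoch-millis>' followed by
    the raw _bulk body captured from the application.
    """
    batches, current_ts, current_lines = [], None, []
    with open(path) as f:
        for line in f:
            if line.startswith("BATCH_START_TIME"):
                if current_ts is not None:
                    batches.append((current_ts, "".join(current_lines)))
                current_ts = int(line.split()[1])
                current_lines = []
            else:
                current_lines.append(line)
    if current_ts is not None:
        batches.append((current_ts, "".join(current_lines)))
    return batches


def replay(batches):
    """Send each bulk payload, preserving the original inter-batch delays."""
    for i, (ts, payload) in enumerate(batches):
        if i > 0:
            time.sleep((ts - batches[i - 1][0]) / 1000.0)
        requests.post(
            f"{OPENSEARCH_URL}/_bulk",
            data=payload,
            headers={"Content-Type": "application/x-ndjson"},
        )


if __name__ == "__main__":
    replay(parse_batches())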
I've run the reproducer script multiple times on both ES 7.9 and OpenSearch 2.11 and every time the problem was reproduced.
Steps to reproduce: https://gist.github.com/SkyTrix/4abe38b9c7d5dee51e44d951c6bded03
- Apply the mapping template from mapping_template.txt
- Configure ES endpoint in IndexingReplay.scala and run it (it runs for 5 minutes)
- Run the query from query.txt (multiple times)
- Notice that less than 14 results are returned, sometimes jumping between two different values
- Update k and size to 30 and rerun query, see that now 14 results are returned
- Forcemerge the index and rerun the original query, the 14 results are returned
I hope this will allow you to reproduce the problem on your system.
@jmazanec15 @navneet1v Would you happen to have a chance to try out the reproducer? Thanks!
@SkyTrix Sure, will take a look this week and try.
@SkyTrix From
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _hy 646 0 71 282.7kb 0 true false 8.6.2 false
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _ii 666 52 108 314kb 29764 true true 8.6.2 false
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _ij 667 0 1 30.9kb 0 true false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _ik 668 1 0 42.6kb 10388 true true 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _il 669 1 1 29.5kb 8684 true true 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _im 670 0 1 29.2kb 0 true false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _in 671 0 1 29.4kb 0 true false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 p x.x.x.x _io 672 1 0 37kb 8756 true true 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x _i6 654 49 58 285.3kb 30204 true true 8.6.2 false
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x _iq 674 3 122 316.8kb 16340 true true 8.6.2 false
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x _ir 675 1 0 42.6kb 10388 true true 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x _is 676 1 1 29.5kb 8684 true true 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x _it 677 0 1 29.2kb 0 true false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x _iu 678 0 1 29.4kb 0 true false 8.6.2 true
ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 0 r x.x.x.x _iv 679 1 0 37kb 8756 true true 8.6.2 true
I see a very high number of deleted documents. I believe the headers correspond to:
index shard prirep ip segment generation docs.count docs.deleted size size.memory committed searchable version compound
For instance, in ecc_nodes_a-t-1310-5020-0000-9775_43017401ac18ea2979f8 I see a segment with 3 docs and 122 that are deleted. The way native index search works, those 122 documents are still in the graph; on search, they may get returned from the graph and then filtered out because they are deleted. So, if one of the expected results is in this segment, it may not get returned, because the k results coming back from the graph may all be among the 122 deleted documents. Increasing k may lead to one of those 3 getting returned.
Further, in the example above, there are 110 live docs to 266 deleted docs, so roughly 70% of the vectors in the graphs correspond to deleted documents.
For this reason, I recommend ensuring that the ratio of deleted docs in the index stays below roughly 10%. There is a force merge option you can run with expunge deletes to ensure the deleted documents actually get removed.
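As a sketch (assuming the opensearch-py client, set up as in the repro script above, and an illustrative index name), the expunge-deletes variant of force merge looks roughly like this:

from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{'host': 'localhost', 'port': 9200}])  # adjust to your endpoint

# only_expunge_deletes rewrites segments that contain deletes instead of merging
# everything down to one segment, so the per-segment HNSW graphs get rebuilt
# without the deleted vectors. Equivalent REST call:
# POST /<index>/_forcemerge?only_expunge_deletes=true
client.indices.forcemerge(
    index='my-knn-index',  # illustrative index name
    only_expunge_deletes=True,
)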
Lastly, can you run _cat/indices?v and paste the result, to see how many deleted documents you have in your index?
@jmazanec15 Thanks for looking into this.
We use OpenSearch as a read replica of a database, so every change to a database record results in a new version of the document and thus an extra deleted document. So in our case, having a lot of deleted documents is "normal". Note that these new versions always have the same _id as the previous version of the document, so I'm a bit surprised these old versions are still "in play". Is there any way that only the latest version per _id can be considered in the search?
I ran the reproducer again on an empty index. It eventually produced 16 documents, of which 14 should be found by k-NN.
GET _cat/indices/ecc_nodes_a-dev-1006-2000-0009-4611_43036465760b9a8a8fe502-2?v
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open ecc_nodes_a-dev-1006-2000-0009-4611_43036465760b9a8a8fe502-2 acTvS6-4QhiLKWSscL2DgA 1 1 16 87 613.5kb 613.5kb
GET _cat/segments/ecc_nodes_a-dev-1006-2000-0009-4611_43036465760b9a8a8fe502-2?v
index shard prirep ip segment generation docs.count docs.deleted size size.memory committed searchable version compound
ecc_nodes_a-dev-1006-2000-0009-4611_43036465760b9a8a8fe502-2 0 p x.x.x.x _a 10 15 76 555.2kb 0 false true 9.7.0 true
ecc_nodes_a-dev-1006-2000-0009-4611_43036465760b9a8a8fe502-2 0 p x.x.x.x _b 11 1 11 58kb 0 false true 9.7.0 true
The query returned only 10 documents with k and size = 14.
Checking again 30 minutes later, the segments had been automatically merged, the deleted docs were gone, and all 14 results were returned.
GET _cat/indices/ecc_nodes_a-dev-1006-2000-0009-4611_43036465760b9a8a8fe502-2?v
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open ecc_nodes_a-dev-1006-2000-0009-4611_43036465760b9a8a8fe502-2 acTvS6-4QhiLKWSscL2DgA 1 1 16 0 345.3kb 345.3kb
GET _cat/segments/ecc_nodes_a-dev-1006-2000-0009-4611_43036465760b9a8a8fe502-2?v
index shard prirep ip segment generation docs.count docs.deleted size size.memory committed searchable version compound
ecc_nodes_a-dev-1006-2000-0009-4611_43036465760b9a8a8fe502-2 0 p x.x.x.x _c 12 16 0 345kb 0 true true 9.7.0 false
I thought the more segments, the higher the chance of running into the issue, but here's a run that does always return the 14 documents:
index shard prirep ip segment generation docs.count docs.deleted size size.memory committed searchable version compound
ecc_nodes_a-dev-1006-2000-0009-4611_43036465760b9a8a8fe502-2 0 p x.x.x.x _i 18 7 39 274.7kb 0 false true 9.7.0 true
ecc_nodes_a-dev-1006-2000-0009-4611_43036465760b9a8a8fe502-2 0 p x.x.x.x _v 31 4 29 181.6kb 0 false true 9.7.0 false
ecc_nodes_a-dev-1006-2000-0009-4611_43036465760b9a8a8fe502-2 0 p x.x.x.x _w 32 0 1 31.3kb 0 false true 9.7.0 true
ecc_nodes_a-dev-1006-2000-0009-4611_43036465760b9a8a8fe502-2 0 p x.x.x.x _x 33 1 0 55.4kb 0 false true 9.7.0 true
ecc_nodes_a-dev-1006-2000-0009-4611_43036465760b9a8a8fe502-2 0 p x.x.x.x _y 34 0 2 28.1kb 0 false true 9.7.0 true
ecc_nodes_a-dev-1006-2000-0009-4611_43036465760b9a8a8fe502-2 0 p x.x.x.x _z 35 0 1 28.9kb 0 false true 9.7.0 true
ecc_nodes_a-dev-1006-2000-0009-4611_43036465760b9a8a8fe502-2 0 p x.x.x.x _10 36 0 1 29.6kb 0 false true 9.7.0 true
ecc_nodes_a-dev-1006-2000-0009-4611_43036465760b9a8a8fe502-2 0 p x.x.x.x _11 37 4 14 192kb 0 false true 9.7.0 true
So in our case, having a lot of deleted documents is "normal".
Yes, typically with ANN search this will be a problem. For the HNSW graphs in old segments, the docs get marked as deleted, but they are not removed from the graph, the main reason being that it is pretty difficult to efficiently remove elements from the graphs.
One thing you could do is use expunge deletes more aggressively so segments typically will have a strong majority of live docs.
Alternatively, Lucene passes the non-live docs as a filter to the HNSW search so they get skipped (https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java#L236-L238), so switching to the Lucene engine may yield better results for your case as well.
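If you want to try that, a sketch of a knn_vector mapping that selects the Lucene engine looks roughly like this (field name and dimension are taken from this issue for illustration; the space_type, parameters, and index name are assumptions, and the engine has to be chosen at index creation time, so switching requires a reindex):

from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{'host': 'localhost', 'port': 9200}])  # adjust to your endpoint

# Sketch of an index whose knn_vector field uses the Lucene HNSW engine instead of nmslib.
client.indices.create(
    index='my-lucene-knn-index',  # illustrative index name
    body={
        'settings': {'index': {'knn': True}},
        'mappings': {
            'properties': {
                'ec_knn512@fingerPrint': {
                    'type': 'knn_vector',
                    'dimension': 512,
                    'method': {
                        'name': 'hnsw',
                        'engine': 'lucene',
                        'space_type': 'l2',  # pick the space type you actually use
                    },
                }
            }
        },
    },
)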
One thing you could do is use expunge deletes more aggressively so segments typically will have a strong majority of live docs.
The documentation warns that _forcemerge should not be used on indices that are still actively receiving writes, so I'm not sure I'd want to go there...? https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-forcemerge.html#forcemerge-api-desc
Thanks for the tip, I'll have to see what the impact is of switching to the Lucene engine. Out of interest: given that both nmslib and Lucene use HNSW graphs, can't nmslib then also pass the non-live docs as a filter?
Edit: Sadly, the 1024-dimension limit is too small for us, so it doesn't look like we can use Lucene as the engine.
We are increasing the max dimension for Lucene in 2.12: https://github.com/opensearch-project/k-NN/blob/main/release-notes/opensearch-knn.release-notes-2.12.0.0.md, which will be released soon. Alternatively, you could use script scoring if the document count you're searching over is on the order of hundreds or thousands of docs.
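For reference, exact (brute-force) search via the k-NN scoring script looks roughly like this; the field name, query vector, space_type, size, and index name here are placeholders, and this approach only makes sense when the candidate set being scored is small:

from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{'host': 'localhost', 'port': 9200}])  # adjust to your endpoint

query_vector = [0.1] * 512  # placeholder query vector

response = client.search(
    index='my-knn-index',  # illustrative index name
    body={
        'size': 14,
        'query': {
            'script_score': {
                # Every document matched here is scored, so keep the candidate set small.
                'query': {'match_all': {}},
                'script': {
                    'source': 'knn_score',
                    'lang': 'knn',
                    'params': {
                        'field': 'ec_knn512@fingerPrint',
                        'query_value': query_vector,
                        'space_type': 'cosinesimil',  # or l2, etc.
                    },
                },
            }
        },
    },
)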
Good to know, thanks. Once that becomes available on AWS, we might start looking into upgrading.
Do you see the current behavior as expected or a bug to be fixed? If expected behavior, could this be added to the documentation somewhere?