
[Bug]: Search results become less and less if keep deleting the search results

Open yanliang567 opened this issue 2 years ago • 20 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version:  2.2.0-20230116-7e2121e6
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka):    pulsar

Current Behavior

I have one collection with 13+ million vectors, and I want to delete 2.6 million of them. So I:

  1. search a random vector with top=10000
  2. get the 10000 ids from the search results
  3. delete the entities with those ids
  4. repeat the search and delete 260 times

The search results become fewer and fewer, and drop to 0 after several rounds of deleting.

image

Expected Behavior

Search always returns top=10000 results.

Reproduce code:

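import logging
import random

# c (presumably a loaded pymilvus Collection), rounds, dim, nb, vector_field_name,
# primary_field_name and search_params are defined elsewhere in the script.
for r in range(rounds):
    # Search with one random query vector and take the top-nb hits.
    search_vector = [[random.random() for _ in range(dim)] for _ in range(1)]
    results = c.search(data=search_vector, anns_field=vector_field_name,
                       param=search_params, limit=nb)
    for hits in results:
        # Delete every entity returned by this search.
        ids = hits.ids
        c.delete(expr=f"{primary_field_name} in {ids}")
        logging.info(f"deleted {len(ids)} entities")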

Steps To Reproduce

check reproduce code above

Milvus Log

check the logs around 01/17/2023 08:01:40 AM

yanliang-cluster-1cu-milvus-datanode-69f6f588c5-g4x2h        1/1     Running     0               43h
yanliang-cluster-1cu-milvus-indexnode-6d6554f9fc-dbgqj       1/1     Running     0               43h
yanliang-cluster-1cu-milvus-mixcoord-5c45bf7fc-p5hpq         1/1     Running     0               43h
yanliang-cluster-1cu-milvus-proxy-7bc7bfc6c4-8s7t9           1/1     Running     0               43h
yanliang-cluster-1cu-milvus-querynode-5784dcf45d-d69nb       1/1     Running     0               43h
yanliang-cluster-1cu-milvus-querynode-5784dcf45d-w86g6       1/1     Running     0               43h

Anything else?

No response

yanliang567 avatar Jan 18 '23 03:01 yanliang567

/assign @liliu-z @cydrain /unassign

yanliang567 avatar Jan 18 '23 03:01 yanliang567

@yanliang567 what's the index type and search param ?

cydrain avatar Jan 19 '23 04:01 cydrain

"index_type": "HNSW", "metric_type": "IP", "params": {"M": 8, "efConstruction": 96}
search_params = {"metric_type": "IP", "params": {"ef": 10000}}

yanliang567 avatar Jan 19 '23 04:01 yanliang567

Tested knowhere with the sift1M dataset: up to the last iteration (when all data has been removed), knowhere always returns 10000 valid results ("bt" means bitset count). (Screenshot from 2023-01-31 17-23-27)

Same for the glove-200 dataset. (Screenshot from 2023-01-31 18-01-50)

cydrain avatar Jan 31 '23 09:01 cydrain

After changing the metric type to "L2", the script runs as expected. (Screenshot from 2023-01-31 18-52-25)

cydrain avatar Jan 31 '23 10:01 cydrain

With normalized vectors and the IP metric type, the following script can always delete 10000 entities: 21785_create_n_insert_normalize.py.txt

cydrain avatar Feb 01 '23 02:02 cydrain

So this is not a real issue: vectors MUST be normalized before using the IP metric type. @yanliang567

cydrain avatar Feb 01 '23 02:02 cydrain

/assign @yanliang567

cydrain avatar Feb 01 '23 02:02 cydrain

> So this is not a real issue: vectors MUST be normalized before using the IP metric type. @yanliang567

Why does IP have to be normalized? If IP is normalized, that would be cosine, by the way.

xiaofan-luan avatar Feb 01 '23 04:02 xiaofan-luan

@liliu-z

xiaofan-luan avatar Feb 01 '23 04:02 xiaofan-luan

We only support IP for now; Cosine is not supported yet. IP without normalization works, but with very low recall (it doesn't make sense mathematically). IP + normalization = Cosine, which we don't support natively yet. This is why we recommend that users normalize their vectors before using IP.
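The "IP + normalization = Cosine" identity can be checked with a few lines of plain Python. This is a minimal sketch for illustration only: the helper names (`normalize`, `inner_product`, `cosine`) and the sample vectors are made up here and are not part of Milvus or knowhere.

```python
import math

def normalize(v):
    """Scale v to unit L2 norm (assumes v is not the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Cosine similarity computed directly from the raw vectors.
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )

# Hypothetical sample vectors with different norms.
a = [3.0, 4.0]
b = [1.0, 0.0]

# IP on the normalized vectors equals cosine similarity on the raw ones.
ip_after_normalization = inner_product(normalize(a), normalize(b))
print(ip_after_normalization, cosine(a, b))
```

Normalizing once at insert time (and normalizing each query vector) is therefore enough to get cosine ranking out of an IP index.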

Getting back to this issue: we still need to investigate what happened, since the behavior is not as expected.

liliu-z avatar Feb 01 '23 06:02 liliu-z

@cydrain Can you help do a further check on why this happens? Appreciate it!

liliu-z avatar Feb 01 '23 06:02 liliu-z

> @cydrain Can you help do a further check on why this happens? Appreciate it!

ok

cydrain avatar Feb 01 '23 06:02 cydrain

This issue only exists for HNSW, not for IVF_FLAT or IVF_SQ8

cydrain avatar Feb 01 '23 08:02 cydrain

/assign @hhy3

liliu-z avatar Feb 08 '23 11:02 liliu-z

This is because IP is not a true distance, so when IP is used to build the HNSW graph, the graph is not fully connected. Starting from the entry point (ep), the search can only find points nearby.
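A toy illustration of the connectivity problem, under loud assumptions: this is not HNSW construction, just a directed k-nearest-neighbor graph over five made-up 2D points, where each point links to its `k` best-scoring neighbors. All function names and data are hypothetical.

```python
from collections import deque

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def neg_squared_l2(a, b):
    # Negated so that "larger is more similar" holds for both scores.
    return -sum((x - y) ** 2 for x, y in zip(a, b))

def knn_graph(points, score, k):
    """Directed graph: each point links to its k highest-scoring other points."""
    graph = {}
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: score(p, points[j]), reverse=True)
        graph[i] = others[:k]
    return graph

def reachable(graph, entry):
    """Breadth-first search: every node that can be reached from entry."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Two large-norm points, one medium, two small (made-up data).
points = [(10.0, 1.0), (9.0, 2.0), (1.0, 0.5), (0.5, 1.0), (0.2, 0.1)]

ip_reach = reachable(knn_graph(points, inner_product, k=2), entry=0)
l2_reach = reachable(knn_graph(points, neg_squared_l2, k=2), entry=0)
print(sorted(ip_reach), sorted(l2_reach))
```

With these points, every node's best IP neighbors are the large-norm points, so nothing links to the two smallest points and a traversal from entry 0 reaches only three of the five nodes. The L2 graph built from the same data reaches all five.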

hhy3 avatar Feb 08 '23 12:02 hhy3

So all graph indexes with IP should have a similar problem? Building the graph with pre-clustering might help.

xiaofan-luan avatar Feb 08 '23 13:02 xiaofan-luan

> So all graph indexes with IP should have a similar problem? Building the graph with pre-clustering might help.

To my understanding, when we use IP in HNSW, the connectivity of the graph depends on the dataset. We can try whether pre-clustering mitigates this, but in theory IP + graph is not a combination that makes sense. We will work on the Cosine metric type very soon.

liliu-z avatar Feb 09 '23 06:02 liliu-z

For a graph-based index (such as HNSW), the distance must obey this rule:

  if A is close to B, and B is close to C ==> A is close to C (distance is transitive)

For IP distance (without normalization), the above rule is violated, which leaves the HNSW graph not fully connected.
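One concrete way to see that raw IP does not behave like a distance: a vector is not necessarily its own best match. The sketch below uses made-up vectors and hypothetical helper names, and is only meant to illustrate the point above.

```python
import math

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale v to unit L2 norm (assumes v is not the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# Hypothetical vectors: one with a small norm, one with a large norm.
small = [1.0, 0.0]
large = [10.0, 5.0]

# Raw IP: the small vector scores higher against the large one than against itself.
raw_self = inner_product(small, small)
raw_other = inner_product(small, large)

# After normalization, a vector's best possible match is itself.
unit_small, unit_large = normalize(small), normalize(large)
unit_self = inner_product(unit_small, unit_small)
unit_other = inner_product(unit_small, unit_large)

print(raw_self, raw_other, unit_self, unit_other)
```

Under raw IP, large-norm vectors attract links from everywhere regardless of direction, which is consistent with the connectivity problem described in this thread.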

cydrain avatar Feb 10 '23 06:02 cydrain

@yanliang567 Milvus 2.3 supports COSINE now. Can you retest this case with the COSINE metric type? I also suggest changing the Milestone to 2.3.

cydrain avatar Jun 14 '23 03:06 cydrain

OK, will do

yanliang567 avatar Jun 14 '23 03:06 yanliang567

not reproduced on master-20230619-a6310050 with cosine

yanliang567 avatar Jun 25 '23 03:06 yanliang567