milvus
milvus copied to clipboard
[Bug]: Search result length is not equal to the limit (topK) value after reinstallation
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version: 2.2.0-20230601-5710752f
- Deployment mode (standalone or cluster): cluster
- MQ type (rocksmq, pulsar or kafka): kafka
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
[2023-06-01T13:05:03.722Z] 2023-06-01 13:05:03.302 | INFO | MainThread |utils:load_and_search:206 - collection name: task_2_IVF_PQ
[2023-06-01T13:05:03.722Z] 2023-06-01 13:05:03.302 | INFO | MainThread |utils:load_and_search:207 - load collection
[2023-06-01T13:05:03.722Z] 2023-06-01 13:05:03.309 | INFO | MainThread |utils:load_and_search:211 - load time: 0.0070
[2023-06-01T13:05:03.722Z] 2023-06-01 13:05:03.320 | INFO | MainThread |utils:load_and_search:225 - {'metric_type': 'L2', 'params': {'nprobe': 10}}
[2023-06-01T13:05:03.722Z] 2023-06-01 13:05:03.320 | INFO | MainThread |utils:load_and_search:228 -
[2023-06-01T13:05:03.722Z] Search...
[2023-06-01T13:05:03.722Z] 2023-06-01 13:05:03.327 | INFO | MainThread |utils:load_and_search:239 - hit: id: 930, distance: 28.98775291442871, entity: {'count': 930, 'random_value': -13.0}
[2023-06-01T13:05:03.722Z] 2023-06-01 13:05:03.327 | INFO | MainThread |utils:load_and_search:239 - hit: id: 2343, distance: 31.38789176940918, entity: {'count': 2343, 'random_value': -16.0}
[2023-06-01T13:05:03.722Z] 2023-06-01 13:05:03.327 | INFO | MainThread |utils:load_and_search:239 - hit: id: 1325, distance: 31.5164852142334, entity: {'count': 1325, 'random_value': -15.0}
[2023-06-01T13:05:03.722Z] 2023-06-01 13:05:03.327 | INFO | MainThread |utils:load_and_search:239 - hit: id: 2867, distance: 32.024906158447266, entity: {'count': 2867, 'random_value': -18.0}
[2023-06-01T13:05:03.722Z] Traceback (most recent call last):
[2023-06-01T13:05:03.722Z] File "scripts/action_after_reinstall.py", line 47, in <module>
[2023-06-01T13:05:03.722Z] task_2(data_size, host)
[2023-06-01T13:05:03.722Z] File "scripts/action_after_reinstall.py", line 29, in task_2
[2023-06-01T13:05:03.722Z] load_and_search(prefix)
[2023-06-01T13:05:03.722Z] File "/home/jenkins/agent/workspace/tests/python_client/deploy/scripts/utils.py", line 241, in load_and_search
[2023-06-01T13:05:03.722Z] assert len(ids) == topK, f"get {len(ids)} results, but topK is {topK}"
[2023-06-01T13:05:03.722Z] AssertionError: get 4 results, but topK is 5
Expected Behavior
len(ids) == topK
Steps To Reproduce
No response
Milvus Log
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test_kafka_for_release_cron/detail/deploy_test_kafka_for_release_cron/993/pipeline
log:
artifacts-kafka-cluster-reinstall-993-server-first-deployment-logs.tar.gz
artifacts-kafka-cluster-reinstall-993-server-second-deployment-logs.tar.gz
artifacts-kafka-cluster-reinstall-993-pytest-logs.tar.gz
Anything else?
No response
/assign @jiaoew1991 /unassign
/assign @chyezh
It seems that there is no data loss after reinstallation.
All data has been flushed, so the problem cannot be caused by growing segments.
The problem may arise in the computation logic with special input; I will try to reproduce it.
version: 2.2.0-20230612-ae2fe478
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test_kafka_for_release_cron/detail/deploy_test_kafka_for_release_cron/1044/pipeline
[2023-06-12T13:05:57.358Z] 2023-06-12 13:05:57.198 | INFO | MainThread |utils:load_and_search:206 - collection name: task_1_IVF_FLAT
[2023-06-12T13:05:57.358Z] 2023-06-12 13:05:57.198 | INFO | MainThread |utils:load_and_search:207 - load collection
[2023-06-12T13:05:57.358Z] 2023-06-12 13:05:57.203 | INFO | MainThread |utils:load_and_search:211 - load time: 0.0050
[2023-06-12T13:05:57.358Z] 2023-06-12 13:05:57.216 | INFO | MainThread |utils:load_and_search:225 - {'metric_type': 'L2', 'params': {'nprobe': 10}}
[2023-06-12T13:05:57.358Z] 2023-06-12 13:05:57.216 | INFO | MainThread |utils:load_and_search:228 -
[2023-06-12T13:05:57.358Z] Search...
[2023-06-12T13:05:57.358Z] 2023-06-12 13:05:57.220 | INFO | MainThread |utils:load_and_search:239 - hit: id: 976, distance: 29.795345306396484, entity: {'count': 976, 'random_value': -15.0}
[2023-06-12T13:05:57.358Z] 2023-06-12 13:05:57.221 | INFO | MainThread |utils:load_and_search:239 - hit: id: 766, distance: 30.546741485595703, entity: {'count': 766, 'random_value': -11.0}
[2023-06-12T13:05:57.358Z] 2023-06-12 13:05:57.221 | INFO | MainThread |utils:load_and_search:239 - hit: id: 2403, distance: 31.58251953125, entity: {'count': 2403, 'random_value': -17.0}
[2023-06-12T13:05:57.358Z] 2023-06-12 13:05:57.221 | INFO | MainThread |utils:load_and_search:239 - hit: id: 2486, distance: 32.51908874511719, entity: {'count': 2486, 'random_value': -12.0}
[2023-06-12T13:05:57.358Z] Traceback (most recent call last):
[2023-06-12T13:05:57.358Z] File "scripts/action_after_reinstall.py", line 46, in <module>
[2023-06-12T13:05:57.358Z] task_1(data_size, host)
[2023-06-12T13:05:57.358Z] File "scripts/action_after_reinstall.py", line 14, in task_1
[2023-06-12T13:05:57.358Z] load_and_search(prefix)
[2023-06-12T13:05:57.358Z] File "/home/jenkins/agent/workspace/tests/python_client/deploy/scripts/utils.py", line 241, in load_and_search
[2023-06-12T13:05:57.358Z] assert len(ids) == topK, f"get {len(ids)} results, but topK is {topK}"
[2023-06-12T13:05:57.358Z] AssertionError: get 4 results, but topK is 5
log:
artifacts-kafka-standalone-reinstall-1044-pytest-logs.tar.gz
artifacts-kafka-standalone-reinstall-1044-server-first-deployment-logs.tar.gz
artifacts-kafka-standalone-reinstall-1044-server-second-deployment-logs.tar.gz
/assign @congqixia please take a look. The search or query result is partial.
It was reproduced again with image tag 2.2.0-20230707-511173a0.
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test_kafka_for_release_cron/detail/deploy_test_kafka_for_release_cron/1179/pipeline
log:
artifacts-kafka-cluster-reinstall-1179-pytest-logs.tar.gz artifacts-kafka-cluster-reinstall-1179-server-first-deployment-logs.tar.gz artifacts-kafka-cluster-reinstall-1179-server-second-deployment-logs.tar.gz
Setup
- CollectionName: task_1_IVF_FLAT
- dim = 128, values in [0, 1]
- {"index_type": "IVF_FLAT", "params": {"nlist": 128}, "metric_type": "L2"}
- Average cluster size: 6000 / 128 = 46.875
- Search operation:
  - search_vec: [1; 128], search_param: {nprobe: 10}
  - filter: count > 500 (about 11/12 of entities hit)
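For reference, the setup numbers above can be reproduced with simple arithmetic. This is a sketch; the entity count (6000) and the `count` field range [0, 6000) are assumptions inferred from this thread, not read from the test source:

```python
# Back-of-the-envelope numbers for the setup above. The entity count (6000)
# and the count-field range [0, 6000) are assumptions inferred from this
# thread, not taken from the test source.
num_entities = 6000
nlist = 128

# Average IVF cluster size: vectors are split across nlist clusters.
avg_cluster_size = num_entities / nlist
print(avg_cluster_size)  # 46.875

# Selectivity of the filter `count > 500` when count ranges over [0, 6000):
hit_ratio = (num_entities - 1 - 500) / num_entities
print(round(hit_ratio, 4))  # ~0.9165, i.e. about 11/12 of entities hit
```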
Debug
No segments were lost here:
- After reinstalling Milvus, only one segment is searched.
  - segmentID: 442736548730561398
- Before reinstalling Milvus, 13 segments are searched.
- Compaction happened before the reinstall, and no segment was lost.

The difference between the two search operations: the index is used after reinstalling, but not before:
- Before reinstalling Milvus, only one segment (442736548729940644) uses the index; the others do not.
- After reinstalling Milvus, the index is fully used on the one segment.
Is it possible that, with IVF_FLAT, 10 vectors were recalled from the 10 probed IVF clusters, but 6 of them were filtered out by the expression count > 500?
The search vector is [1, 1, 1, 1, ...], located at a corner of the vector space.
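A toy sketch of this hypothesis: if the IVF search recalls a fixed candidate pool from the probed clusters and the scalar filter is then applied to that pool, fewer than topK results can survive. The first four ids below are from the log above; the rest are invented for illustration:

```python
# Toy model of the suspected failure mode: IVF_FLAT recalls a candidate pool
# from the nprobe probed clusters, then the scalar filter `count > 500` is
# applied. The four ids above 500 match the log; the remaining ids are made up.
top_k = 5
recalled = [930, 2343, 1325, 2867, 120, 77, 433, 15, 208, 390]

# Apply the boolean expression after vector recall.
survivors = [pk for pk in recalled if pk > 500]
results = survivors[:top_k]
print(len(results))  # 4 -- matching "get 4 results, but topK is 5"
```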
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.
@zhuwenxing @chyezh any updates?
image: 2.3.0-20230918-dde27711-amd64
[2023-09-18T13:38:15.243Z] 2023-09-18 13:38:15.140 | INFO | MainThread |utils:load_and_search:259 - ###########
[2023-09-18T13:38:15.243Z] 2023-09-18 13:38:15.143 | INFO | MainThread |utils:load_and_search:206 - collection name: task_2_IVF_FLAT
[2023-09-18T13:38:15.243Z] 2023-09-18 13:38:15.143 | INFO | MainThread |utils:load_and_search:207 - load collection
[2023-09-18T13:38:19.400Z] 2023-09-18 13:38:19.232 | INFO | MainThread |utils:load_and_search:211 - load time: 4.0887
[2023-09-18T13:38:19.400Z] 2023-09-18 13:38:19.243 | INFO | MainThread |utils:load_and_search:225 - {'metric_type': 'L2', 'params': {'nprobe': 10}}
[2023-09-18T13:38:19.400Z] 2023-09-18 13:38:19.243 | INFO | MainThread |utils:load_and_search:228 -
[2023-09-18T13:38:19.400Z] Search...
[2023-09-18T13:38:19.655Z] 2023-09-18 13:38:19.423 | INFO | MainThread |utils:load_and_search:239 - hit: id: 764, distance: 30.432262420654297, entity: {'count': 764, 'random_value': -18.0}
[2023-09-18T13:38:19.655Z] 2023-09-18 13:38:19.423 | INFO | MainThread |utils:load_and_search:239 - hit: id: 2455, distance: 31.647565841674805, entity: {'count': 2455, 'random_value': -17.0}
[2023-09-18T13:38:19.655Z] 2023-09-18 13:38:19.423 | INFO | MainThread |utils:load_and_search:239 - hit: id: 2424, distance: 32.878353118896484, entity: {'count': 2424, 'random_value': -17.0}
[2023-09-18T13:38:19.655Z] 2023-09-18 13:38:19.423 | INFO | MainThread |utils:load_and_search:239 - hit: id: 2737, distance: 33.31123733520508, entity: {'count': 2737, 'random_value': -14.0}
[2023-09-18T13:38:19.655Z] Traceback (most recent call last):
[2023-09-18T13:38:19.655Z] File "scripts/action_after_reinstall.py", line 47, in <module>
[2023-09-18T13:38:19.655Z] task_2(data_size, host)
[2023-09-18T13:38:19.655Z] File "scripts/action_after_reinstall.py", line 33, in task_2
[2023-09-18T13:38:19.655Z] load_and_search(prefix)
[2023-09-18T13:38:19.655Z] File "/home/jenkins/agent/workspace/tests/python_client/deploy/scripts/utils.py", line 241, in load_and_search
[2023-09-18T13:38:19.655Z] assert len(ids) == topK, f"get {len(ids)} results, but topK is {topK}"
[2023-09-18T13:38:19.655Z] AssertionError: get 4 results, but topK is 5
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test_kafka_for_release_cron/detail/deploy_test_kafka_for_release_cron/1446/pipeline
log: artifacts-kafka-standalone-reinstall-1450-pytest-logs.tar.gz artifacts-kafka-standalone-reinstall-1450-server-first-deployment-logs.tar.gz artifacts-kafka-standalone-reinstall-1450-server-second-deployment-logs.tar.gz
Failed again.
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test_kafka_for_release_cron/detail/deploy_test_kafka_for_release_cron/1446/pipeline
log: artifacts-kafka-standalone-reinstall-1446-pytest-logs (1).tar.gz artifacts-kafka-standalone-reinstall-1446-server-first-deployment-logs (1).tar.gz artifacts-kafka-standalone-reinstall-1446-server-second-deployment-logs (1).tar.gz
[2023-09-18T13:38:15.243Z] 2023-09-18 13:38:15.140 | INFO | MainThread |utils:load_and_search:257 - query latency: 0.0047s
I have reproduced the same problem with rocksmq in a no-chaos environment.
In these test cases, 3000 new vectors are always inserted after reinstallation with the same primary keys (field `count`) as existing vectors.
On search, there is one segment. Some vectors with the same primary key in the IVF index were returned from this segment and were deduplicated at reduce time.
This is expected behavior under the current Milvus implementation, not a bug.
Please modify the test case to avoid duplicate primary keys in these tests.
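The deduplication described above can be sketched as follows. This is assumed logic for illustration, not the actual Milvus reduce code, and the hit tuples are invented (one id, 764, appears in the log above):

```python
# Minimal sketch (assumed logic, not Milvus source) of why duplicate primary
# keys shrink the result set: the reduce step keeps only the first hit per
# primary key, so topK raw hits can collapse into fewer final results.
def reduce_hits(hits, top_k):
    seen, out = set(), []
    for pk, dist in sorted(hits, key=lambda h: h[1]):
        if pk in seen:
            continue  # drop a later hit that carries a duplicate primary key
        seen.add(pk)
        out.append((pk, dist))
        if len(out) == top_k:
            break
    return out

# 5 raw hits, but pk 764 appears twice (re-inserted with the same `count`),
# so only 4 results survive reduction -- the symptom seen in the logs.
raw = [(764, 30.4), (2455, 31.6), (764, 31.9), (2424, 32.9), (2737, 33.3)]
print(len(reduce_hits(raw, top_k=5)))  # 4
```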
/assign @zhuwenxing /unassign