[Bug]: [Nightly] Query on a collection returns data after all entities are deleted
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version: 76c24db
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): kafka
- SDK version(e.g. pymilvus v2.0.0rc2): 2.2.9.dev16
Current Behavior
Querying a collection still returns data after all of its entities have been deleted.
[2023-05-28T21:34:54.530Z] _________________ TestDeleteOperation.test_delete_entity_loop __________________
[2023-05-28T21:34:54.530Z] [gw3] linux -- Python 3.8.16 /usr/local/bin/python3
[2023-05-28T21:34:54.530Z]
[2023-05-28T21:34:54.530Z] self = <test_delete.TestDeleteOperation object at 0x7fb1162b5970>
[2023-05-28T21:34:54.530Z]
[2023-05-28T21:34:54.530Z] @pytest.mark.tags(CaseLabel.L2)
[2023-05-28T21:34:54.530Z] def test_delete_entity_loop(self):
[2023-05-28T21:34:54.530Z] """
[2023-05-28T21:34:54.530Z] target: test delete all entities one by one in a loop
[2023-05-28T21:34:54.530Z] method: delete data one by one for a loop
[2023-05-28T21:34:54.530Z] expected: No exception
[2023-05-28T21:34:54.530Z] """
[2023-05-28T21:34:54.530Z] # init an auto_id collection and insert tmp_nb data, flush and load
[2023-05-28T21:34:54.530Z] collection_w, _, _, ids = self.init_collection_general(prefix, nb=tmp_nb, insert_data=True, auto_id=True)[0:4]
[2023-05-28T21:34:54.530Z]
[2023-05-28T21:34:54.530Z] for del_id in ids:
[2023-05-28T21:34:54.530Z] expr = f'{ct.default_int64_field_name} in {[del_id]}'
[2023-05-28T21:34:54.530Z] res = collection_w.delete(expr)[0]
[2023-05-28T21:34:54.530Z] assert res.delete_count == 1
[2023-05-28T21:34:54.530Z]
[2023-05-28T21:34:54.530Z] # query with all ids
[2023-05-28T21:34:54.530Z] expr = f'{ct.default_int64_field_name} in {ids}'
[2023-05-28T21:34:54.530Z] > collection_w.query(expr, check_task=CheckTasks.check_query_empty)
[2023-05-28T21:34:54.530Z]
[2023-05-28T21:34:54.658Z] [2023-05-28 20:11:43 - DEBUG - ci_test]: (api_request) : [Collection.query] args: ['int64 in [441792395842082248, 441792395842082249, 441792395842082250, 441792395842082251, 441792395842082252, 441792395842082253, 441792395842082254, 441792395842082255, 441792395842082256, 441792395842082257, 441792395842082258, 441792395842082259, 441792395842082260, 441792395842082261, 44179239......, kwargs: {} (api_request.py:56)
[2023-05-28T21:34:54.658Z] [2023-05-28 20:11:44 - DEBUG - ci_test]: (api_response) : [{'int64': 441792395842085224}, {'int64': 441792395842085225}, {'int64': 441792395842085228}, {'int64': 441792395842085232}, {'int64': 441792395842085234}, {'int64': 441792395842085235}, {'int64': 441792395842085236}, {'int64': 441792395842085243}, {'int64': 441792395842085244}, {'int64': 4417923958...... (api_request.py:31)
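For reference, a minimal standalone sketch of what the failing test does, assuming pymilvus 2.2.x and a Milvus instance reachable at localhost:19530; the collection name, dimension, batch size, and index parameters below are illustrative rather than taken from the CI run:

import random
from pymilvus import (
    Collection, CollectionSchema, DataType, FieldSchema, connections,
)

connections.connect(host="localhost", port="19530")

# Auto-id collection with an int64 primary key and a small float-vector field.
fields = [
    FieldSchema(name="int64", dtype=DataType.INT64, is_primary=True),
    FieldSchema(name="float_vector", dtype=DataType.FLOAT_VECTOR, dim=8),
]
collection = Collection("delete_loop_repro", CollectionSchema(fields, auto_id=True))

# Insert a small batch, then flush, index, and load.
vectors = [[random.random() for _ in range(8)] for _ in range(200)]
ids = collection.insert([vectors]).primary_keys
collection.flush()
collection.create_index(
    "float_vector",
    {"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 128}},
)
collection.load()

# Delete every entity one by one, then query for all of the deleted ids.
for del_id in ids:
    assert collection.delete(f"int64 in [{del_id}]").delete_count == 1

res = collection.query(f"int64 in {list(ids)}", consistency_level="Strong")
assert res == [], f"expected an empty result, got {len(res)} entities back"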
Expected Behavior
The query should return an empty result.
Steps To Reproduce
No response
Milvus Log
- link: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20Nightly%20CI/detail/2.2.0/204/pipeline/176/
- log: artifacts-milvus-distributed-pulsar-nightly-204-pymilvus-e2e-logs.tar.gz
- failed time: [2023-05-28T20:11:44.550Z] [gw3] [ 14%] FAILED testcases/test_delete.py::TestDeleteOperation::test_delete_entity_loop
- collection name: delete_L7kn8Ncn
Anything else?
No response
Similar: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20Nightly%20CI/detail/2.2.0/202/pipeline/156/
- log: artifacts-milvus-distributed-kafka-nightly-202-pymilvus-e2e-logs.tar.gz
- failed time: [2023-05-26T20:10:57.303Z] [gw2] [ 17%] FAILED testcases/test_delete.py::TestDeleteString::test_delete_insert_same_id_sealed_string[True]
- collection name: delete_jrHTlgDZ
master: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20Nightly%20CI/detail/master/386/pipeline/123/
- log: artifacts-milvus-standalone-nightly-386-pymilvus-e2e-logs.tar.gz
- collection name: delete_ECTfWwfI
- failed time: [2023-05-28T18:01:45.098Z] [gw3] [ 12%] FAILED testcases/test_delete.py::TestDeleteString::test_delete_and_index_with_string
/assign @jiaoew1991 /unassign
/assign @aoiasd /unassign
This seems to be a very serious issue, and we have received many reports that search can still find deleted data. @NicoYuan1986 are we using strong consistency?
3. test_delete_and_index_with_string
Yes. All the test cases use the Strong consistency level unless otherwise stated.
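For context, a minimal sketch of how a strong-consistency query is issued with pymilvus; it assumes an open connection and an existing, loaded collection, and the collection name is illustrative:

from pymilvus import Collection

# Assumes connections.connect(...) was already called and the collection is loaded.
collection = Collection("delete_loop_repro")

# With the Strong consistency level the query waits until every operation
# issued before it (including deletes) is visible, so a weaker consistency
# level should not explain the stale reads in this issue.
rows = collection.query("int64 >= 0", consistency_level="Strong")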
Reproduced.
- link: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20Nightly%20CI/detail/2.2.0/228/pipeline/188
- log: artifacts-milvus-distributed-pulsar-nightly-228-pymilvus-e2e-logs.tar.gz
- collection name: delete_ko6VC64v
- failed time: [2023-06-20T20:12:28.031Z] [gw4] [ 12%] FAILED testcases/test_delete.py::TestDeleteString::test_delete_entity_loop_with_string
This was a known issue; it will be fixed by https://github.com/milvus-io/milvus/pull/24858.
Related: https://github.com/milvus-io/milvus/issues/24234
Reproduced:
- link: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20Nightly%20CI/detail/2.2.0/241/pipeline/124/
- log: artifacts-milvus-standalone-nightly-241-pymilvus-e2e-logs.tar.gz
- collection name: delete_zvmKP77t
- failed time: [2023-07-03T20:08:22.027Z] [gw5] [ 13%] FAILED testcases/test_delete.py::TestDeleteOperation::test_delete_insert_multi
Update:
- link: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20Nightly%20CI/detail/2.2.0/244/pipeline/124
- log: artifacts-milvus-standalone-nightly-244-pymilvus-e2e-logs.tar.gz
- collection name: delete_s5kjFGfy
- failed time: [2023-07-06T20:11:51.002Z] [gw3] [ 13%] FAILED testcases/test_delete.py::TestDeleteString::test_delete_insert_multi_with_string
Reproduced in CI:
- link: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20HA%20CI/detail/PR-25410/1/pipeline
- log: artifacts-milvus-standalone-PR-25410-1-pymilvus-e2e-logs.tar.gz
- collection name: delete_TnewiT7f
- failed time: [2023-07-07T08:23:18.169Z] [gw3] [ 12%] FAILED testcases/test_delete.py::TestDeleteOperation::test_delete_insert_multi
Reproduced in Nightly:
- link: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20Nightly%20CI/detail/2.2.0/247/pipeline/188
- log: artifacts-milvus-distributed-pulsar-nightly-247-pymilvus-e2e-logs.tar.gz
- collection name: delete_cHiO3kVi
- failed time: [2023-07-09T20:15:15.211Z] [gw3] [ 13%] FAILED testcases/test_delete.py::TestDeleteOperation::test_delete_entity_loop
/assign @SimFG
@NicoYuan1986 The Milvus log level should be set to DEBUG.
Fixed in https://github.com/milvus-io/milvus/pull/25513
Reproduced: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20Nightly%20CI/detail/2.2.0/254/pipeline/188
- log: artifacts-milvus-distributed-pulsar-nightly-254-pymilvus-e2e-logs.tar.gz
- collection name: delete_uWI0TBrz
- failed time: [2023-07-16T20:12:32.070Z] [gw3] [ 18%] FAILED testcases/test_delete.py::TestDeleteString::test_delete_entity_loop_with_string
https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20HA%20CI/detail/PR-25646/1/pipeline/202 https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20HA%20CI/detail/PR-25646/3/pipeline/167
Reproduced: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20Nightly%20CI/detail/2.2.0/255/pipeline/188
Marking as high priority. This may block the release of 2.2.12.
@NicoYuan1986 could you add logic to retry 10 times, waiting 100 ms each time, and check whether the issue persists or occurs only once?
The error could not be reproduced locally with that logic. I added the case to CI: #25742
@NicoYuan1986 could you add logic to retry 10 times, waiting 100 ms each time, and check whether the issue persists or occurs only once?
According to https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20HA%20CI/detail/PR-25742/1/pipeline/167, with 10 retries the issue arises only once (a sketch of the retry logic is below).
Time of the occurrence:
[2023-07-19T07:09:02.302Z] [2023-07-19 07:07:30 - DEBUG - ci_test]: (api_request) : [Collection.query] args: ['int64 in [442957853345475231, 442957853345475232, 442957853345475233, 442957853345475234, 442957853345475235, 442957853345475236, 442957853345475237, 442957853345475238, 442957853345475239, 442957853345475240, 442957853345475241, 442957853345475242, 442957853345475243, 442957853345475244, 44295785......, kwargs: {} (api_request.py:56)
[2023-07-19T07:09:02.302Z] [2023-07-19 07:07:31 - DEBUG - ci_test]: (api_response) : [{'int64': 442957853345475871}, {'int64': 442957853345475875}, {'int64': 442957853345475876}, {'int64': 442957853345475878}, {'int64': 442957853345475880}, {'int64': 442957853345475881}, {'int64': 442957853345475885}, {'int64': 442957853345475886}, {'int64': 442957853345475888}, {'int64': 4429578533...... (api_request.py:31)
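A minimal sketch of the retry-and-check logic discussed above; the helper name and parameters are illustrative:

import time

def query_until_empty(collection, expr, retries=10, interval=0.1):
    """Retry the query up to `retries` times, sleeping `interval` seconds
    between attempts; return the attempt number on which the result became
    empty, or None if deleted entities are still returned after all retries."""
    for attempt in range(1, retries + 1):
        if not collection.query(expr, consistency_level="Strong"):
            return attempt
        time.sleep(interval)
    return None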
Found 2 problems here.
- The delete messages' timestamps are not sorted, which leads to query errors (illustrated by the conceptual sketch below). Fixed in this PR: #25757.
- Sealed segments are retrieved before all delete messages are consumed. Still no idea why.
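A conceptual illustration (plain Python, not Milvus code) of why unsorted delete timestamps can make a query miss deletes when the delete records are binary-searched under the assumption that they are ordered by timestamp; all values are made up:

from bisect import bisect_right

# Delete records as parallel lists, in the order they were recorded.
delete_ts = [3, 5, 9, 7]           # NOT sorted: the ts=7 delete arrives last
delete_pks = [101, 102, 103, 104]  # primary keys deleted at those timestamps

query_ts = 8  # every delete with ts <= 8 should be applied before answering

cut = bisect_right(delete_ts, query_ts)  # binary search assumes sorted timestamps
applied = set(delete_pks[:cut])

expected = {pk for ts, pk in zip(delete_ts, delete_pks) if ts <= query_ts}
print(applied)   # {101, 102}      -> pk 104 (deleted at ts=7) is missed
print(expected)  # {101, 102, 104} -> the missed row would still appear in query results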
Related PRs:
- Make query set guarantee ts based on default consistency level (#25579): confirmed not to be a cause of the current issue.
- Fix delete messages unsorted: still being confirmed.
Found a potential problem: the delete messages contain a lot of duplicates. Here k represents a segment, len the number of executed delete messages, and set_len the deduplicated count (see the sketch below).
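A small illustration of that duplication check; the data is made up and the printed fields mirror k, len, and set_len:

from collections import defaultdict

# Hypothetical (segment_id, deleted_pk) pairs in the order the deletes were executed.
executed = [(7, 101), (7, 102), (7, 101), (7, 101), (8, 201)]

per_segment = defaultdict(list)
for segment_id, pk in executed:
    per_segment[segment_id].append(pk)

for k, pks in per_segment.items():
    print(f"k={k} len={len(pks)} set_len={len(set(pks))}")
# k=7 len=4 set_len=2  -> the delete for pk 101 was executed three times
# k=8 len=1 set_len=1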
Still reproducing: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20Nightly%20CI/detail/2.2.0/259/pipeline/188
After the discussion, we believe #25849 will fix the issue.
@NicoYuan1986 has this reproduced recently?
The error has not reproduced since #25849 was merged.