milvus icon indicating copy to clipboard operation
milvus copied to clipboard

[Bug]: [Nightly]Query on a collection will return data after delete all entities

Open NicoYuan1986 opened this issue 1 year ago • 6 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version: 76c24db
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka):     kafka
- SDK version(e.g. pymilvus v2.0.0rc2): 2.2.9.dev16

Current Behavior

Query on a collection will return data after delete all entities.

[2023-05-28T21:34:54.530Z] _________________ TestDeleteOperation.test_delete_entity_loop __________________
[2023-05-28T21:34:54.530Z] [gw3] linux -- Python 3.8.16 /usr/local/bin/python3
[2023-05-28T21:34:54.530Z] 
[2023-05-28T21:34:54.530Z] self = <test_delete.TestDeleteOperation object at 0x7fb1162b5970>
[2023-05-28T21:34:54.530Z] 
[2023-05-28T21:34:54.530Z]     @pytest.mark.tags(CaseLabel.L2)
[2023-05-28T21:34:54.530Z]     def test_delete_entity_loop(self):
[2023-05-28T21:34:54.530Z]         """
[2023-05-28T21:34:54.530Z]         target: test delete all entities one by one in a loop
[2023-05-28T21:34:54.530Z]         method: delete data one by one for a loop
[2023-05-28T21:34:54.530Z]         expected: No exception
[2023-05-28T21:34:54.530Z]         """
[2023-05-28T21:34:54.530Z]         # init an auto_id collection and insert tmp_nb data, flush and load
[2023-05-28T21:34:54.530Z]         collection_w, _, _, ids = self.init_collection_general(prefix, nb=tmp_nb, insert_data=True, auto_id=True)[0:4]
[2023-05-28T21:34:54.530Z]     
[2023-05-28T21:34:54.530Z]         for del_id in ids:
[2023-05-28T21:34:54.530Z]             expr = f'{ct.default_int64_field_name} in {[del_id]}'
[2023-05-28T21:34:54.530Z]             res = collection_w.delete(expr)[0]
[2023-05-28T21:34:54.530Z]             assert res.delete_count == 1
[2023-05-28T21:34:54.530Z]     
[2023-05-28T21:34:54.530Z]         # query with all ids
[2023-05-28T21:34:54.530Z]         expr = f'{ct.default_int64_field_name} in {ids}'
[2023-05-28T21:34:54.530Z] >       collection_w.query(expr, check_task=CheckTasks.check_query_empty)
[2023-05-28T21:34:54.530Z] 


[2023-05-28T21:34:54.658Z] [2023-05-28 20:11:43 - DEBUG - ci_test]: (api_request)  : [Collection.query] args: ['int64 in [441792395842082248, 441792395842082249, 441792395842082250, 441792395842082251, 441792395842082252, 441792395842082253, 441792395842082254, 441792395842082255, 441792395842082256, 441792395842082257, 441792395842082258, 441792395842082259, 441792395842082260, 441792395842082261, 44179239......, kwargs: {} (api_request.py:56)
[2023-05-28T21:34:54.658Z] [2023-05-28 20:11:44 - DEBUG - ci_test]: (api_response) : [{'int64': 441792395842085224}, {'int64': 441792395842085225}, {'int64': 441792395842085228}, {'int64': 441792395842085232}, {'int64': 441792395842085234}, {'int64': 441792395842085235}, {'int64': 441792395842085236}, {'int64': 441792395842085243}, {'int64': 441792395842085244}, {'int64': 4417923958......  (api_request.py:31)

Expected Behavior

return empty

Steps To Reproduce

No response

Milvus Log

  1. link: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20Nightly%20CI/detail/2.2.0/204/pipeline/176/
  2. log: artifacts-milvus-distributed-pulsar-nightly-204-pymilvus-e2e-logs.tar.gz
  3. failed time: [2023-05-28T20:11:44.550Z] [gw3] [ 14%] FAILED testcases/test_delete.py::TestDeleteOperation::test_delete_entity_loop
  4. collection name: delete_L7kn8Ncn

Anything else?

No response

NicoYuan1986 avatar May 29 '23 02:05 NicoYuan1986

similar: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20Nightly%20CI/detail/2.2.0/202/pipeline/156/

  1. log: artifacts-milvus-distributed-kafka-nightly-202-pymilvus-e2e-logs.tar.gz
  2. failed time: [2023-05-26T20:10:57.303Z] [gw2] [ 17%] FAILED testcases/test_delete.py::TestDeleteString::test_delete_insert_same_id_sealed_string[True]
  3. collection name: delete_jrHTlgDZ

NicoYuan1986 avatar May 29 '23 02:05 NicoYuan1986

master: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20Nightly%20CI/detail/master/386/pipeline/123/

  1. log: artifacts-milvus-standalone-nightly-386-pymilvus-e2e-logs.tar.gz
  2. collection name: delete_ECTfWwfI
  3. failed time: [2023-05-28T18:01:45.098Z] [gw3] [ 12%] FAILED testcases/test_delete.py::TestDeleteString::test_delete_and_index_with_string

NicoYuan1986 avatar May 29 '23 02:05 NicoYuan1986

/assign @jiaoew1991 /unassign

yanliang567 avatar May 29 '23 03:05 yanliang567

/assign @aoiasd /unassign

jiaoew1991 avatar May 29 '23 11:05 jiaoew1991

seems that this is a very serious issue and we received many feedbacks that search can still find deleted data. @NicoYuan1986 are we using strong consistency?

xiaofan-luan avatar May 29 '23 16:05 xiaofan-luan

3. test_delete_and_index_with_string

Yes. All the test cases use consistency level strong unless otherwise stated. image

NicoYuan1986 avatar Jun 08 '23 08:06 NicoYuan1986

Reproduced.

  1. link: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20Nightly%20CI/detail/2.2.0/228/pipeline/188
  2. log: artifacts-milvus-distributed-pulsar-nightly-228-pymilvus-e2e-logs.tar.gz
  3. collection name: delete_ko6VC64v
  4. failed time: [2023-06-20T20:12:28.031Z] [gw4] [ 12%] FAILED testcases/test_delete.py::TestDeleteString::test_delete_entity_loop_with_string

NicoYuan1986 avatar Jun 21 '23 02:06 NicoYuan1986

Was a known question, will fix by https://github.com/milvus-io/milvus/pull/24858.

aoiasd avatar Jun 21 '23 07:06 aoiasd

relate: https://github.com/milvus-io/milvus/issues/24234

aoiasd avatar Jun 21 '23 07:06 aoiasd

reproduced:

  1. link: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20Nightly%20CI/detail/2.2.0/241/pipeline/124/
  2. log: artifacts-milvus-standalone-nightly-241-pymilvus-e2e-logs.tar.gz
  3. collection name: delete_zvmKP77t
  4. failed time: [2023-07-03T20:08:22.027Z] [gw5] [ 13%] FAILED testcases/test_delete.py::TestDeleteOperation::test_delete_insert_multi

NicoYuan1986 avatar Jul 04 '23 02:07 NicoYuan1986

Update:

  1. link: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20Nightly%20CI/detail/2.2.0/244/pipeline/124
  2. log: artifacts-milvus-standalone-nightly-244-pymilvus-e2e-logs.tar.gz
  3. collection name: delete_s5kjFGfy
  4. failed time: [2023-07-06T20:11:51.002Z] [gw3] [ 13%] FAILED testcases/test_delete.py::TestDeleteString::test_delete_insert_multi_with_string

NicoYuan1986 avatar Jul 07 '23 02:07 NicoYuan1986

Reproduced in CI:

  1. link: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20HA%20CI/detail/PR-25410/1/pipeline
  2. log: artifacts-milvus-standalone-PR-25410-1-pymilvus-e2e-logs.tar.gz
  3. collection name: delete_TnewiT7f
  4. Failed timeline: [2023-07-07T08:23:18.169Z] [gw3] [ 12%] FAILED testcases/test_delete.py::TestDeleteOperation::test_delete_insert_multi

binbinlv avatar Jul 07 '23 09:07 binbinlv

Reproduced in Nightly:

  1. link: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20Nightly%20CI/detail/2.2.0/247/pipeline/188
  2. log: artifacts-milvus-distributed-pulsar-nightly-247-pymilvus-e2e-logs.tar.gz
  3. collection name: delete_cHiO3kVi
  4. failed time: [2023-07-09T20:15:15.211Z] [gw3] [ 13%] FAILED testcases/test_delete.py::TestDeleteOperation::test_delete_entity_loop

NicoYuan1986 avatar Jul 10 '23 09:07 NicoYuan1986

/assign @SimFG

xiaofan-luan avatar Jul 10 '23 13:07 xiaofan-luan

@NicoYuan1986 The milvus log level should be set DEBUG

SimFG avatar Jul 11 '23 07:07 SimFG

fixed in https://github.com/milvus-io/milvus/pull/25513

sunby avatar Jul 13 '23 03:07 sunby

reproduced: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20Nightly%20CI/detail/2.2.0/254/pipeline/188

  1. log: artifacts-milvus-distributed-pulsar-nightly-254-pymilvus-e2e-logs.tar.gz
  2. collection name: delete_uWI0TBrz
  3. failed time: [2023-07-16T20:12:32.070Z] [gw3] [ 18%] FAILED testcases/test_delete.py::TestDeleteString::test_delete_entity_loop_with_string

NicoYuan1986 avatar Jul 17 '23 02:07 NicoYuan1986

https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20HA%20CI/detail/PR-25646/1/pipeline/202 https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20HA%20CI/detail/PR-25646/3/pipeline/167

NicoYuan1986 avatar Jul 17 '23 08:07 NicoYuan1986

reproduce : https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20Nightly%20CI/detail/2.2.0/255/pipeline/188

NicoYuan1986 avatar Jul 18 '23 02:07 NicoYuan1986

mark as high priority. This may block the release of 2.2.12

xiaofan-luan avatar Jul 19 '23 02:07 xiaofan-luan

@NicoYuan1986 could you add a logic to retry 10 times, each times wait for 100ms, and check if this is issue last forever or only one time?

xiaofan-luan avatar Jul 19 '23 02:07 xiaofan-luan

The error cannot reproduce with the logic locally. I added the case in CI. #25742

NicoYuan1986 avatar Jul 19 '23 06:07 NicoYuan1986

@NicoYuan1986 could you add a logic to retry 10 times, each times wait for 100ms, and check if this is issue last forever or only one time?

According to https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20HA%20CI/detail/PR-25742/1/pipeline/167, retry 10 times, the issue arises only one time.

time:

[2023-07-19T07:09:02.302Z] [2023-07-19 07:07:30 - DEBUG - ci_test]: (api_request)  : [Collection.query] args: ['int64 in [442957853345475231, 442957853345475232, 442957853345475233, 442957853345475234, 442957853345475235, 442957853345475236, 442957853345475237, 442957853345475238, 442957853345475239, 442957853345475240, 442957853345475241, 442957853345475242, 442957853345475243, 442957853345475244, 44295785......, kwargs: {} (api_request.py:56)
[2023-07-19T07:09:02.302Z] [2023-07-19 07:07:31 - DEBUG - ci_test]: (api_response) : [{'int64': 442957853345475871}, {'int64': 442957853345475875}, {'int64': 442957853345475876}, {'int64': 442957853345475878}, {'int64': 442957853345475880}, {'int64': 442957853345475881}, {'int64': 442957853345475885}, {'int64': 442957853345475886}, {'int64': 442957853345475888}, {'int64': 4429578533......  (api_request.py:31)

NicoYuan1986 avatar Jul 19 '23 07:07 NicoYuan1986

Found 2 problems here.

  1. delete msgs' ts are not sorted and this will lead to query error. Fixed in this pr: #25757
  2. sealed segment are retrieved before all delete msgs are consumed. Still have no idea.

sunby avatar Jul 19 '23 09:07 sunby

related pr:

SimFG avatar Jul 19 '23 09:07 SimFG

Found a potential problem, delete msg has a lot of duplication.

k represents a segment, len represents the executed delete msg, and set_len represents the deduplicated aead57db-ec6e-4cdf-ad76-11dea8461440 a3aceb87-84c0-43bb-b32e-45d6704a68e2

SimFG avatar Jul 20 '23 12:07 SimFG

keep reproducing: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20Nightly%20CI/detail/2.2.0/259/pipeline/188

NicoYuan1986 avatar Jul 21 '23 03:07 NicoYuan1986

after the discussion, we believe #25849 will fix the issue

xiaofan-luan avatar Jul 22 '23 10:07 xiaofan-luan

@NicoYuan1986 does this reproduce recently?

yanliang567 avatar Jul 28 '23 01:07 yanliang567

@NicoYuan1986 does this reproduce recently?

The error has not reproduced since #25849 was merged.

NicoYuan1986 avatar Jul 28 '23 03:07 NicoYuan1986