milvus icon indicating copy to clipboard operation
milvus copied to clipboard

[Bug]: Scalar queries burn CPU in milvus::query::ExecExprVisitor::ExecRangeVisitorImpl()

Open akevdmeer opened this issue 1 year ago • 7 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version: 2.2.3
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): Pulsar
- SDK version(e.g. pymilvus v2.0.0rc2): 2.2.0
- OS(Ubuntu or CentOS): FCOS
- CPU/Memory: 48 Xeon cores, 768G RAM on host
- GPU: n/a
- Others:

Current Behavior

We're doing many simple scalar queries with expr: field_name in [{field_values}] and find our querynodes hitting their CPU limit in milvus::query::ExecExprVisitor::ExecRangeVisitorImpl() (established using perf)

The collection has > 100M rows and growing. Query performance gets worse as the collection grows. We've added resources to the querynodes repeatedly but are approaching the end of what we can do.

It looks like there is an O(n) cost factor involved, where n is the number of rows in the collection, not the number of rows that satisfy the condition!

Expected Behavior

The cost of a scalar query should not depend on the overall rows in the collection.

Steps To Reproduce

1. Create a collection with many rows
2. Run query with `in` condition and run profiler such as perf.
3. Insert many more rows and repeat, observe degrading performance.

Milvus Log

No response

Anything else?

No response

akevdmeer avatar Mar 13 '23 08:03 akevdmeer

Hi @akevdmeer, thank you for the discovery, This discovery is very important for us to improve performance. Can you upload the perf sampling data?

jiaoew1991 avatar Mar 13 '23 09:03 jiaoew1991

I thought that might be due to we make a termnode for each in condition. Which means if in[1,2,3,4] you will filter 4 times

xiaofan-luan avatar Mar 13 '23 11:03 xiaofan-luan

/assign @xiaofan-luan i think this is what we are trying to improve in query.

/unassign

yanliang567 avatar Mar 13 '23 11:03 yanliang567

/assign @xiaofan-luan i think this is what we are trying to improve in query.

/unassign

With velox support that should be fixed in 2.4

xiaofan-luan avatar Mar 13 '23 11:03 xiaofan-luan

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] avatar Apr 12 '23 12:04 stale[bot]

/assign @zhagnlu pls take a look into it

xiaofan-luan avatar Apr 12 '23 20:04 xiaofan-luan

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] avatar May 12 '23 23:05 stale[bot]

keep it and assign to @zhagnlu

xiaofan-luan avatar May 21 '23 02:05 xiaofan-luan

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] avatar Jun 20 '23 05:06 stale[bot]

This is quite nasty issue that makes query statement unusable against big collection. Could you please reopen it ?

izapolsk avatar Jun 27 '23 09:06 izapolsk

@izapolsk: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

sre-ci-robot avatar Jun 27 '23 09:06 sre-ci-robot

/reopen

xiaofan-luan avatar Jun 27 '23 09:06 xiaofan-luan

@xiaofan-luan: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

sre-ci-robot avatar Jun 27 '23 09:06 sre-ci-robot

any updates? @zhagnlu

xiaofan-luan avatar Jun 27 '23 09:06 xiaofan-luan

will using dynamic simd to improve performance for this situation like field_name in [{field_values}], performance improve show as below: image

zhagnlu avatar Jun 29 '23 08:06 zhagnlu

will using dynamic simd to improve performance for this situation like field_name in [{field_values}], performance improve show as below: image

@zhagnlu could you please list which pr has this improvement?

yanliang567 avatar Jun 29 '23 09:06 yanliang567

Haven't complete totally, this test is in UT. I will pull request tomorrow or next week.

zhagnlu avatar Jun 29 '23 09:06 zhagnlu

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] avatar Jul 29 '23 16:07 stale[bot]

@zhagnlu any updates

yanliang567 avatar Jul 31 '23 01:07 yanliang567

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] avatar Sep 07 '23 00:09 stale[bot]