[Bug]: Scalar queries burn CPU in milvus::query::ExecExprVisitor::ExecRangeVisitorImpl()
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version: 2.2.3
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): Pulsar
- SDK version(e.g. pymilvus v2.0.0rc2): 2.2.0
- OS(Ubuntu or CentOS): FCOS
- CPU/Memory: 48 Xeon cores, 768G RAM on host
- GPU: n/a
- Others:
Current Behavior
We're running many simple scalar queries with expr `field_name in [{field_values}]` and find our querynodes hitting their CPU limit in milvus::query::ExecExprVisitor::ExecRangeVisitorImpl() (established using perf).
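A minimal pymilvus sketch of this query shape (collection name, field name, and values are placeholder assumptions, not taken from the actual workload):

```python
from pymilvus import connections, Collection

# Placeholder connection and collection names for illustration only.
connections.connect(host="localhost", port="19530")
collection = Collection("example_collection")

# A short list of scalar values; the `in` expression below is the pattern
# that ends up in ExecExprVisitor::ExecRangeVisitorImpl() on the querynodes.
field_values = [17, 42, 99]
expr = f"field_name in {field_values}"
results = collection.query(expr=expr, output_fields=["field_name"])
```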
The collection has > 100M rows and growing. Query performance gets worse as the collection grows. We've added resources to the querynodes repeatedly but are approaching the end of what we can do.
It looks like there is an O(n) cost factor involved, where n is the number of rows in the collection, not the number of rows that satisfy the condition!
Expected Behavior
The cost of a scalar query should not depend on the total number of rows in the collection.
Steps To Reproduce
1. Create a collection with many rows
2. Run a query with an `in` condition while profiling with a tool such as perf.
3. Insert many more rows and repeat, observing the degrading performance (a minimal reproduction sketch follows this list).
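A reproduction sketch along those lines (schema, dimensions, and batch sizes are placeholder assumptions; an index is created on the vector field so the collection can be loaded):

```python
import random
from pymilvus import (
    Collection, CollectionSchema, DataType, FieldSchema, connections,
)

connections.connect(host="localhost", port="19530")

# Placeholder schema: an int64 primary key, the scalar field being filtered,
# and a small vector field (Milvus collections need at least one vector field).
schema = CollectionSchema([
    FieldSchema("pk", DataType.INT64, is_primary=True),
    FieldSchema("field_name", DataType.INT64),
    FieldSchema("vec", DataType.FLOAT_VECTOR, dim=8),
])
collection = Collection("scalar_query_repro", schema)
collection.create_index("vec", {"index_type": "FLAT", "metric_type": "L2", "params": {}})
collection.load()

# Insert in batches and re-run the same small `in` query after each batch;
# with perf attached to the querynode, the time spent in
# ExecExprVisitor::ExecRangeVisitorImpl() grows with the total row count.
batch = 100_000
for i in range(10):
    pks = list(range(i * batch, (i + 1) * batch))
    scalars = [random.randint(0, 1_000_000) for _ in pks]
    vectors = [[random.random() for _ in range(8)] for _ in pks]
    collection.insert([pks, scalars, vectors])
    collection.flush()
    collection.query(expr="field_name in [1, 2, 3]", output_fields=["pk"])
```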
Milvus Log
No response
Anything else?
No response
Hi @akevdmeer, thank you for the discovery; it is very important for improving performance. Can you upload the perf sampling data?
I think this might be because we build a TermNode for each value in the `in` condition, which means that for `in [1,2,3,4]` the data is filtered 4 times.
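To illustrate the concern (a simplified Python sketch, not the actual C++ visitor code): one TermNode per value means one pass over the column per value, so an `in` list of k values over n rows costs O(n·k), versus O(n) for a single pass with a set lookup:

```python
def filter_per_value(column, values):
    # One full scan of the column per value in the `in` list: O(n * k).
    mask = [False] * len(column)
    for v in values:
        for i, x in enumerate(column):
            if x == v:
                mask[i] = True
    return mask

def filter_single_pass(column, values):
    # One scan with a hash-set membership test: O(n + k).
    value_set = set(values)
    return [x in value_set for x in column]
```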
/assign @xiaofan-luan I think this is what we are trying to improve in query.
/unassign
With Velox support, this should be fixed in 2.4.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.
/assign @zhagnlu please take a look into this.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.
keep it and assign to @zhagnlu
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.
This is quite a nasty issue that makes the query statement unusable against big collections. Could you please reopen it?
@izapolsk: You can't reopen an issue/PR unless you authored it or you are a collaborator.
In response to this:
/reopen
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/reopen
@xiaofan-luan: Reopened this issue.
In response to this:
/reopen
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
any updates? @zhagnlu
We will use dynamic SIMD to improve performance for situations like `field_name in [{field_values}]`; the performance improvement is shown below:
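For intuition only (a NumPy sketch, not the actual C++/SIMD implementation): the gain comes from evaluating the membership test over a whole batch of rows at once rather than row by row:

```python
import numpy as np

# Placeholder data: a column of int64 scalars and a short `in` list.
column = np.random.randint(0, 1_000_000, size=1_000_000, dtype=np.int64)
values = np.array([1, 2, 3, 4], dtype=np.int64)

# Row-at-a-time evaluation, analogous to a naive per-row visitor.
value_set = set(values.tolist())
mask_scalar = np.array([x in value_set for x in column])

# Batch evaluation over the whole column, the kind of work SIMD accelerates.
mask_vectorized = np.isin(column, values)

assert (mask_scalar == mask_vectorized).all()
```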
@zhagnlu could you please list which PR has this improvement?
It isn't completely finished yet; this test is in a unit test (UT). I will open a pull request tomorrow or next week.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.
@zhagnlu any updates?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.