milvus
[Bug]: CPU usage fluctuates from 50% to 500% when the collections are loaded and no action is executed
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version: 2.2.6 and 2.2.3 (both are tried)
- Deployment mode(standalone or cluster): both standalone and cluster
- MQ type(rocksmq, pulsar or kafka):
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
I created several collections as follows:
- create collection
- create index
- insert all data
- flush
- compact
Note that some collections have 10,00+ entities while others have 0 entities (they are still indexed). I then loaded all collections and performed no further operations such as insert or search. CPU usage fluctuated periodically between 50% and 500%. Inspecting the Milvus components, I found that dataCoord and indexCoord were the ones using CPU periodically.
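The steps listed above could be scripted roughly as follows. This is only a sketch for reproduction purposes: the field names, the vector dimension, the `IVF_FLAT` index parameters, and the host/port are assumptions and are not taken from the report, which does not state the schema or index type used.

```python
import random


def make_vectors(n, dim, seed=0):
    """Generate n random float vectors of length dim (placeholder data)."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(dim)] for _ in range(n)]


def build_collection(name, n, dim=8, host="localhost", port="19530"):
    """Create, index, fill, flush, compact, and load one collection.

    Schema, index type, and connection settings are assumptions.
    """
    # Imported here so make_vectors() stays usable without pymilvus installed.
    from pymilvus import (Collection, CollectionSchema, DataType,
                          FieldSchema, connections)

    connections.connect(alias="default", host=host, port=port)
    schema = CollectionSchema([
        FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=False),
        FieldSchema(name="vec", dtype=DataType.FLOAT_VECTOR, dim=dim),
    ])
    coll = Collection(name, schema)                      # create collection
    coll.create_index("vec", {"index_type": "IVF_FLAT",  # create index
                              "metric_type": "L2",
                              "params": {"nlist": 128}})
    if n:                                                # insert all data
        coll.insert([list(range(n)), make_vectors(n, dim)])
    coll.flush()                                         # flush
    coll.compact()                                       # compact
    coll.wait_for_compaction_completed()
    coll.load()                                          # load, then stay idle
    return coll
```

Calling `build_collection("empty_coll", 0)` and `build_collection("big_coll", 40000)` against a running instance, then watching CPU usage without issuing any requests, should approximate the scenario described here.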
Expected Behavior
CPU usage stays at a low level, around 50% on average.
Steps To Reproduce
No response
Milvus Log
indexCoord has logs:
[2023/04/26 07:53:56.921 +00:00] [INFO] [indexcoord/index_coord.go:845] ["IndexCoord DescribeIndex"] [collectionID=441032337568525823] [indexName=]
[2023/04/26 07:53:56.929 +00:00] [INFO] [indexcoord/index_coord.go:612] ["IndexCoord completeIndexInfo"] [collID=441032337568525823] [indexName=_default_idx_101]
[2023/04/26 07:53:56.930 +00:00] [DEBUG] [indexcoord/meta_table.go:659] ["IndexCoord get index states success"] [indexID=441032337568728903] [total=296] [None=0] [Unissued=0] [InProgress=0] [Finished=296] [Failed=0]
[2023/04/26 07:53:56.930 +00:00] [DEBUG] [indexcoord/meta_table.go:736] ["IndexCoord get index states success"] [indexID=441032337568728903] [indexRows=52791]
[2023/04/26 07:53:56.930 +00:00] [INFO] [indexcoord/index_coord.go:635] ["IndexCoord completeIndexInfo success"] [collID=441032337568525823] [totalRows=52791] [indexRows=52791] [state=Finished] [failReason=]
[2023/04/26 07:53:56.930 +00:00] [INFO] [indexcoord/index_coord.go:933] ["IndexCoord describe index success"] [collectionID=441032337568525823] [indexID=441032337568728903] ["total rows"=52791] ["index rows"=52791] ["index state"=Finished]
[segments="[441032337568807735,441032337568791484,441032337568779307,441032337568798936,441032337568800036,441032337568797318,441032337568792170,441032337568793253,441032337568810350,441032337568804256,441032337568783286,441032337568785318,441032337568781771,441032337568808990,441032337568788695,441032337568804730,441032337568812383,441032337568812384,441032337568784728,441032337568799523,441032337568792079,441032337568797577,441032337568781097,441032337568795206,441032337568800959,441032337568794870,441032337568780575,441032337568794594,441032337568805439,441032337568802308,441032337568788787,441032337568783773,441032337568812475,441032337568790565,441032337568803665,441032337568796138,441032337568795542,441032337568782475,441032337568808145,441032337568786435,441032337568804729,441032337568787169,441032337568792754,441032337568800872,441032337568781338,441032337568798844,441032337568779993,441032337568781931,441032337568783774,441032337568799524,441032337568793880,441032337568799609,441032337568797319,441032337568799323,441032337568800873,441032337568810442,441032337568805020,441032337568788110,441032337568782022,441032337568790729,441032337568796818,441032337568784050,441032337568796910,441032337568806959,441032337568790813,441032337568801641,441032337568809668,441032337568796909,441032337568807447,441032337568805608,441032337568807641,441032337568807446,441032337568787170,441032337568786084,441032337568804935,441032337568789150,441032337568797495,441032337568794199,441032337568798843,441032337568796634,441032337568805701,441032337568780455,441032337568790048,441032337568806285,441032337568790049,441032337568811119,441032337568782699,441032337568795461,441032337568808410,441032337568801346,441032337568779902,441032337568781770,441032337568785991,441032337568802309,441032337568788019,441032337568807734,441032337568780456,441032337568802995,441032337568784729,441032337568796225,441032337568809075,441032337568811707,441032337568810840,441032337568797994,441032337568
798166,441032337568786758,441032337568808781,441032337568785317,441032337568812476,441032337568785143,441032337568782607,441032337568791222,441032337568794108,441032337568797995,441032337568796137,441032337568785409,441032337568801549,441032337568788020,441032337568788109,441032337568784463,441032337568781255,441032337568782700,441032337568806960,441032337568791887,441032337568800287,441032337568799324,441032337568789374,441032337568798249,441032337568810128,441032337568792846,441032337568787344,441032337568797576,441032337568803582,441032337568779306,441032337568793879,441032337568812197,441032337568783111,441032337568804255,441032337568780659,441032337568801997,441032337568785142,441032337568786083,441032337568800195,441032337568792755,441032337568803380,441032337568794593,441032337568802225,441032337568787844,441032337568791221,441032337568808147,441032337568793519,441032337568796635,441032337568807049,441032337568811799,441032337568789864,441032337568801550,441032337568789865,441032337568811800,441032337568801345,441032337568783373,441032337568793434,441032337568795543,441032337568809074,441032337568795930,441032337568794788,441032337568784051,441032337568801642,441032337568808780,441032337568804936,441032337568806784,441032337568804348,441032337568785992,441032337568787435,441032337568812196,441032337568802994,441032337568806284,441032337568811490,441032337568800649,441032337568782608,441032337568791401,441032337568802903,441032337568784636,441032337568787845,441032337568803583,441032337568788554,441032337568790730,441032337568804347,441032337568794107,441032337568794200,441032337568779756,441032337568782023,441032337568802226,441032337568783960,441032337568800648,441032337568811030,441032337568798167,441032337568806073,441032337568779994,441032337568806785,441032337568806369,441032337568793433,441032337568798631,441032337568782474,441032337568792563,441032337568783285,441032337568792171,441032337568811706,441032337568791888,441032337568811031,44103233756878643
6,441032337568780576,441032337568808991,441032337568779901,441032337568803664,441032337568787345,441032337568790812,441032337568797494,441032337568783112,441032337568784462,441032337568796817,441032337568810129,441032337568811120,441032337568786667,441032337568800037,441032337568798632,441032337568798935,441032337568802902,441032337568809482,441032337568779755,441032337568786666,441032337568793520,441032337568798248,441032337568810443,441032337568792847,441032337568802667,441032337568806368,441032337568788696,441032337568803379,441032337568804034,441032337568807642,441032337568794872,441032337568801998,441032337568781930,441032337568802668,441032337568805438,441032337568789151,441032337568800194,441032337568805700,441032337568795931,441032337568792562,441032337568809758,441032337568804033,441032337568805019,441032337568789456,441032337568780658,441032337568790566,441032337568810349,441032337568787436,441032337568793252,441032337568788555,441032337568784637,441032337568792078,441032337568785837,441032337568791402,441032337568795460,441032337568808411,441032337568790140,441032337568811489,441032337568805609,441032337568783374,441032337568790141,441032337568789375,441032337568808318,441032337568781098,441032337568783961,441032337568789457,441032337568785408,441032337568791483,441032337568781254,441032337568800288,441032337568781337,441032337568809667,441032337568799610,441032337568809483,441032337568788788,441032337568795207,441032337568796226,441032337568785836,441032337568786759,441032337568808319,441032337568800958,441032337568807048,441032337568810841,441032337568809759,441032337568806072,441032337568794789]"] [2023/04/26 07:53:56.931 +00:00] [INFO] [indexcoord/index_coord.go:845] ["IndexCoord DescribeIndex"] [collectionID=441032337568525823] [indexName=] [2023/04/26 07:53:56.938 +00:00] [INFO] [indexcoord/index_coord.go:612] ["IndexCoord completeIndexInfo"] [collID=441032337568525823] [indexName=_default_idx_101] [2023/04/26 07:53:56.938 +00:00] [DEBUG] 
[indexcoord/meta_table.go:659] ["IndexCoord get index states success"] [indexID=441032337568728903] [total=296] [None=0] [Unissued=0] [InProgress=0] [Finished=296] [Failed=0]
[2023/04/26 07:53:56.938 +00:00] [DEBUG] [indexcoord/meta_table.go:736] ["IndexCoord get index states success"] [indexID=441032337568728903] [indexRows=52791]
[2023/04/26 07:53:56.938 +00:00] [INFO] [indexcoord/index_coord.go:635] ["IndexCoord completeIndexInfo success"] [collID=441032337568525823] [totalRows=52791] [indexRows=52791] [state=Finished] [failReason=]
dataCoord has logs: [2023/04/26 08:11:50.926 +00:00] [DEBUG] [datanode/channel_meta.go:701] ["getChannelCheckpoint for segment"] [segmentID=441032337568786666] [isCurIBEmpty=true] [isCurDBEmpty=true] [len(hisIB)=0] [len(hisDB)=0] [newChannelCpTs=18446744073709551615] [2023/04/26 08:11:50.926 +00:00] [DEBUG] [datanode/channel_meta.go:701] ["getChannelCheckpoint for segment"] [segmentID=441032337568789375] [isCurIBEmpty=true] [isCurDBEmpty=true] [len(hisIB)=0] [len(hisDB)=0] [newChannelCpTs=18446744073709551615] [2023/04/26 08:11:50.926 +00:00] [DEBUG] [datanode/channel_meta.go:701] ["getChannelCheckpoint for segment"] [segmentID=441032337568794199] [isCurIBEmpty=true] [isCurDBEmpty=true] [len(hisIB)=0] [len(hisDB)=0] [newChannelCpTs=18446744073709551615] [2023/04/26 08:11:50.926 +00:00] [DEBUG] [datanode/channel_meta.go:701] ["getChannelCheckpoint for segment"] [segmentID=441032337568785408] [isCurIBEmpty=true] [isCurDBEmpty=true] [len(hisIB)=0] [len(hisDB)=0] [newChannelCpTs=18446744073709551615] [2023/04/26 08:11:50.926 +00:00] [DEBUG] [datanode/channel_meta.go:701] ["getChannelCheckpoint for segment"] [segmentID=441032337568809075] [isCurIBEmpty=true] [isCurDBEmpty=true] [len(hisIB)=0] [len(hisDB)=0] [newChannelCpTs=18446744073709551615] [2023/04/26 08:11:50.926 +00:00] [DEBUG] [datanode/channel_meta.go:701] ["getChannelCheckpoint for segment"] [segmentID=441032337568800037] [isCurIBEmpty=true] [isCurDBEmpty=true] [len(hisIB)=0] [len(hisDB)=0] [newChannelCpTs=18446744073709551615] [2023/04/26 08:11:50.927 +00:00] [DEBUG] [datanode/channel_meta.go:701] ["getChannelCheckpoint for segment"] [segmentID=441032337568803664] [isCurIBEmpty=true] [isCurDBEmpty=true] [len(hisIB)=0] [len(hisDB)=0] [newChannelCpTs=18446744073709551615] [2023/04/26 08:11:50.927 +00:00] [DEBUG] [datanode/channel_meta.go:701] ["getChannelCheckpoint for segment"] [segmentID=441032337568804034] [isCurIBEmpty=true] [isCurDBEmpty=true] [len(hisIB)=0] [len(hisDB)=0] 
[newChannelCpTs=18446744073709551615] [2023/04/26 08:11:50.927 +00:00] [DEBUG] [datanode/channel_meta.go:701] ["getChannelCheckpoint for segment"] [segmentID=441032337568788554] [isCurIBEmpty=true] [isCurDBEmpty=true] [len(hisIB)=0] [len(hisDB)=0] [newChannelCpTs=18446744073709551615] [2023/04/26 08:11:50.927 +00:00] [DEBUG] [datanode/channel_meta.go:701] ["getChannelCheckpoint for segment"] [segmentID=441032337568779756] [isCurIBEmpty=true] [isCurDBEmpty=true] [len(hisIB)=0] [len(hisDB)=0] [newChannelCpTs=18446744073709551615] [2023/04/26 08:11:50.927 +00:00] [DEBUG] [datanode/channel_meta.go:701] ["getChannelCheckpoint for segment"] [segmentID=441032337568798843] [isCurIBEmpty=true] [isCurDBEmpty=true] [len(hisIB)=0] [len(hisDB)=0] [newChannelCpTs=18446744073709551615] [2023/04/26 08:11:50.927 +00:00] [DEBUG] [datanode/channel_meta.go:701] ["getChannelCheckpoint for segment"] [segmentID=441032337568796817] [isCurIBEmpty=true] [isCurDBEmpty=true] [len(hisIB)=0] [len(hisDB)=0] [newChannelCpTs=18446744073709551615] [2023/04/26 08:11:50.927 +00:00] [DEBUG] [datanode/channel_meta.go:701] ["getChannelCheckpoint for segment"] [segmentID=441032337568789865] [isCurIBEmpty=true] [isCurDBEmpty=true] [len(hisIB)=0] [len(hisDB)=0] [newChannelCpTs=18446744073709551615] [2023/04/26 08:11:50.927 +00:00] [DEBUG] [datanode/channel_meta.go:701] ["getChannelCheckpoint for segment"] [segmentID=441032337568782023] [isCurIBEmpty=true] [isCurDBEmpty=true] [len(hisIB)=0] [len(hisDB)=0] [newChannelCpTs=18446744073709551615] [2023/04/26 08:11:50.927 +00:00] [DEBUG] [datanode/channel_meta.go:701] ["getChannelCheckpoint for segment"] [segmentID=441032337568796909] [isCurIBEmpty=true] [isCurDBEmpty=true] [len(hisIB)=0] [len(hisDB)=0] [newChannelCpTs=18446744073709551615] [2023/04/26 08:11:50.927 +00:00] [DEBUG] [datanode/channel_meta.go:701] ["getChannelCheckpoint for segment"] [segmentID=441032337568806073] [isCurIBEmpty=true] [isCurDBEmpty=true] [len(hisIB)=0] [len(hisDB)=0] 
[newChannelCpTs=18446744073709551615] [2023/04/26 08:11:50.927 +00:00] [DEBUG] [datanode/channel_meta.go:701] ["getChannelCheckpoint for segment"] [segmentID=441032337568799324] [isCurIBEmpty=true] [isCurDBEmpty=true] [len(hisIB)=0] [len(hisDB)=0] [newChannelCpTs=18446744073709551615] [2023/04/26 08:11:50.927 +00:00] [DEBUG] [datanode/channel_meta.go:701] ["getChannelCheckpoint for segment"] [segmentID=441032337568799523] [isCurIBEmpty=true] [isCurDBEmpty=true] [len(hisIB)=0] [len(hisDB)=0] [newChannelCpTs=18446744073709551615] [2023/04/26 08:11:50.928 +00:00] [INFO] [datanode/flow_graph_time_tick_node.go:115] ["UpdateChannelCheckpoint success"] [channel=by-dev-rootcoord-dml_6_441032337568525823v0] [cpTs=441056417691205634] [cpTime=2023/04/26 08:11:50.553 +00:00] [2023/04/26 08:11:50.930 +00:00] [INFO] [datanode/flow_graph_time_tick_node.go:115] ["UpdateChannelCheckpoint success"] [channel=by-dev-rootcoord-dml_7_441032337568525823v1] [cpTs=441056417691205634] [cpTime=2023/04/26 08:11:50.553 +00:00] [2023/04/26 08:11:51.323 +00:00] [DEBUG] [datanode/flow_graph_insert_buffer_node.go:90] ["IBN timetick log"] [from=2023/04/26 08:08:28.202 +00:00] [to=2023/04/26 08:11:51.002 +00:00] [elapsed=3m22.8s] [start=441056364646105089] [end=441056417808908289] [vChannelName=by-dev-rootcoord-dml_1_441032337568306045v1] [2023/04/26 08:11:51.323 +00:00] [DEBUG] [datanode/flow_graph_insert_buffer_node.go:90] ["IBN timetick log"] [from=2023/04/26 08:08:28.202 +00:00] [to=2023/04/26 08:11:51.002 +00:00] [elapsed=3m22.8s] [start=441056364646105089] [end=441056417808908289] [vChannelName=by-dev-rootcoord-dml_0_441032337568306045v0] [2023/04/26 08:11:55.522 +00:00] [DEBUG] [datanode/flow_graph_dd_node.go:348] ["DDNode sent delta timeTick"] [collectionID=441032337568306045] [ts=441056418896805889] [ts_p=2023/04/26 08:11:55.152 +00:00] [channel=by-dev-rootcoord-dml_0_441032337568306045v0]
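Since the logs above are dominated by a few messages repeated within milliseconds of each other, a quick tally per source location can show which calls recur while the system is idle. The following is a minimal sketch that assumes the bracketed `[timestamp] [LEVEL] [file:line] [message]` layout seen in these excerpts; the function names are illustrative and not part of Milvus.

```python
import re
from collections import Counter

# An entry starts at a bracketed timestamp followed by a bracketed log level,
# e.g. "[2023/04/26 07:53:56.921 +00:00] [INFO]".
ENTRY_START = re.compile(
    r"(?=\[\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{2}:\d{2}\] \[[A-Z]+\])"
)
# Timestamp, level, source location, then the message (quoted or bare).
HEADER = re.compile(
    r'\[(?P<ts>[^\]]+)\] \[(?P<level>[A-Z]+)\] \[(?P<src>[^\]]+)\] '
    r'\[(?P<msg>"[^"]*"|[^\]]*)\]'
)


def split_entries(text):
    """Split flattened log text into one string per log entry."""
    return [e.strip() for e in ENTRY_START.split(text) if e.strip()]


def tally(text):
    """Count entries per (source location, message) pair."""
    counts = Counter()
    for entry in split_entries(text):
        m = HEADER.match(entry)
        if m:  # skip any leading text that is not a log entry
            counts[(m["src"], m["msg"].strip('"'))] += 1
    return counts


SAMPLE = (
    '[2023/04/26 07:53:56.921 +00:00] [INFO] [indexcoord/index_coord.go:845] '
    '["IndexCoord DescribeIndex"] [collectionID=441032337568525823] [indexName=] '
    '[2023/04/26 07:53:56.929 +00:00] [INFO] [indexcoord/index_coord.go:612] '
    '["IndexCoord completeIndexInfo"] [collID=441032337568525823] '
    '[indexName=_default_idx_101] '
    '[2023/04/26 07:53:56.931 +00:00] [INFO] [indexcoord/index_coord.go:845] '
    '["IndexCoord DescribeIndex"] [collectionID=441032337568525823] [indexName=]'
)

if __name__ == "__main__":
    for (src, msg), n in tally(SAMPLE).most_common():
        print(n, src, msg)
```

Running `tally` over a longer capture of the indexCoord or dataCoord log would give a rough per-message rate, which could help identify the caller that keeps issuing these requests.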
Anything else?
No response
@danielwonght thank you for the issue. Do you happen to have any Grafana metrics screenshots? I guess some index-building tasks were running at that moment. Could you please check the CPU usage again after a few minutes (depending on how many entities you inserted)?
/assign @danielwonght
CPU usage is expected. Building an index costs a lot of CPU, and so does compacting small segments. You can try:
- use an IVF_FLAT index, which is much cheaper to build.
- use separate nodes for index building, search, and data.
How many collections are we targeting?
@yanliang567 I am not using K8s to deploy the Milvus service, so it will take some time to set up Grafana to monitor its performance. The only thing I can provide now is the docker stats output:
a284405bb383 milvus-indexcoord 255.80% 63.86MiB / 58.88GiB 0.11% 317GB / 155GB 5.39MB / 4.1kB 18
21ed66b2d254 milvus-datacoord 224.14% 70.73MiB / 58.88GiB 0.12% 160GB / 323GB 4.39MB / 4.1kB 18
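Without Grafana, the fluctuation can still be recorded by sampling `docker stats --no-stream` repeatedly and extracting the CPU column. Below is a small sketch of the parsing step only, assuming the default `docker stats` column order (container ID, name, CPU %, memory usage / limit, memory %, ...); the `StatRow` type and function name are made up for illustration.

```python
from typing import NamedTuple


class StatRow(NamedTuple):
    container_id: str
    name: str
    cpu_percent: float
    mem_percent: float


def parse_stats_row(row: str) -> StatRow:
    """Parse one data row of default `docker stats` output.

    Whitespace-split fields: 0=ID, 1=NAME, 2=CPU %, 3=MEM USAGE,
    4="/", 5=LIMIT, 6=MEM %, followed by the I/O columns and PIDS.
    """
    parts = row.split()
    return StatRow(
        container_id=parts[0],
        name=parts[1],
        cpu_percent=float(parts[2].rstrip("%")),
        mem_percent=float(parts[6].rstrip("%")),
    )


if __name__ == "__main__":
    row = ("a284405bb383 milvus-indexcoord 255.80% 63.86MiB / 58.88GiB "
           "0.11% 317GB / 155GB 5.39MB / 4.1kB 18")
    print(parse_stats_row(row))
```

Collecting these values every few seconds into a file would give a simple time series of the CPU spikes per container.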
My fault. The log for datacoord is actually like this: [2023/04/27 06:37:04.651 +00:00] [INFO] [datacoord/handler.go:255] ["channel seek position set from channel checkpoint meta"] [channel=by-dev-rootcoord-dml_6_441032337568525823v0] [posTs=441077570785247234] [posTime=2023/04/27 06:36:43.203 +00:00] [2023/04/27 06:37:04.651 +00:00] [INFO] [datacoord/services.go:679] ["datacoord append channelInfo in GetRecoveryInfo"] [collectionID=441032337568525823] [partitionID=441032337568526459] [channelInfo="collectionID:441032337568525823 channelName:"by-dev-rootcoord-dml_6_441032337568525823v0" seek_position:<channel_name:"by-dev-rootcoord-dml_6_441032337568525823v0" msgID:"\010\345\t\020\300\025\030\000 \000" msgGroup:"by-dev-dataNode-8-by-dev-rootcoord-dml_6_441032337568525823v0" timestamp:441077570785247234 > flushedSegmentIds:441032337568801346 "] [2023/04/27 06:37:04.659 +00:00] [INFO] [datacoord/handler.go:118] [GetQueryVChanPositions] [collectionID=441032337568525823] [channel=by-dev-rootcoord-dml_7_441032337568525823v1] [numOfSegments=148] [2023/04/27 06:37:04.659 +00:00] [INFO] [datacoord/handler.go:255] ["channel seek position set from channel checkpoint meta"] [channel=by-dev-rootcoord-dml_7_441032337568525823v1] [posTs=441077570785247234] [posTime=2023/04/27 06:36:43.203 +00:00] [2023/04/27 06:37:04.659 +00:00] [INFO] [datacoord/services.go:679] ["datacoord append channelInfo in GetRecoveryInfo"] [collectionID=441032337568525823] [partitionID=441032337568526459] [channelInfo="collectionID:441032337568525823 channelName:"by-dev-rootcoord-dml_7_441032337568525823v1" seek_position:<channel_name:"by-dev-rootcoord-dml_7_441032337568525823v1" msgID:"\010\344\t\020\300\025\030\000 \000" msgGroup:"by-dev-dataNode-8-by-dev-rootcoord-dml_7_441032337568525823v1" timestamp:441077570785247234 > flushedSegmentIds:441032337568796635 "] [2023/04/27 06:37:04.659 +00:00] [INFO] [datacoord/services.go:644] ["get recovery info request received"] [collectionID=441032337568525823] 
[partitionID=441032337568526463] [2023/04/27 06:37:04.667 +00:00] [INFO] [datacoord/handler.go:118] [GetQueryVChanPositions] [collectionID=441032337568525823] [channel=by-dev-rootcoord-dml_6_441032337568525823v0] [numOfSegments=148] [2023/04/27 06:37:04.667 +00:00] [INFO] [datacoord/handler.go:255] ["channel seek position set from channel checkpoint meta"] [channel=by-dev-rootcoord-dml_6_441032337568525823v0] [posTs=441077570785247234] [posTime=2023/04/27 06:36:43.203 +00:00] [2023/04/27 06:37:04.667 +00:00] [INFO] [datacoord/services.go:679] ["datacoord append channelInfo in GetRecoveryInfo"] [collectionID=441032337568525823] [partitionID=441032337568526463] [channelInfo="collectionID:441032337568525823 channelName:"by-dev-rootcoord-dml_6_441032337568525823v0" seek_position:<channel_name:"by-dev-rootcoord-dml_6_441032337568525823v0" msgID:"\010\345\t\020\300\025\030\000 \000" msgGroup:"by-dev-dataNode-8-by-dev-rootcoord-dml_6_441032337568525823v0" timestamp:441077570785247234 > flushedSegmentIds:441032337568783960 "] [2023/04/27 06:37:04.680 +00:00] [INFO] [datacoord/handler.go:118] [GetQueryVChanPositions] [collectionID=441032337568525823] [channel=by-dev-rootcoord-dml_7_441032337568525823v1] [numOfSegments=148] [2023/04/27 06:37:04.680 +00:00] [INFO] [datacoord/handler.go:255] ["channel seek position set from channel checkpoint meta"] [channel=by-dev-rootcoord-dml_7_441032337568525823v1] [posTs=441077570785247234] [posTime=2023/04/27 06:36:43.203 +00:00] [2023/04/27 06:37:04.680 +00:00] [INFO] [datacoord/services.go:679] ["datacoord append channelInfo in GetRecoveryInfo"] [collectionID=441032337568525823] [partitionID=441032337568526463] [channelInfo="collectionID:441032337568525823 channelName:"by-dev-rootcoord-dml_7_441032337568525823v1" seek_position:<channel_name:"by-dev-rootcoord-dml_7_441032337568525823v1" msgID:"\010\344\t\020\300\025\030\000 \000" msgGroup:"by-dev-dataNode-8-by-dev-rootcoord-dml_7_441032337568525823v1" timestamp:441077570785247234 > 
flushedSegmentIds:441032337568780659 "]
The datacoord and indexcoord show almost the same CPU usage trend. Meanwhile, I find that if I release all collections, the CPU usage of datacoord and indexcoord stays at an acceptable level. When I load an empty collection (for example, one that was created and indexed without inserting any entities), the CPU usage also behaves as expected. When I load a collection with 40000+ entities, the CPU usage behaves as I described above and keeps fluctuating within a specific range.
@xiaofan-luan Yes, index building needs a lot of resources. But the point is that I built the index in advance. It seems that even though the collections are fully built in advance, datacoord and indexcoord still keep doing something after a collection is loaded into memory, and that costs CPU resources.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.