
supports QPS/CPU dimension in keyviz

Open nolouch opened this issue 3 years ago • 4 comments

Feature Request

Is your feature request related to a problem? Please describe: Currently, KeyViz has only keys and bytes dimensions. In some situations it cannot locate a hotspot where high CPU usage is caused by QPS, because keys and bytes may both be 0. For example:

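create table t(id int);
insert into t values (1), (2), ..., (1000);
-- Random reads outside the inserted range; suppose there are many such requests.
-- rand(0, 1000) is pseudocode for a random integer between 0 and 1000 picked per request.
select * from t where id = 1000 + rand(0, 1000);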

Describe the feature you'd like:

  • add QPS dimension
  • add CPU dimension
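
As a rough illustration of what a QPS dimension would add, here is a minimal Go sketch. It is not tidb-dashboard code: `RegionStat`, `Dimension`, `ReadQueries`, and the sample keys are hypothetical names made up for this example. It only shows how a key range with zero keys and bytes read can still report a non-zero query rate.

```go
package main

import "fmt"

// RegionStat is a hypothetical per-region sample for one interval.
// It is NOT an actual tidb-dashboard or PD type.
type RegionStat struct {
	StartKey    string
	EndKey      string
	ReadKeys    uint64 // keys read in the interval
	ReadBytes   uint64 // bytes read in the interval
	ReadQueries uint64 // hypothetical: read requests received in the interval
}

// Dimension selects which value a heatmap cell would display.
type Dimension int

const (
	DimKeys Dimension = iota
	DimBytes
	DimQPS
)

// Value returns the cell value for the given dimension.
// intervalSec is the length of the sampling interval in seconds.
func (s RegionStat) Value(d Dimension, intervalSec uint64) uint64 {
	switch d {
	case DimKeys:
		return s.ReadKeys
	case DimBytes:
		return s.ReadBytes
	case DimQPS:
		if intervalSec == 0 {
			return 0
		}
		return s.ReadQueries / intervalSec
	}
	return 0
}

func main() {
	// Out-of-range lookups: many requests arrive, but no keys or bytes are read.
	s := RegionStat{StartKey: "t_1000", EndKey: "t_2000", ReadQueries: 60000}
	fmt.Println(s.Value(DimKeys, 60), s.Value(DimBytes, 60), s.Value(DimQPS, 60))
}
```

With only the keys and bytes dimensions, this range would look idle. The QPS value is the only one that reflects the load.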

Describe alternatives you've considered:

We can add QPS first, and add CPU after Top SQL is enabled by default.

Teachability, Documentation, Adoption, Migration Strategy:

  • Go

nolouch avatar Apr 06 '22 06:04 nolouch

cc @qidi1 @IcePigZDB

nolouch avatar Apr 06 '22 06:04 nolouch

We have not yet decided on the future iteration of KeyViz, because using it requires professional ability. For the scenario you described, how about simply using Top SQL to observe such hot spots directly?

I tested with the payload you provided. Here is what I got from Top SQL:

image image

Here is what I got from KeyViz (it indeed shows zero read flow):

image

Top SQL looks sufficient for troubleshooting such cases.

breezewish avatar Apr 06 '22 17:04 breezewish

Yes, Top SQL can tell me that this SQL is consuming a lot of CPU. That is useful to users, but it is not enough for an advanced user or expert. Top SQL cannot say whether it is a hotspot: the data may be accessed randomly, and there may be enough data to be distributed evenly across the TiKV nodes. So this type of SQL does consume a lot of CPU, but it may not cause a hotspot. We still need KeyViz to show the workload pattern to help understand the hotspot. But I agree that it requires professional ability. What I propose here is to add the dimension without breaking the current design. There are two community members who are interested in it and can help add it. What do you think about letting them have a try? cc @breeswish
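
A minimal Go sketch of the distinction between "consumes a lot of CPU" and "is a hotspot", under stated assumptions: `HotShare` and the sample numbers are hypothetical and do not come from KeyViz or PD. It computes the share of total queries that lands on the busiest key range. A value near 1/n for n ranges suggests an even spread, while a value near 1 suggests a hotspot.

```go
package main

import "fmt"

// HotShare returns the fraction of total queries served by the busiest range.
// This is a hypothetical helper for illustration, not a KeyViz or PD function.
func HotShare(qpsPerRange []uint64) float64 {
	var total, max uint64
	for _, q := range qpsPerRange {
		total += q
		if q > max {
			max = q
		}
	}
	if total == 0 {
		return 0
	}
	return float64(max) / float64(total)
}

func main() {
	even := []uint64{100, 100, 100, 100}   // random access spread across ranges
	skewed := []uint64{700, 100, 100, 100} // most requests hit one range

	fmt.Println(HotShare(even))   // 0.25
	fmt.Println(HotShare(skewed)) // 0.7
}
```

A per-statement CPU total would treat these two workloads the same when their total load is equal. A per-range view is what tells them apart.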

nolouch avatar Apr 11 '22 03:04 nolouch

Thanks for the additional information. I totally agree that "We still need KeyViz to show the workload pattern to help understand the hotspot". This looks extremely useful for PD developers to understand why such hotspots are not scattered.

There are two community members who are interested in it and can help add it. What do you think about letting them have a try?

Contributions are always welcome :) We can help review PRs. I'm not sure whether CPU usage information is available in the TiKV heartbeat. If it is, then the change may be small.

breezewish avatar Apr 11 '22 07:04 breezewish