Results 21 comments of petanne

I read your code and it resolved my problem. Thanks a lot.

@valyala How does vmstorage work when handling a query request?

Hi @hagen1778! Thanks for the reply! There is one vmstorage process on each physical machine, and there are no other heavy processes. The red area marks the problematic time range. ![image](https://user-images.githubusercontent.com/4476119/205259274-3ee41766-9821-4530-8c52-0fbb4cf1a8ef.png) Some metrics from node_exporter: ![image](https://user-images.githubusercontent.com/4476119/205260537-7ce51084-8a3f-4003-982e-f5901ebc4a4d.png)...

> Do you have the following panels on your dashboard? The red area marks the problematic time range. ![image](https://user-images.githubusercontent.com/4476119/205279926-46425fce-88dd-45cc-b7e3-da2021615d9a.png) Another test case: I tried increasing `-search.maxConcurrentRequests` for vmselect, but the problem still occurs. ![image](https://user-images.githubusercontent.com/4476119/205283596-bc9a5f4d-18dd-44c3-9ef6-63a086a04f29.png) I...

We tried setting `-search.maxUniqueTimeseries` for vmstorage to 100000, but the problem still occurs.

> More times or just takes more time? In the problematic time range, `lib/storage.(*TSID).Less` accounts for 30% flat in the problematic vmstorage, versus 5% in the other vmstorage nodes. This is cache usage %...

@valyala, thanks! It worked! `TSID.Less()` is now 3.87% flat in the CPU pprof; it was 5% under normal conditions before. ![image](https://user-images.githubusercontent.com/4476119/205626003-81bd18c2-6c7b-4137-935c-f37720505f74.png) Looking forward to the release. This time the sudden problem occurred...

@valyala @hagen1778 Bad news: the problem happened again, and other problems arose while we were running stress tests and outage drills. ### Version vmstorage: [8e9822b](https://github.com/VictoriaMetrics/VictoriaMetrics/commit/8e9822bc7f0f0591555be4faa76dd5af431e2000) vmagent: [v1.84.0-cluster](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.84.0-cluster) vminsert: [v1.84.0-cluster](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.84.0-cluster) vmselect: [v1.84.0-cluster](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.84.0-cluster) vmalert:...

> Does pprof show the same issue? Sorry, I did not preserve the state or save a pprof at the time. But I have another pprof showing the same problem for question 6. [13.cpu.pprof.12.6.zip](https://github.com/VictoriaMetrics/VictoriaMetrics/files/10193915/13.cpu.pprof.12.6.zip) ![image](https://user-images.githubusercontent.com/4476119/206686066-83c913e3-efc0-411c-9acc-cf9c127e25f8.png)...

### New test case With only one vmalert started with `-evaluationInterval=10s`, the CPU usage of 13.vmstorage is 90%. `storage.(*TSID).Less`, called from `storage.(*partSearch).nextBHS`, is the biggest CPU consumer again. [13.error.cpu.pprof.12.9.zip](https://github.com/VictoriaMetrics/VictoriaMetrics/files/10194280/13.error.cpu.pprof.12.9.zip) ``` github.com/VictoriaMetrics/VictoriaMetrics/lib/storage.(*partSearch).nextBHS /root/VictoriaMetrics/lib/storage/tsid.go Total: 177.22s 178.12s...