After upgrading vminsert to v1.101.0-cluster, CPU usage increased from 35% to 100%
Describe the bug
After upgrading vminsert to v1.101.0-cluster, CPU usage increased from 35% to 100% under the same conditions.
Before the upgrade, CPU usage was as follows:
After upgrading to v1.101.0-cluster, CPU usage is as follows:
Analyzing pprof suggests that insert_ctx_pool is the cause. Rolling insert_ctx_pool back to its v1.100.0-cluster version and recompiling vminsert (v1.101.0-cluster) brings CPU usage back to normal.
I repeated the test many times: CPU usage increases every time after upgrading to v1.101.0, and only rolling insert_ctx_pool back to v1.100.0 solves the problem.
Should insert_ctx_pool be rolled back to v1.100.0?
To Reproduce
Upgrade vminsert to v1.101.0-cluster and observe the change in CPU usage.
Version
v1.101.0-cluster
Logs
No response
Screenshots
No response
Used command-line flags
No response
Additional information
No response
We have the same issue on v1.102.0.
I'm linking the related commit as it's not mentioned in CHANGELOG.
https://github.com/VictoriaMetrics/VictoriaMetrics/commit/498fe1cfa523be5bfecaa372293c3cded85e75ab
@aluode99 could you provide the pprof result?
@aluode99 would be also great if you could provide resource requests/limits for your vminsert components.
@hagen1778 vminsert resources are as follows:
- replicas: 6
- resource requests: 7 CPU cores, 6G memory
- resource limits: 7 CPU cores, 6G memory
- datapoints ingestion rate: 3.5 million
@jiekun Sorry, I didn't save the pprof output. I can provide monitoring data if needed.
I tried to reproduce it locally with a very simple setup. Here are my profiles for v1.100.0, v1.100.0-without-ch and v1.101.0.
profile.zip
I did not observe a significant difference in CPU usage (I did not scrape precise metrics), but the profiles differ in Total Samples, which may indicate a difference in CPU usage:
- v1.100.0: ~100%
- v1.101.0 / v1.100.0-without-ch: ~200%
I may need to set up a test environment and re-test it.
Hello! Could you please tell me the current status of this issue? It blocks us from updating to the latest version. Is there anything I can do to help? Thanks!
@Sinketsu @aluode99 Hi. I'm running the related version of vminsert in our internal cluster to reproduce the issue.
It would be helpful if you could share the vminsert panels of the monitoring dashboard (Requests rate, Concurrent inserts, CPU usage, Memory usage, Storage connection saturation, Storage reachability, Network usage: clients, Network usage: vmstorage, Row per insert) while running v1.101.0 (or v1.102.0).
Also, please try to capture a CPU profile.
Hello!
We run independent vminsert clusters in different AZs, and all AZs receive exactly the same load. So I deployed v1.97.3 to one AZ and v1.102.0 to another to compare them in real time.
Graphs for one instance of each cluster (all other instances look similar):
pprofs: pprof.zip
@Sinketsu Thank you for the support and feedback. I've also observed similar issues during testing. We are discussing internally and will continue to update progress on this issue.
Hello, could you please try setting the GOGC=100 environment variable for vminsert?
By default, VictoriaMetrics uses GOGC=30, and it seems that this could make sync.Pool inefficient in some cases.
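Why a lower GOGC can hurt a `sync.Pool`-heavy design can be sketched with a small experiment. It relies on how current Go runtimes (1.13 and later) implement `sync.Pool`: each GC cycle moves pooled objects to a victim cache and frees whatever was already there, so an object left idle across two cycles is lost. A lower GOGC means more cycles for the same allocation rate, hence more pool misses and more allocation work. The pool, buffer sizes and burst pattern below are hypothetical and not taken from vminsert; `debug.SetGCPercent` stands in for the GOGC environment variable. Results depend on the machine, so no output is quoted.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
	"sync"
	"sync/atomic"
)

var (
	// poolMisses counts how often the pool was empty and New had to allocate.
	poolMisses atomic.Int64
	// ctxPool is a hypothetical sync.Pool-only pool of 64 KiB buffers.
	ctxPool = sync.Pool{
		New: func() any {
			poolMisses.Add(1)
			b := make([]byte, 64*1024)
			return &b
		},
	}
	// sink keeps the latest garbage allocation reachable so it is really allocated.
	sink []byte
)

// run simulates `bursts` request bursts that each hold `burstSize` pooled buffers,
// allocating `garbagePerBurst` bytes of short-lived garbage between bursts.
// It returns the pool misses and GC cycles observed with the given GC percent.
func run(gcPercent, bursts, burstSize, garbagePerBurst int) (misses int64, gcs uint32) {
	old := debug.SetGCPercent(gcPercent)
	defer debug.SetGCPercent(old)

	// Two GC cycles empty the pool completely (primary cache, then victim cache).
	runtime.GC()
	runtime.GC()
	poolMisses.Store(0)

	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)

	held := make([]*[]byte, burstSize)
	for i := 0; i < bursts; i++ {
		for j := range held {
			held[j] = ctxPool.Get().(*[]byte)
		}
		for j := range held {
			ctxPool.Put(held[j])
			held[j] = nil
		}
		// Garbage produced elsewhere in the process drives GC frequency.
		for n := 0; n < garbagePerBurst; n += 64 * 1024 {
			sink = make([]byte, 64*1024)
		}
	}

	runtime.ReadMemStats(&after)
	return poolMisses.Load(), after.NumGC - before.NumGC
}

func main() {
	for _, p := range []int{30, 100} {
		misses, gcs := run(p, 1000, 8, 2<<20)
		fmt.Printf("GOGC=%d: %d GC cycles, %d pool misses\n", p, gcs, misses)
	}
}
```

One would expect the GOGC=30 run to report more GC cycles and more pool misses than the GOGC=100 run. The trade-off is a larger heap at GOGC=100, which is consistent with the CPU/memory observations reported in this thread.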
I set GOGC=100 on v1.102.0.
Here are the CPU/memory metrics:
CPU usage has improved (I think it is now similar to v1.97.3), but memory usage has increased.
The change was reverted to its pre-v1.101.0 state in this PR: https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6794