perf: improve cluster aggregation performance
This PR improves the performance of the cluster-mode aggregation flow. The changes are threefold:
- Hashing of each metric is moved to the workers. Each worker now hashes its own data and adds the result under a `hash` key in the metrics JSON (see the first sketch below).
  - This distributes the hashing work across the workers instead of leaving that CPU-bound task to the master.
- The Map that used to be rebuilt from scratch on every request is now global. A metric is inserted only the first time it is seen; otherwise the existing entry is fetched and the item is pushed onto its array (see the second sketch below).
  - Inserting large keys into a Map is slow because a Map maintains insertion order, so this change minimizes insertions.
- A major choke point was the master asking workers for their metrics: the workers sent those metrics back over IPC, congesting the channel and delaying request routing from the master to the workers. With this change, workers no longer send metrics over IPC; they write them to a file and send the master only the file name. The master then reads the metrics from the files and deletes them (see the third sketch below).
  - This removes the congestion on IPC, so request routing from the master to the workers is no longer hampered.
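A minimal sketch of the worker-side hashing, assuming a metrics-JSON shape with a `values` array per metric; `hashLabels` and `attachHashes` are illustrative names, not the actual prom-client internals.

```js
// Illustrative sketch only: compute a stable hash per label set in the worker
// and attach it as a `hash` key, so the master can group values without
// doing the hashing itself.
const crypto = require('crypto');

// Hash over sorted label keys so identical label sets always produce the
// same hash regardless of key order.
function hashLabels(labels) {
  const canonical = Object.keys(labels)
    .sort()
    .map(key => `${key}=${labels[key]}`)
    .join(',');
  return crypto.createHash('sha1').update(canonical).digest('hex');
}

// Called in the worker right before the metrics JSON is handed to the master.
function attachHashes(metricsJson) {
  for (const metric of metricsJson) {
    for (const value of metric.values) {
      value.hash = hashLabels(value.labels || {});
    }
  }
  return metricsJson;
}
```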
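A minimal sketch of the global Map, assuming values are grouped by the hash computed above; `metricsByHash` and `addMetricValue` are illustrative names.

```js
// Illustrative sketch only: the Map lives at module scope instead of being
// rebuilt on every aggregation request; a key is inserted only the first
// time it is seen.
const metricsByHash = new Map();

function addMetricValue(hash, value) {
  let bucket = metricsByHash.get(hash);
  if (bucket === undefined) {
    // First time this label-set hash is seen: pay the insertion cost once.
    bucket = [];
    metricsByHash.set(hash, bucket);
  }
  // Every later request only pushes onto the existing array.
  bucket.push(value);
}
```

A real implementation would also need to drain or reset the per-hash arrays between aggregation calls so values do not accumulate across requests.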
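A minimal sketch of the file hand-off, assuming temp files under `os.tmpdir()` and a `metrics-file` IPC message; the file naming and message shape are illustrative, not the exact code in this PR.

```js
// Illustrative sketch only. Worker side: write the metrics to a temp file and
// send just the file name over IPC, keeping the large payload off the channel.
const fs = require('fs');
const os = require('os');
const path = require('path');

function sendMetricsViaFile(metricsJson, requestId) {
  const file = path.join(
    os.tmpdir(),
    `prom-client-${process.pid}-${requestId}.json`,
  );
  fs.writeFileSync(file, JSON.stringify(metricsJson));
  process.send({ type: 'metrics-file', requestId, file });
}

// Master side: read the metrics back from the file, then delete it.
function readMetricsFile(message) {
  const raw = fs.readFileSync(message.file, 'utf8');
  fs.unlinkSync(message.file);
  return JSON.parse(raw);
}
```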
@SimenB @siimon kindly approve the workflows. I have added the Prettier fixes; the earlier workflow runs failed because of those formatting issues.
Are there any remaining concerns about the filesystem approach? Should we consider releasing it?
@mrnonz we haven't faced any filesystem-related issues so far. This change would require users to give prom-client access to the /tmp path.
@BourgoisMickael kindly approve the workflows for the suggested changes
So after staring at this code for the better part of a week, and having spent a good bit of time at my previous job trying to write something like this for OpenTelemetry, these are my thoughts on both this PR and the code:
- If `process.send()` is not fast enough, the solution is not going through the filesystem; it's handing over a socket via `send()` and streaming the data (see the sketch after this list). Neither Prometheus nor OpenTelemetry is built on reliable delivery, but IPC through the filesystem is considered one of the classic blunders. Filesystems are at least 10x more insane than you can possibly guess.
- Rather than sending the `hashObject()` response back over the link, we would get some utility from sorting the keys on the sending end (second sketch below). Right now the code fights this in a few places (there are tests that insist that labels and default labels should merge in a specific order that is definitely not sorted).
- I have made some PRs that improve the performance of the label processing and the Grouper, which should help a bit on both ends of the connection. I have another PR almost ready, and one more after that, which should change the ratios driving this PR. It would be worth having someone with a larger corpus than mine weigh in on what things look like now.
- It should concern all parties involved how low the code coverage is for lib/cluster.js.
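A minimal sketch of that alternative, assuming the master opens a throwaway local socket pair and hands one end to the worker with `worker.send(message, handle)`; the message shapes and payload are illustrative, not prom-client code.

```js
// Illustrative sketch only: stream the metrics payload over a socket handed
// to the worker via send(), instead of one large IPC message or a temp file.
const cluster = require('cluster');
const net = require('net');

if (cluster.isPrimary) {
  const worker = cluster.fork();

  // Throwaway server on an ephemeral local port; its accepted connection is
  // the master's end of the stream.
  const server = net.createServer(socket => {
    let body = '';
    socket.setEncoding('utf8');
    socket.on('data', chunk => (body += chunk));
    socket.on('end', () => {
      console.log('received', JSON.parse(body).length, 'metrics');
      server.close();
      worker.kill();
    });
  });

  server.listen(0, () => {
    const workerEnd = net.connect(server.address().port, () => {
      // Hand the connected socket to the worker; it streams metrics back on it.
      worker.send({ type: 'metrics-request' }, workerEnd);
    });
  });
} else {
  process.on('message', (msg, socket) => {
    if (msg.type === 'metrics-request' && socket) {
      // A real implementation would stream chunks; ending with one write is
      // enough to show the hand-off.
      socket.end(JSON.stringify([{ name: 'up', values: [{ value: 1, labels: {} }] }]));
    }
  });
}
```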
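And a minimal sketch of sorting keys on the sending end instead of shipping a hash; `canonicalizeLabels` is an illustrative helper, not an existing prom-client function.

```js
// Illustrative sketch only: emit label objects with their keys in sorted
// order so the receiving side can compare and merge label sets
// deterministically without re-hashing or re-sorting.
function canonicalizeLabels(labels) {
  const sorted = {};
  for (const key of Object.keys(labels).sort()) {
    sorted[key] = labels[key];
  }
  return sorted;
}
```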
https://github.com/siimon/prom-client/pull/692 answers most of these concerns.
The main thing I have not done is recycling the per-stat collectors between aggregation calls. There are functions which support it, but I'm not calling them at present. That can be done as a follow-on.