Size estimation is often wrong

Open dveeden opened this issue 3 months ago • 1 comments

Bug Report

The "Estimated size of data to collect" output is often way off. As a result, it is difficult to check whether a clinic diag is going to be over the maximum size or not.

This leads to:

Files that are over the 3 GiB limit, which requires another run with a shorter time span or more filtering.
Clinic diags that had more filtering applied or a shorter period chosen to bring the estimate under 3 GiB, but then turn out to be way under the limit. This makes the diag info less usable than it could be.
The 3 GiB limit is hardcoded in the check and messages, but the actual clinic service might have different limits per customer.

Sep 01 '25 14:09 dveeden

The Prometheus metric dump size is estimated by an empirical calculation https://github.com/pingcap/diag/blob/ea49e727a046a3b320a047af814a4fc4b5e6f768/collector/prometheus.go#L269 which came from a few test runs on several random testing clusters about 3 years ago. It's known to be inaccurate, but I didn't find a better (and still simple enough) way to improve it at that time. I think it could be even more inaccurate on the latest versions due to metrics changes.

One possible approach is to measure the metrics size after compression for different components and major versions, then add them together, but I'm not sure about this.
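
A rough sketch of how such a per-component, per-version estimate might look in Go. Nothing here is taken from the diag codebase: the `rateKey` and `instanceGroup` types, the `estimateDumpSize` and `exceedsLimit` functions, and every number in the table are hypothetical placeholders that would need to be replaced by real measurements. The limit is passed in as a parameter, on the assumption that it could differ per customer.

```go
package main

import "fmt"

// rateKey identifies one calibration point: a component at a major version.
type rateKey struct {
	Component    string
	MajorVersion int
}

// compressedBytesPerInstanceHour maps a calibration point to the measured
// compressed metrics size per instance per hour.
// ALL VALUES ARE MADE-UP PLACEHOLDERS, not real measurements.
var compressedBytesPerInstanceHour = map[rateKey]int64{
	{Component: "tidb", MajorVersion: 7}: 4 << 20, // placeholder: 4 MiB
	{Component: "tikv", MajorVersion: 7}: 8 << 20, // placeholder: 8 MiB
	{Component: "pd", MajorVersion: 7}:   1 << 20, // placeholder: 1 MiB
}

// instanceGroup describes how many instances of a component the cluster runs.
type instanceGroup struct {
	Component    string
	MajorVersion int
	Instances    int
}

// estimateDumpSize sums the per-component estimates for a collection window
// of the given length in hours. It returns an error when no calibration
// point exists, rather than silently guessing.
func estimateDumpSize(groups []instanceGroup, hours float64) (int64, error) {
	var total int64
	for _, g := range groups {
		rate, ok := compressedBytesPerInstanceHour[rateKey{g.Component, g.MajorVersion}]
		if !ok {
			return 0, fmt.Errorf("no calibration data for %s v%d", g.Component, g.MajorVersion)
		}
		total += int64(float64(rate) * float64(g.Instances) * hours)
	}
	return total, nil
}

// exceedsLimit compares an estimate against a caller-supplied limit in bytes.
func exceedsLimit(estimate, limit int64) bool {
	return estimate > limit
}

func main() {
	groups := []instanceGroup{
		{Component: "tidb", MajorVersion: 7, Instances: 2},
		{Component: "tikv", MajorVersion: 7, Instances: 3},
		{Component: "pd", MajorVersion: 7, Instances: 1},
	}
	limit := int64(3) << 30 // the 3 GiB figure from this issue, passed in rather than hardcoded

	estimate, err := estimateDumpSize(groups, 24)
	if err != nil {
		fmt.Println("cannot estimate:", err)
		return
	}
	fmt.Printf("estimated %d MiB, over limit: %v\n", estimate>>20, exceedsLimit(estimate, limit))
}
```

Returning an error for an unknown component or version is a deliberate choice in this sketch: a missing calibration point is probably safer to surface than to hide behind a stale default.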

Sep 02 '25 06:09 AstroProfundis