
Size estimation is often wrong

Open dveeden opened this issue 3 months ago • 1 comments

Bug Report

The "Estimated size of data to collect" value in the output is often way off. As a result, it is difficult to check whether a Clinic diag is going to be over the maximum size or not.

This leads to:

  • Files that are over the 3 GiB limit, which requires another run with a shorter time span or more filtering.
  • Clinic diags that had more filtering applied, or too short a period, to bring the estimate below 3 GiB, but then turn out to be way under the limit. This makes the diag info less useful than it could be.
  • The 3 GiB limit is hardcoded in the check and messages, but the actual Clinic service might have different limits per customer.
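To illustrate the last point, here is a minimal sketch of a size check that takes the limit as a parameter and falls back to 3 GiB. The names `defaultMaxSize` and `checkEstimate` are hypothetical and are not taken from the diag code base; how a per-customer limit would be obtained from the Clinic service is left open.

```go
package main

import "fmt"

// defaultMaxSize is the 3 GiB limit mentioned in this issue.
const defaultMaxSize int64 = 3 << 30

// checkEstimate is a hypothetical helper (not from diag): it compares an
// estimated size in bytes against a limit. A limit <= 0 means "not
// configured" and falls back to defaultMaxSize.
func checkEstimate(estimated, limit int64) error {
	if limit <= 0 {
		limit = defaultMaxSize
	}
	if estimated > limit {
		return fmt.Errorf("estimated size %d bytes exceeds limit %d bytes", estimated, limit)
	}
	return nil
}

func main() {
	// 4 GiB against the default limit: rejected.
	fmt.Println(checkEstimate(4<<30, 0))
	// 4 GiB against a larger, per-customer limit of 10 GiB: accepted.
	fmt.Println(checkEstimate(4<<30, 10<<30))
}
```

With this shape, the message also reports the limit that was actually applied instead of a fixed 3 GiB.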

dveeden avatar Sep 01 '25 14:09 dveeden

The Prometheus metric dump size is estimated by an empirical calculation https://github.com/pingcap/diag/blob/ea49e727a046a3b320a047af814a4fc4b5e6f768/collector/prometheus.go#L269 which came from a few test runs on several random testing clusters about 3 years ago. It's known to be inaccurate, but I didn't find a better (yet simple enough) way to improve it at that time. I think it could be even more inaccurate on the latest versions due to metrics changes.

One possible approach is to measure the compressed metrics size for different components and major versions, then add them together, but I'm not sure about this.

AstroProfundis avatar Sep 02 '25 06:09 AstroProfundis