Size estimation is often wrong
Bug Report
The "Estimated size of data to collect" output is often far off. As a result, it is difficult to tell whether a clinic diag is going to exceed the maximum size.
This leads to:
- Files that are over the 3 GiB limit, which requires another run with a shorter time span or more filtering.
- Clinic diags that were given extra filtering or a too-short time span to bring the estimate under 3 GiB, but then turn out to be far below the limit. This makes the diag info less useful than it could be.
- The 3 GiB limit is hardcoded in the check and messages, but the actual clinic service might have different limits per customer.
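To illustrate the last point, here is a minimal sketch of a size check that takes the limit as a parameter instead of hardcoding it. The names `resolveLimit` and `exceedsLimit` are hypothetical and do not exist in diag. How a per-customer limit would be obtained from the clinic service is not covered; it is assumed to arrive as a plain byte count, with 0 meaning "not set".

```go
package main

import "fmt"

// defaultMaxBytes is the 3 GiB limit mentioned in this report.
const defaultMaxBytes uint64 = 3 << 30

// resolveLimit returns the per-customer limit when one is set,
// and falls back to the default otherwise.
// Hypothetical helper: 0 is assumed to mean "no custom limit".
func resolveLimit(customerLimit uint64) uint64 {
	if customerLimit == 0 {
		return defaultMaxBytes
	}
	return customerLimit
}

// exceedsLimit reports whether an estimated size is over the given limit.
func exceedsLimit(estimated, limit uint64) bool {
	return estimated > limit
}

func main() {
	estimated := uint64(4) << 30 // a 4 GiB estimate, for illustration only

	for _, custom := range []uint64{0, 10 << 30} {
		limit := resolveLimit(custom)
		fmt.Printf("limit=%d bytes, over limit: %v\n", limit, exceedsLimit(estimated, limit))
	}
}
```

With something like this, the warning message could print the resolved limit rather than a fixed "3 GiB" string.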
The Prometheus metric dump size is estimated by an empirical calculation (https://github.com/pingcap/diag/blob/ea49e727a046a3b320a047af814a4fc4b5e6f768/collector/prometheus.go#L269) that came from a few test runs on several random testing clusters about 3 years ago. It's known to be inaccurate, but I didn't find a better (and still simple enough) way to improve it at the time. I think it could be even less accurate on the latest versions due to metrics changes.
One possible approach is to measure the compressed metrics size for each component and major version, then add them together, but I'm not sure about this.
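A rough sketch of what that could look like, only to make the idea concrete. It assumes a lookup table of measured compressed bytes per instance per hour, keyed by component and major version, and sums over the instances in the cluster. All names (`profileKey`, `instanceGroup`, `estimateBytes`) are hypothetical, and the rates in `main` are placeholders, not measurements.

```go
package main

import "fmt"

// profileKey identifies one measured combination of component and major version.
type profileKey struct {
	Component string
	Major     int
}

// instanceGroup is a number of instances sharing the same component and version.
type instanceGroup struct {
	Key   profileKey
	Count int
}

// estimateBytes sums rate * instance count * hours over all groups.
// rates holds compressed bytes per instance per hour.
// Groups without a measured rate are skipped and returned in missing,
// so the caller can warn that the estimate is incomplete.
func estimateBytes(rates map[profileKey]float64, groups []instanceGroup, hours float64) (total float64, missing []profileKey) {
	for _, g := range groups {
		rate, ok := rates[g.Key]
		if !ok {
			missing = append(missing, g.Key)
			continue
		}
		total += rate * float64(g.Count) * hours
	}
	return total, missing
}

func main() {
	// Placeholder rates; real values would have to come from test runs.
	rates := map[profileKey]float64{
		{Component: "tidb", Major: 7}: 1000,
		{Component: "tikv", Major: 7}: 2000,
	}
	groups := []instanceGroup{
		{Key: profileKey{Component: "tidb", Major: 7}, Count: 2},
		{Key: profileKey{Component: "tikv", Major: 7}, Count: 3},
		{Key: profileKey{Component: "pd", Major: 7}, Count: 1}, // no rate measured
	}

	total, missing := estimateBytes(rates, groups, 2)
	fmt.Printf("estimated bytes: %.0f, groups without a rate: %v\n", total, missing)
}
```

The table would need to be re-measured whenever a major version changes its metrics, which is the main maintenance cost of this approach.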