Enable collector.ethtool in node_exporter
Feature Request
Is your feature request related to a problem? Please describe: In cloud environments, it is common to encounter network limitations that can affect the performance and stability of the TiDB setup. Many users, including myself, have experienced network bottlenecks that can lead to issues like "bw_in_allowance_exceeded" and "bw_out_allowance_exceeded." This is especially prevalent when running TiDB on platforms like AWS, where network performance is closely monitored and has specific limits.
Example:
sudo ethtool -S eth0 | grep -i allow
bw_in_allowance_exceeded: 29434181
bw_out_allowance_exceeded: 129
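As a rough way to spot these counters by hand until the collector is available, the output above can be filtered down to the allowance counters that are non-zero. This is only a sketch: the helper name `exceeded_counters` is made up for illustration, and it assumes the `name: value` layout shown in the example output.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of TiUP or node_exporter).
# Reads `ethtool -S <device>` output on stdin and prints every
# *_allowance_exceeded counter whose value is greater than zero,
# one "name=value" pair per line.
exceeded_counters() {
  awk -F': *' '
    /_allowance_exceeded:/ {
      gsub(/^[ \t]+/, "", $1)        # strip the indentation ethtool adds
      if ($2 + 0 > 0) print $1 "=" $2
    }'
}
```

Assuming the interface is `eth0` as in the example, it would be used as `sudo ethtool -S eth0 | exceeded_counters`.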
Describe the feature you'd like: I request an update to the node_exporter component within TiUP. Specifically, I propose enabling the "ethtool" collector in node_exporter. The current version of node_exporter does not support this feature.
Why the feature is needed:
Describe alternatives you've considered: By adding support for the ethtool collector in node_exporter, TiDB users will be able to obtain more relevant and detailed metrics regarding their cloud network usage. This enhancement will allow us to monitor key network performance parameters, helping to identify and address potential bottlenecks early on. Additionally, it will be particularly beneficial for users running TiDB in cloud environments, where network limitations can significantly impact the overall performance of the system.
I believe that this enhancement will greatly improve the monitoring capabilities of TiDB in cloud setups and will be highly appreciated by the community. As more and more users are adopting TiDB for cloud deployments, this feature will become increasingly valuable.
Teachability, Documentation, Adoption, Migration Strategy: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-network-performance-ena.html
I'd also consider adding relevant new panels (or dashboards) for metrics from the ethtool collector, as well as alert rules.
This is very similar to #1744 but for a different component
Here are the hardcoded collectors:
https://github.com/pingcap/tiup/blob/041760dc4913d251287fb8578b879ba454f59515/embed/templates/scripts/run_node_exporter.sh.tpl#L23-L29
What's requested here is the ability to add this:
--collector.ethtool.device-include
--collector.ethtool.metrics-include
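For illustration, here is a sketch of how the extra arguments might be assembled before being appended to the node_exporter command line. The function name `ethtool_flags` and both default regexes are assumptions made for this example; how TiUP would actually expose them in its template or topology is not decided in this issue. The sketch also emits `--collector.ethtool`, since the ethtool collector is disabled by default in node_exporter and the include flags alone do not turn it on.

```shell
#!/usr/bin/env bash
# Assumed example values, not TiUP defaults.
default_device='^eth0$'
default_metrics='^bw_(in|out)_allowance_exceeded$'

# Hypothetical helper: prints the node_exporter flags for the ethtool
# collector, one per line.
#   $1 - regex of devices to include (optional)
#   $2 - regex of ethtool stats to include (optional)
ethtool_flags() {
  local device_re="${1:-$default_device}"
  local metrics_re="${2:-$default_metrics}"
  printf '%s\n' \
    "--collector.ethtool" \
    "--collector.ethtool.device-include=${device_re}" \
    "--collector.ethtool.metrics-include=${metrics_re}"
}
```

Calling `ethtool_flags '^ens5$'` would restrict collection to a device named `ens5` while keeping the default metric filter.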
@dveeden let's exclude the upgrade from this issue. Enabling the collector doesn’t require a node_exporter upgrade