tiup

Enable collector.ethtool in node_exporter

Open borissavelev opened this issue 1 year ago • 4 comments

Feature Request

Is your feature request related to a problem? Please describe: In cloud environments, it is common to encounter network limitations that can affect the performance and stability of the TiDB setup. Many users, including myself, have experienced network bottlenecks that can lead to issues like "bw_in_allowance_exceeded" and "bw_out_allowance_exceeded." This is especially prevalent when running TiDB on platforms like AWS, where network performance is closely monitored and has specific limits.

Example:

sudo ethtool -S eth0 | grep -i allow
     bw_in_allowance_exceeded: 29434181
     bw_out_allowance_exceeded: 129
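
As a stopgap until the collector is available, a check like the one above can be scripted. The following is only a sketch: the function name `nonzero_allowance_counters` is made up for illustration, and it assumes the `name: value` line format that `ethtool -S` prints in the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ethtool -S` output on stdin and print only the
# *_allowance_exceeded counters whose value is greater than zero.
nonzero_allowance_counters() {
  awk -F': *' '
    /_allowance_exceeded/ {
      gsub(/^[ \t]+/, "", $1)          # strip the leading indentation
      if ($2 + 0 > 0) print $1, $2     # keep counters that have fired
    }'
}

# Demo on the sample output from this issue (no ethtool needed):
printf '     bw_in_allowance_exceeded: 29434181\n     bw_out_allowance_exceeded: 129\n' \
  | nonzero_allowance_counters
# prints:
# bw_in_allowance_exceeded 29434181
# bw_out_allowance_exceeded 129
```

On a real host this would be used as `sudo ethtool -S eth0 | nonzero_allowance_counters`.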

Describe the feature you'd like: I kindly request an update to the node_exporter component within the TiUP tool. Specifically, I propose enabling the "ethtool" collector in node_exporter. The current version of node_exporter does not support this essential feature.

Why the feature is needed:

Describe alternatives you've considered: By adding support for the ethtool collector in node_exporter, TiDB users will be able to obtain more relevant and detailed metrics regarding their cloud network usage. This enhancement will allow us to monitor key network performance parameters, helping to identify and address potential bottlenecks early on. Additionally, it will be particularly beneficial for users running TiDB in cloud environments, where network limitations can significantly impact the overall performance of the system.

I believe that this enhancement will greatly improve the monitoring capabilities of TiDB in cloud setups and will be highly appreciated by the community. As more and more users are adopting TiDB for cloud deployments, this feature will become increasingly valuable.

Teachability, Documentation, Adoption, Migration Strategy: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-network-performance-ena.html

borissavelev avatar Jul 21 '23 10:07 borissavelev

I'd also consider adding relevant new panels (or dashboards) for metrics from the ethtool collector, as well as alert rules.

borissavelev avatar Jul 21 '23 10:07 borissavelev

This is very similar to #1744 but for a different component

dveeden avatar Jul 21 '23 13:07 dveeden

Here are the hardcoded collectors:

https://github.com/pingcap/tiup/blob/041760dc4913d251287fb8578b879ba454f59515/embed/templates/scripts/run_node_exporter.sh.tpl#L23-L29

What's requested here is the ability to add this:

--collector.ethtool.device-include
--collector.ethtool.metrics-include
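
To make the request concrete, here is a rough sketch of the extra arguments the startup script would need to pass. The helper name `ethtool_collector_args` and the two regular expressions are illustrative assumptions, not something TiUP provides today. The sketch also assumes that the collector itself has to be switched on with `--collector.ethtool`, since node_exporter does not enable it by default.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the node_exporter arguments for the ethtool
# collector, one per line. $1 is a device regex, $2 is a metrics regex.
ethtool_collector_args() {
  local device_regex="$1"
  local metrics_regex="$2"
  printf '%s\n' \
    "--collector.ethtool" \
    "--collector.ethtool.device-include=${device_regex}" \
    "--collector.ethtool.metrics-include=${metrics_regex}"
}

# Example values (assumptions): eth* devices only, allowance counters only.
ethtool_collector_args '^eth[0-9]+$' '.*_allowance_exceeded$'
```

The printed lines would be appended to the existing node_exporter command line next to the hardcoded collectors linked above. Restricting the metrics regex keeps the number of exported series small.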

dveeden avatar Jul 21 '23 14:07 dveeden

@dveeden let's exclude the upgrade from this issue. Enabling the collector doesn’t require a node_exporter upgrade.

borissavelev avatar Oct 23 '23 08:10 borissavelev