keep-core
Threshold Network Monitoring: Improve node removal on failed discovery
Currently, the node discovery process removes a node from the output file as soon as its metrics port cannot be reached. We want such nodes to remain in the file so that Prometheus can keep trying to reach them in case a node comes back up before the next discovery runs.
We may want to remove a node from the list only after discovery has failed for it in several consecutive discovery iterations.
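
One possible way to get this behavior is to keep a per-node counter of consecutive failures and drop a node only once the counter reaches a threshold. The sketch below is illustrative only: the names `DiscoveryState`, `update`, and `MAX_CONSECUTIVE_FAILURES`, the threshold value of 3, and the example addresses are all assumptions and do not come from the existing discovery code. It also assumes the counters are persisted somewhere between discovery runs.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: a node is removed once it has failed this many
# discovery iterations in a row. The value 3 is an assumption.
MAX_CONSECUTIVE_FAILURES = 3


@dataclass
class DiscoveryState:
    # Maps a node's metrics address to its count of consecutive failed checks.
    failures: dict = field(default_factory=dict)

    def update(self, reachable: dict) -> list:
        """Record one discovery iteration and return the addresses to keep.

        `reachable` maps each checked address to True if its metrics port
        responded in this iteration, False otherwise.
        """
        for address, ok in reachable.items():
            if ok:
                # Any successful check resets the streak.
                self.failures[address] = 0
            else:
                self.failures[address] = self.failures.get(address, 0) + 1

        # Drop only the nodes whose failure streak reached the threshold.
        # Addresses not checked in this iteration keep their previous count.
        self.failures = {
            address: count
            for address, count in self.failures.items()
            if count < MAX_CONSECUTIVE_FAILURES
        }
        return sorted(self.failures)


if __name__ == "__main__":
    state = DiscoveryState()
    for iteration in range(1, 4):
        kept = state.update({"node-a.example:9601": True, "node-b.example:9601": False})
        print(iteration, kept)
```

With this approach a node that fails once or twice stays in the output file, so Prometheus keeps scraping it, and a single successful check resets its streak. The returned list would then be written to the output file in whatever format the discovery process already uses.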