prometheus-engine

Improved scrape configuration debuggability.

Open bwplotka opened this issue 1 year ago • 1 comment

One common misconfiguration on GMP is a mismatched port/port name on PodMonitoring.

Unfortunately, this issue surfaces nowhere: the only symptom is that your metric is not scraped and does not show up in GCM. This forces users (I did that to myself a few times too) to debug every stage of collection, which is even more difficult without access to the collector UI or metrics. And even WITH that access you get a confusing scrape config, and your discovered pods are dropped because of the port mismatch without any additional info.

AC

  • The user is notified when the port or pod selector is mismatched

Ideas:

  • We could, e.g., do some extra work in the operator when target status is enabled: scan for problematic pods that do not have ANY scrapes, then list and check their ports?
  • I wonder if the SD API in Prometheus could support something like relabelling messages: if a rule is applied and drops the target, you could provide a custom message. Then we could surface that to users in the Target Status feature 🤔

bwplotka · Dec 06 '23 16:12

Another idea is a CLI tool that validates different theories for a specific PodMonitoring.
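
A rough sketch of what the core of such a tool might look like: each "theory" is a named check run against one PodMonitoring and the pods in its namespace, and the tool prints a PASS/FAIL line per theory. All types and names here (`Monitoring`, `Target`, `Theory`, `validate`) are hypothetical simplifications; a real tool would read these objects from the Kubernetes API.

```go
package main

import "fmt"

// Target is a hypothetical, simplified view of a pod.
type Target struct {
	Name   string
	Labels map[string]string
	Ports  []string // declared container port names
}

// Monitoring is a hypothetical, simplified view of a PodMonitoring:
// a matchLabels selector and a single named endpoint port.
type Monitoring struct {
	MatchLabels map[string]string
	Port        string
}

// Theory is one hypothesis about why nothing is being scraped.
type Theory struct {
	Name  string
	Check func(m Monitoring, pods []Target) (ok bool, detail string)
}

// selected returns the pods whose labels satisfy every matchLabels entry.
func selected(m Monitoring, pods []Target) []Target {
	var out []Target
	for _, p := range pods {
		match := true
		for k, v := range m.MatchLabels {
			if p.Labels[k] != v {
				match = false
				break
			}
		}
		if match {
			out = append(out, p)
		}
	}
	return out
}

var theories = []Theory{
	{
		Name: "selector matches at least one pod",
		Check: func(m Monitoring, pods []Target) (bool, string) {
			n := len(selected(m, pods))
			return n > 0, fmt.Sprintf("%d pod(s) selected", n)
		},
	},
	{
		Name: "every selected pod declares the endpoint port",
		Check: func(m Monitoring, pods []Target) (bool, string) {
			for _, p := range selected(m, pods) {
				found := false
				for _, name := range p.Ports {
					if name == m.Port {
						found = true
					}
				}
				if !found {
					return false, fmt.Sprintf("pod %q lacks port %q", p.Name, m.Port)
				}
			}
			return true, "no selected pod is missing the port"
		},
	},
}

// validate runs every theory and returns one report line per theory.
func validate(m Monitoring, pods []Target) []string {
	var report []string
	for _, t := range theories {
		ok, detail := t.Check(m, pods)
		status := "PASS"
		if !ok {
			status = "FAIL"
		}
		report = append(report, fmt.Sprintf("%s: %s (%s)", status, t.Name, detail))
	}
	return report
}

func main() {
	m := Monitoring{MatchLabels: map[string]string{"app": "demo"}, Port: "metrics"}
	pods := []Target{{Name: "demo-0", Labels: map[string]string{"app": "demo"}, Ports: []string{"http"}}}
	for _, line := range validate(m, pods) {
		fmt.Println(line)
	}
}
```

Keeping theories as a plain list makes it cheap to add further checks later (for example namespace or scrape path), without changing the reporting loop.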

bwplotka · Dec 06 '23 16:12