
[GCP Monitoring] Research monitoring API compatibility with GCP Cloud Monitoring API

Open mzueva opened this issue 11 months ago • 1 comments

Background A clear and concise description of what the problem is. Ex. At the moment, Cloud Pipeline does not allow [...] and it would be nice to [...] because [...]

Approach A clear and concise description of what you want to happen.

Other options A clear and concise description of any alternative solutions or features you've considered.

mzueva, Apr 29 '25 13:04

Summary of GCP Metrics

Google Cloud Platform (GCP) offers a variety of metrics to help you monitor your resources. Here’s a quick overview:

1. How GCP Metrics Are Divided

GCP metrics are grouped by how they’re collected and their domains (prefixes that show their source):

  • Default Metrics (compute.googleapis.com domain): Built into GCP services such as Compute Engine, with no extra setup needed. They cover basic stats like CPU usage and network traffic. Check them at Cloud Monitoring Metrics List.
  • Ops Agent Metrics (agent.googleapis.com domain): For deeper insights (e.g., memory, disk usage), you need to install the Ops Agent on your VMs. It collects detailed data such as agent.googleapis.com/memory/bytes_used. Learn more at Ops Agent Metrics and Install Ops Agent.
  • Custom Metrics (custom.googleapis.com domain): Track app-specific data (e.g., user sessions) by sending it to GCP via the Monitoring API. See Custom Metrics Guide.
  • Log-Based Metrics (logging.googleapis.com domain): Create metrics from logs, like counting errors, using Cloud Logging. User-defined log-based metrics appear under logging.googleapis.com/user/. Explore at Log-Based Metrics.
  • Third-Party Metrics (external.googleapis.com domain): Pull in data from tools like Prometheus using integrations. Note that Managed Service for Prometheus writes its metrics under the prometheus.googleapis.com domain rather than external.googleapis.com. Details at Managed Prometheus.
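
As a rough illustration of the grouping above, a metric type can be classified by its domain prefix. This is only a sketch: the class name `MetricDomains`, the method `classify`, and the category labels are made up for this example and are not part of any GCP library. It covers just the five prefixes listed above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: maps a metric type to the collection category listed above.
final class MetricDomains {
    // Domain prefix -> category label (labels are illustrative, not official names).
    private static final Map<String, String> CATEGORIES = new LinkedHashMap<>();
    static {
        CATEGORIES.put("compute.googleapis.com/", "default");
        CATEGORIES.put("agent.googleapis.com/", "Ops Agent");
        CATEGORIES.put("custom.googleapis.com/", "custom");
        CATEGORIES.put("logging.googleapis.com/", "log-based");
        CATEGORIES.put("external.googleapis.com/", "third-party");
    }

    // Returns the category for a metric type, or "unknown" if no prefix matches.
    static String classify(String metricType) {
        for (Map.Entry<String, String> entry : CATEGORIES.entrySet()) {
            if (metricType.startsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(classify("agent.googleapis.com/memory/bytes_used"));
        System.out.println(classify("compute.googleapis.com/instance/cpu/utilization"));
    }
}
```

A check like this could help decide whether a VM needs the Ops Agent installed before a given metric can be queried.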

2. Metrics by Service Type

Metrics are organized by what they measure:

  • Disk: Tracks usage (e.g., agent.googleapis.com/disk/bytes_used) with the Ops Agent.
  • Memory: Monitors usage (e.g., agent.googleapis.com/memory/bytes_used) with the Ops Agent.
  • CPU: Measures utilization (e.g., compute.googleapis.com/instance/cpu/utilization) by default.
  • Network: Tracks traffic (e.g., compute.googleapis.com/instance/network/received_bytes_count and compute.googleapis.com/instance/network/sent_bytes_count) by default.

Other types, like GPU or process metrics, may need additional setup. See Ops Agent Metrics.

3. How to Use Them

Use GCP's Cloud Monitoring tools to view metrics. The Metrics Explorer lets you explore them interactively: select a metric, set a time range, and see charts or trends. It is well suited to testing queries or spotting issues. You can also set up alerts or use a query language for deeper analysis (MQL, or PromQL, which Google now recommends over MQL). Try it at Metrics Explorer.

4. SDK

GCP provides a Java SDK (the google-cloud-monitoring client library) to access metrics programmatically.
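
As one possible illustration, the sketch below builds the `projects.timeSeries.list` REST call of the Cloud Monitoring API using only the JDK, so it runs without extra dependencies. It is not the SDK itself: the client library exposes the same operation as `MetricServiceClient.listTimeSeries`. The class name `TimeSeriesRequestSketch`, its method names, and the project ID `my-project` are assumptions made for this example, and obtaining the OAuth 2.0 access token (for instance with `gcloud auth print-access-token`) is left out.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper that prepares a timeSeries.list request; it does not send it.
final class TimeSeriesRequestSketch {
    static final String ENDPOINT = "https://monitoring.googleapis.com/v3/";

    // Builds the request URI for one metric type over the interval [start, end].
    static URI listTimeSeriesUri(String projectId, String metricType,
                                 Instant start, Instant end) {
        // Monitoring filter selecting a single metric type.
        String filter = "metric.type=\"" + metricType + "\"";
        String query = "filter=" + enc(filter)
                + "&interval.startTime=" + enc(start.toString())
                + "&interval.endTime=" + enc(end.toString());
        return URI.create(ENDPOINT + "projects/" + projectId + "/timeSeries?" + query);
    }

    // Percent-encodes a query parameter value.
    static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Wraps the URI in a GET request carrying a bearer token.
    static HttpRequest authorizedGet(URI uri, String accessToken) {
        return HttpRequest.newBuilder(uri)
                .header("Authorization", "Bearer " + accessToken)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        Instant end = Instant.now();
        Instant start = end.minus(Duration.ofMinutes(20));
        URI uri = listTimeSeriesUri("my-project",
                "compute.googleapis.com/instance/cpu/utilization", start, end);
        System.out.println(uri);
    }
}
```

The request returned by `authorizedGet` could then be sent with `java.net.http.HttpClient`, and the JSON response parsed for the time series points.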

5. Metrics that will be used for this issue

  • compute.googleapis.com/instance/cpu/utilization: Shows CPU usage (0 to 1, e.g., 0.75 = 75%) by default.
  • agent.googleapis.com/memory/bytes_used: Tracks memory used (with Ops Agent), labeled by state (e.g., "used").
  • agent.googleapis.com/disk/bytes_used: Monitors disk space used (with Ops Agent), per device and state. Useful for catching disk shortages early.
  • compute.googleapis.com/instance/network/received_bytes_count: Counts bytes received by your VM. Tracks incoming traffic.
  • compute.googleapis.com/instance/network/sent_bytes_count: Counts bytes sent by your VM. Tracks outgoing traffic.
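
To make the units above concrete, here is a small sketch of two conversions a consumer of these metrics would likely need: turning the 0 to 1 CPU utilization into a percentage, and turning a byte count into a throughput. It assumes each network point carries the number of bytes transferred during its sampling interval; the class name `MetricMath` and its methods are hypothetical.

```java
import java.time.Duration;

// Hypothetical unit conversions for the metrics listed above.
final class MetricMath {
    // Converts a utilization fraction (0 to 1) into a percentage.
    static double toPercent(double utilization) {
        return utilization * 100.0;
    }

    // Converts bytes transferred during an interval into bytes per second.
    static double bytesPerSecond(long deltaBytes, Duration interval) {
        if (interval.isZero() || interval.isNegative()) {
            throw new IllegalArgumentException("interval must be positive");
        }
        return deltaBytes / (interval.toMillis() / 1000.0);
    }

    public static void main(String[] args) {
        System.out.println(toPercent(0.75));
        System.out.println(bytesPerSecond(6000, Duration.ofSeconds(60)));
    }
}
```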

kbashpayev, May 14 '25 09:05