cloudberry icon indicating copy to clipboard operation
cloudberry copied to clipboard

Introduce extension perfmon to monitor database cluster

Open fanfuxiaoran opened this issue 7 months ago • 0 comments

Fixes #ISSUE_Number

What does this PR do?

This PR introduces the ​​perfmon​​ extension, designed to monitor the Cloudberry Database.

Key Features:

  • Cluster-wide Metrics Collection​​: Gathers performance metrics from each machine in the database cluster.
  • Real-time Query Monitoring​​:
    • Displays active queries with detailed execution plans.
    • Tracks progress of each query plan node (similar to EXPLAIN ANALYZE for running queries).
    • Captures performance metrics (CPU, memory, spill files) across multiple segments.
  • ​​Query History​​: Provides a historical record of executed queries.

Components:

  • gpmmon​​ (Shared Library) Added to shared_preload_libraries for background operation. Runs as a backgroud worker process to collect and store metrics in the gpperfmon database.
  • gpmon​​ (Shared Library) Added to shared_preload_libraries for query monitoring. Implements hooks to capture runtime query data. Provides the pg_query_state() function to inspect live query execution status.
  • gpsmon​​ (Binary) Deployed per-instance to collect system-level metrics (CPU, memory, etc.).

image

Type of Change

  • [ ] Bug fix (non-breaking change)
  • [x] New feature (non-breaking change)
  • [ ] Breaking change (fix or feature with breaking changes)
  • [ ] Documentation update

Breaking Changes

Test Plan

  • [ ] Unit tests added/updated
  • [x] Integration tests added/updated
  • [x] Passed make installcheck
  • [ ] Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


fanfuxiaoran avatar May 06 '25 06:05 fanfuxiaoran