[CELEBORN-1627] Introduce `instance` variable for celeborn dashboard to filter metrics

turboFei opened this pull request 1 year ago • 1 comment

What changes were proposed in this pull request?

  1. Add `instanceLabel` to the metrics source. It uses `FQDN:port` instead of `ip:port` when `celeborn.network.bind.preferIpAddress=false` (previously `ip:port` was used even with this setting).
  2. Add an `instance` variable defined as `label_values(metrics_JVMCPUTime_Value, instance)`, the same as in `celeborn-jvm-dashboard.json`.
  3. Add the filter `instance=~"${instance}"` to every metric query.
  4. Add the missing `legendFormat` for the memory file storage metric expressions.
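The filter from item 3 can be illustrated with a small rewriting helper. This is a hypothetical sketch, not code from the PR (which edits the dashboard JSON directly): it uses a regex rather than a real PromQL parser and assumes every metric name starts with `metrics_` and that no label name does.

```python
import re

# Hypothetical helper, not part of Celeborn. Regex sketch only; it assumes
# metric names start with "metrics_" and label names never do.
INSTANCE_FILTER = 'instance=~"${instance}"'

# Metric name, optionally followed by a {...} matcher block. Quoted label
# values may themselves contain braces, e.g. "${role}".
SELECTOR = re.compile(r'\b(metrics_\w+)(\{((?:[^{}"]|"(?:[^"\\]|\\.)*")*)\})?')
HAS_INSTANCE = re.compile(r'\binstance\s*[=!]')


def add_instance_filter(expr: str) -> str:
    """Add instance=~"${instance}" to every metrics_* selector in expr."""

    def repl(m: re.Match) -> str:
        name, labels = m.group(1), m.group(3)
        if labels is None or not labels.strip():
            # Bare selector or empty braces: add a new matcher block.
            return f"{name}{{{INSTANCE_FILTER}}}"
        if HAS_INSTANCE.search(labels):
            # Already filtered by instance: leave untouched (idempotent).
            return m.group(0)
        # Append the instance matcher to the existing matchers.
        return f"{name}{{{labels}, {INSTANCE_FILTER}}}"

    return SELECTOR.sub(repl, expr)


print(add_instance_filter("sum(metrics_FlushWorkingQueueSize_Value) by (instance)"))
# sum(metrics_FlushWorkingQueueSize_Value{instance=~"${instance}"}) by (instance)
```

Making the helper idempotent means it can be rerun over a dashboard without stacking duplicate `instance` matchers.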

Why are the changes needed?

Production deployments may run many Celeborn instances, so the dashboards should let users filter metrics by instance.

Does this PR introduce any user-facing change?

Yes. It introduces a new dashboard variable, `instance`.

Its default value is ALL, so the dashboards behave the same as before.
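As a rough sketch, the `instance` variable could be declared under `templating.list` in the dashboard JSON as below. The field names follow Grafana's dashboard JSON model as we understand it; the datasource reference `${DS_PROMETHEUS}` and the `multi` setting are assumptions, and the actual JSON in the Celeborn dashboards may differ.

```python
import json

# Sketch of a Grafana "query" template variable. The datasource reference and
# "multi" are assumptions; only the query string comes from the PR.
instance_variable = {
    "name": "instance",
    "label": "instance",
    "type": "query",
    "datasource": "${DS_PROMETHEUS}",
    "query": "label_values(metrics_JVMCPUTime_Value, instance)",
    "refresh": 1,        # re-query label values when the dashboard loads
    "multi": True,       # allow selecting several instances at once
    "includeAll": True,  # offer an "All" option ...
    "current": {"text": "All", "value": "$__all"},  # ... and make it the default
}

dashboard_fragment = {"templating": {"list": [instance_variable]}}
print(json.dumps(dashboard_fragment, indent=2))
```

With `includeAll` set and no custom all value, Grafana expands the selection into a regex alternation of all values, so `instance=~"${instance}"` matches every instance when All is selected.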

How was this patch tested?

Config: `celeborn.network.bind.preferIpAddress=false`

For JVM metrics, the `instance` label was previously `ip:port`; it is now `FQDN:port`.
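The label choice can be summarized as a small pure function. This is a hypothetical sketch: the real logic lives in Celeborn's metrics source and may differ; only the `prefer_ip_address` switch mirrors `celeborn.network.bind.preferIpAddress`, and the host names and port are example values.

```python
# Hypothetical sketch; the switch mirrors celeborn.network.bind.preferIpAddress.
def instance_label(fqdn: str, ip: str, port: int, prefer_ip_address: bool) -> str:
    """Return "ip:port" when IP addresses are preferred, otherwise "FQDN:port"."""
    host = ip if prefer_ip_address else fqdn
    return f"{host}:{port}"


# Example values only.
print(instance_label("worker-1.example.com", "10.0.0.5", 9096, prefer_ip_address=False))
```

Keeping the function pure (host names passed in rather than resolved inside) makes both branches easy to check without network lookups.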

turboFei, Oct 01 '24 17:10

TODO:

Use `{{instance}}` as the default `baseLegend` and add more labels to the legends of metrics such as:

metrics_FlushWorkingQueueSize_Value -> $baseLegend mountpoint={{mountpoint}}
metrics_DeviceOSFreeBytes_Value -> $baseLegend device={{device}}
metrics_DeviceCelebornFreeBytes_Value -> $baseLegend device={{device}}
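The TODO can be illustrated by composing each legend from a shared base and rendering the `{{label}}` placeholders against a series' labels. This is a hypothetical sketch: `render_legend` only approximates Grafana's substitution, and rendering a missing label as an empty string is a simplification.

```python
import re

# Hypothetical illustration of the TODO: shared base legend plus extra labels.
BASE_LEGEND = "{{instance}}"

LEGEND_FORMATS = {
    "metrics_FlushWorkingQueueSize_Value": BASE_LEGEND + " mountpoint={{mountpoint}}",
    "metrics_DeviceOSFreeBytes_Value": BASE_LEGEND + " device={{device}}",
    "metrics_DeviceCelebornFreeBytes_Value": BASE_LEGEND + " device={{device}}",
}

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_legend(legend_format: str, labels: dict[str, str]) -> str:
    """Replace each {{label}} with the series' label value (empty if missing)."""
    return PLACEHOLDER.sub(lambda m: labels.get(m.group(1), ""), legend_format)


series_labels = {"instance": "worker-1.example.com:9096", "device": "sda"}
print(render_legend(LEGEND_FORMATS["metrics_DeviceOSFreeBytes_Value"], series_labels))
```

Defining the base legend once keeps every panel's legend prefixed with the instance, so series from different workers stay distinguishable.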

turboFei, Oct 01 '24 22:10

Thanks. Merged into main (v0.6.0).

FMX, Oct 09 '24 06:10