
[Bug] Metrics exported by the Temporal client's metrics handler do not match the ones used in the SDK metrics JSON file

Open abhishekhugetech opened this issue 1 year ago • 1 comments

What are you really trying to do?

I'm trying to load test Temporal and look at the metrics that show how my self-hosted Temporal deployment behaves at scale. All of the Temporal SDK metrics are listed in Grafana's metrics browser, but when I try to visualize them by importing the SDK dashboard JSON file, most of the charts are empty (No data).

Only after changing the names of the metrics used in the panel expressions am I able to see these graphs on the dashboard.

Describe the bug

The names of the metrics exported by the Temporal client's metrics handler do not match the ones used in the JSON file for the Grafana dashboard.

After importing the SDK metrics dashboard, I see this:

image

After changing the metric names used in the expressions:

image

Environment/Versions

I'm running Temporal on my local machine using Docker; Grafana and Prometheus are also set up with Docker. I'm using Go, and these are the Temporal SDK versions I'm using:

go.temporal.io/api v1.19.1-0.20230322213042-07fb271d475b
go.temporal.io/sdk v1.22.1
go.temporal.io/sdk/contrib/tally v0.2.0

Additional context

I wanted to know if I'm doing something wrong here, or if I need to make additional changes in order to see the metrics using the dashboard JSON file.

I'm attaching the replacement keys I used to get these metrics to show up in the Grafana dashboard, as they may be helpful for someone else.

{
    "existing_key_in_json_file": "needs to be replaced with below values",

    "temporal_request": "temporal_request_total",
    "temporal_request_latency_bucket": "temporal_request_latency_seconds_bucket",
    "temporal_workflow_completed": "temporal_workflow_completed_total",
    "temporal_workflow_failed": "temporal_workflow_failed_total",
    "temporal_workflow_endtoend_latency_bucket": "temporal_workflow_endtoend_latency_seconds_bucket",
    "temporal_workflow_task_queue_poll_succeed": "temporal_workflow_task_queue_poll_succeed_total",
    "temporal_workflow_task_queue_poll_empty": "temporal_workflow_task_queue_poll_empty_total",
    "temporal_workflow_task_schedule_to_start_latency_bucket": "temporal_workflow_task_schedule_to_start_latency_seconds_bucket",
    "temporal_workflow_task_execution_latency_bucket": "temporal_workflow_task_execution_latency_seconds_bucket",
    "temporal_workflow_task_replay_latency_bucket": "temporal_workflow_task_replay_latency_seconds_bucket",
    "temporal_activity_execution_latency_count": "temporal_activity_execution_latency_seconds_count",
    "temporal_activity_execution_failed": "temporal_activity_execution_failed_total",
    "temporal_activity_execution_latency_bucket": "temporal_activity_execution_latency_seconds_bucket",
    "temporal_activity_poll_no_task": "temporal_activity_poll_no_task_total",
    "temporal_activity_schedule_to_start_latency_bucket": "temporal_activity_schedule_to_start_latency_seconds_bucket"
}
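The replacements above can be applied mechanically instead of by hand. Below is a minimal Python sketch of one way to do that, not an official fix. It assumes the dashboard stores its Prometheus queries in string fields named `expr`; the function names `rename_metrics` and `rewrite_dashboard` are made up for illustration. A plain find-and-replace would be unsafe here because `temporal_request` is a prefix of `temporal_request_latency_bucket`, so the sketch only matches whole metric names.

```python
import re

# Old name -> new name, copied from the replacement keys above.
REPLACEMENTS = {
    "temporal_request": "temporal_request_total",
    "temporal_request_latency_bucket": "temporal_request_latency_seconds_bucket",
    "temporal_workflow_completed": "temporal_workflow_completed_total",
    "temporal_workflow_failed": "temporal_workflow_failed_total",
    "temporal_workflow_endtoend_latency_bucket": "temporal_workflow_endtoend_latency_seconds_bucket",
    "temporal_workflow_task_queue_poll_succeed": "temporal_workflow_task_queue_poll_succeed_total",
    "temporal_workflow_task_queue_poll_empty": "temporal_workflow_task_queue_poll_empty_total",
    "temporal_workflow_task_schedule_to_start_latency_bucket": "temporal_workflow_task_schedule_to_start_latency_seconds_bucket",
    "temporal_workflow_task_execution_latency_bucket": "temporal_workflow_task_execution_latency_seconds_bucket",
    "temporal_workflow_task_replay_latency_bucket": "temporal_workflow_task_replay_latency_seconds_bucket",
    "temporal_activity_execution_latency_count": "temporal_activity_execution_latency_seconds_count",
    "temporal_activity_execution_failed": "temporal_activity_execution_failed_total",
    "temporal_activity_execution_latency_bucket": "temporal_activity_execution_latency_seconds_bucket",
    "temporal_activity_poll_no_task": "temporal_activity_poll_no_task_total",
    "temporal_activity_schedule_to_start_latency_bucket": "temporal_activity_schedule_to_start_latency_seconds_bucket",
}

# Match an old name only when it is not part of a longer metric name:
# no metric-name character may appear directly before or after it.
_ALTERNATION = "|".join(
    re.escape(name) for name in sorted(REPLACEMENTS, key=len, reverse=True)
)
_PATTERN = re.compile(
    r"(?<![A-Za-z0-9_:])(?:" + _ALTERNATION + r")(?![A-Za-z0-9_:])"
)


def rename_metrics(expr: str) -> str:
    """Rename every old metric name in a single PromQL expression."""
    return _PATTERN.sub(lambda m: REPLACEMENTS[m.group(0)], expr)


def rewrite_dashboard(node):
    """Return a copy of decoded dashboard JSON with "expr" strings renamed.

    Assumption: queries live under keys named "expr"; other fields
    (for example template variable queries) are left untouched.
    """
    if isinstance(node, dict):
        return {
            key: rename_metrics(value)
            if key == "expr" and isinstance(value, str)
            else rewrite_dashboard(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [rewrite_dashboard(item) for item in node]
    return node
```

To use it, load the dashboard file with `json.load`, pass the result through `rewrite_dashboard`, and write it back with `json.dump` before importing it into Grafana. Because already-renamed names no longer match as whole names, running it twice on the same file should not change anything further.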

abhishekhugetech Apr 22 '23 09:04

Can we get the SDK Metrics dashboard fixed? Some things that are missing or broken, in addition to what the OP reported:

  • There is no datasource template variable
  • The Namespace/WorkflowType variables are non-functional

prologic Jul 13 '23 06:07