datadog-actions-metrics

Can it send job-level metrics only?

Open jduan-highnote opened this issue 2 years ago • 5 comments

When the collect-job-metrics flag is set to true, both job-level and step-level metrics are sent to Datadog. This doesn't scale well when a workflow has many jobs and steps. For one of our large workflows, not every job-level and step-level metric reaches Datadog, I think because there's a limit on how much can be sent at once.

Could this flag be split into two flags?

  • collect-job-metrics (collect job-level metrics only)
  • collect-step-metrics (collect step-level metrics only)

That way, people can choose what they want. Thanks!
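A minimal sketch of how two independent flags could gate what gets submitted. The `Series` shape, the `CollectOptions` interface, the `selectSeries` function, and the metric names are all hypothetical illustrations, not the action's actual code; only the flag names come from the proposal above.

```typescript
// Hypothetical, simplified shape of a Datadog metric series (not the action's real type).
interface Series {
  metric: string;
  points: [number, number][]; // [unix timestamp in seconds, value]
  tags: string[];
}

// Hypothetical options mirroring the proposed collect-job-metrics / collect-step-metrics flags.
interface CollectOptions {
  collectJobMetrics: boolean;
  collectStepMetrics: boolean;
}

// Keeps only the series enabled by the flags, so step-level series
// can be dropped independently of job-level ones.
function selectSeries(
  jobSeries: Series[],
  stepSeries: Series[],
  opts: CollectOptions,
): Series[] {
  return [
    ...(opts.collectJobMetrics ? jobSeries : []),
    ...(opts.collectStepMetrics ? stepSeries : []),
  ];
}

// Usage: submit job-level metrics only.
const jobs: Series[] = [{ metric: "example.job.duration", points: [[1669320000, 120]], tags: ["job_name:build"] }];
const steps: Series[] = [{ metric: "example.step.duration", points: [[1669320000, 30]], tags: ["job_name:build"] }];
const jobOnly = selectSeries(jobs, steps, { collectJobMetrics: true, collectStepMetrics: false });
console.log(jobOnly.map((s) => s.metric)); // [ 'example.job.duration' ]
```

Dropping step-level series this way shrinks both the request body and the number of distinct custom metrics.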

jduan-highnote, Nov 24 '22 20:11

BTW, I've also seen this error: Error: HTTP-Code: 413 Message: {"errors":["Payload too large"]}. It could be avoided with a more granular configuration of which metrics to send.

jduan-highnote, Nov 24 '22 20:11

It seems the Datadog metrics API has a 10 MB payload limit. https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/1925

I will add a collect-step-metrics flag. It would also help reduce the cost of Datadog custom metrics.

I think it is also possible to split the metrics across multiple requests. I will try that as well.

int128, Nov 26 '22 03:11

According to the API documentation https://datadoghq.dev/datadog-api-client-typescript/classes/v1.MetricsApi.html#submitMetrics, the maximum payload size is 3.2 MB.
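To stay under a byte limit like the 3.2 MB one above, batches can be packed by serialized size rather than by count. This is a minimal greedy sketch, assuming the request body is the JSON object `{"series":[...]}` and that "3.2 MB" means 3,200,000 bytes; `Series`, `splitBySize`, and `MAX_PAYLOAD_BYTES` are hypothetical names, not part of the action or the Datadog client.

```typescript
// Hypothetical, simplified series shape; the real client type has more fields.
interface Series {
  metric: string;
  points: [number, number][];
  tags: string[];
}

// Assumption: the documented 3.2 MB limit, interpreted as 3,200,000 bytes.
const MAX_PAYLOAD_BYTES = 3_200_000;

const encoder = new TextEncoder();
// UTF-8 byte length of a value's compact JSON encoding.
const byteLength = (value: unknown): number => encoder.encode(JSON.stringify(value)).length;

// Greedily packs series into batches whose {"series":[...]} body stays within maxBytes.
function splitBySize(series: Series[], maxBytes: number = MAX_PAYLOAD_BYTES): Series[][] {
  const envelope = byteLength({ series: [] }); // bytes of {"series":[]}
  const batches: Series[][] = [];
  let current: Series[] = [];
  let size = envelope;

  for (const s of series) {
    const itemBytes = byteLength(s);
    if (envelope + itemBytes > maxBytes) {
      throw new Error(`series ${s.metric} alone exceeds ${maxBytes} bytes`);
    }
    // A non-empty batch needs one extra byte for the separating comma.
    const cost = itemBytes + (current.length > 0 ? 1 : 0);
    if (size + cost > maxBytes) {
      batches.push(current); // close the full batch and start a new one
      current = [];
      size = envelope;
    }
    size += itemBytes + (current.length > 0 ? 1 : 0);
    current.push(s);
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

Each returned batch could then be submitted as its own request. Measuring the exact JSON bytes avoids guessing a safe series count, at the cost of serializing each series once more; if the client compresses payloads, the limit that applies may differ.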

int128, Nov 26 '22 05:11

Thanks for fixing this so quickly!

jduan-highnote, Nov 26 '22 17:11

@int128 Quick follow-up: I have a very large workflow with many jobs (the number of jobs is actually dynamic). Job-level metrics seem to be capped at 399; the log shows "Sending 399 metrics to Datadog". Because of this cap, some of the job metrics aren't sent. Could all the job metrics be sent in batches?
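A minimal sketch of sending every series in sequential batches, so a large dynamic job count is covered by several requests. The `Submit` callback, `sendInBatches`, and `Series` are hypothetical; in a real action the callback might wrap `submitMetrics` from the TypeScript client linked earlier, but it is injected here so the sketch runs without network access. Batching by count alone does not guarantee each request stays under the 3.2 MB limit mentioned above.

```typescript
// Hypothetical, simplified series shape.
interface Series {
  metric: string;
  points: [number, number][];
  tags: string[];
}

// Hypothetical submitter for one request's worth of series.
type Submit = (series: Series[]) => Promise<void>;

// Sends all series in consecutive batches of at most batchSize and
// returns how many series were submitted.
async function sendInBatches(series: Series[], batchSize: number, submit: Submit): Promise<number> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  let sent = 0;
  for (let i = 0; i < series.length; i += batchSize) {
    const batch = series.slice(i, i + batchSize);
    console.log(`Sending ${batch.length} metrics to Datadog`);
    await submit(batch); // one request in flight at a time
    sent += batch.length;
  }
  return sent;
}
```

Sending sequentially keeps the request rate predictable; an error from `submit` rejects the returned promise and stops the remaining batches, which the caller can catch and report.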

jduan-highnote, Jan 04 '23 01:01