
[ML] Inference duration and error metrics

prwhelan opened this issue • 3 comments

Add an es.inference.requests.time metric around the infer API.
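Here is a minimal sketch of wrapping an inference call with a duration metric. It assumes a hypothetical `DurationHistogram` interface rather than the real Elasticsearch telemetry API. The elapsed time (milliseconds in this sketch) is recorded in a `finally` block, so failed requests are timed as well. The attribute map would carry labels such as `model_id`, `service`, and `task_type`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a histogram instrument; not the Elasticsearch telemetry API.
interface DurationHistogram {
    void record(double millis, Map<String, Object> attributes);
}

final class InferenceTimer {
    private InferenceTimer() {}

    // Runs the (hypothetical) inference call and records how long it took,
    // whether it returned normally or threw.
    static <T> T timed(DurationHistogram histogram, Map<String, Object> attributes, Supplier<T> inferCall) {
        long start = System.nanoTime();
        try {
            return inferCall.get();
        } finally {
            double elapsedMillis = (System.nanoTime() - start) / 1_000_000.0;
            // Copy the attributes so later mutation by the caller cannot affect the recorded point.
            histogram.record(elapsedMillis, new HashMap<>(attributes));
        }
    }
}
```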

As recommended by the OTel spec, errors are identified by the presence or absence of the error.type attribute on the metric. "error.type" is the HTTP status code (as a string) when one is available; otherwise it is the name of the exception (e.g. NullPointerException).
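The error-classification rule above can be sketched as follows. `StatusAwareException` and `ErrorAttributes` are hypothetical names used only for illustration. A successful request carries no `error.type` attribute at all. A failed request carries either the status code as a string or the exception's simple class name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical: an exception type that may carry an HTTP status code.
class StatusAwareException extends RuntimeException {
    private final int status;

    StatusAwareException(String message, int status) {
        super(message);
        this.status = status;
    }

    int status() {
        return status;
    }
}

final class ErrorAttributes {
    private ErrorAttributes() {}

    // error.type is the HTTP status code as a string when one is available,
    // otherwise the exception's simple class name (e.g. "NullPointerException").
    static String errorType(Throwable t) {
        if (t instanceof StatusAwareException) {
            return Integer.toString(((StatusAwareException) t).status());
        }
        return t.getClass().getSimpleName();
    }

    // Success: no error.type attribute at all. Failure: error.type is present.
    static Map<String, Object> withOutcome(Map<String, Object> base, Throwable failure) {
        Map<String, Object> attrs = new HashMap<>(base);
        if (failure != null) {
            attrs.put("error.type", errorType(failure));
        }
        return attrs;
    }
}
```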

Additional notes:

  • ApmInferenceStats is merged into InferenceStats. Originally we planned to have multiple implementations, but now we're only using APM.
  • Request count is now always recorded, even when there are failures loading the endpoint configuration.
  • Added a hook in streaming for cancel messages, so we can close the metrics when a user cancels the stream.
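One way to picture the cancel hook is with `java.util.concurrent.Flow`. This assumes the stream is exposed through that API, which is not necessarily how the inference plugin wires it, and uses a hypothetical `recordOutcome` callback that stands in for closing the metric. The wrapper records the outcome once, on whichever of complete, error, or cancel arrives first.

```java
import java.util.concurrent.Flow;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Hypothetical sketch: wraps a downstream subscriber so that the metric is closed
// exactly once, whether the stream completes, fails, or is cancelled by the user.
final class MeteredSubscriber<T> implements Flow.Subscriber<T> {
    private final Flow.Subscriber<? super T> delegate;
    private final Consumer<String> recordOutcome; // hypothetical stand-in for closing the metric
    private final AtomicBoolean closed = new AtomicBoolean(false);

    MeteredSubscriber(Flow.Subscriber<? super T> delegate, Consumer<String> recordOutcome) {
        this.delegate = delegate;
        this.recordOutcome = recordOutcome;
    }

    // compareAndSet guarantees only the first terminal signal is recorded.
    private void closeOnce(String outcome) {
        if (closed.compareAndSet(false, true)) {
            recordOutcome.accept(outcome);
        }
    }

    @Override
    public void onSubscribe(Flow.Subscription upstream) {
        // Hand the consumer a subscription whose cancel() also closes the metric.
        delegate.onSubscribe(new Flow.Subscription() {
            @Override
            public void request(long n) {
                upstream.request(n);
            }

            @Override
            public void cancel() {
                closeOnce("cancelled");
                upstream.cancel();
            }
        });
    }

    @Override
    public void onNext(T item) {
        delegate.onNext(item);
    }

    @Override
    public void onError(Throwable throwable) {
        closeOnce("error");
        delegate.onError(throwable);
    }

    @Override
    public void onComplete() {
        closeOnce("complete");
        delegate.onComplete();
    }
}
```

Because of the `compareAndSet` guard, a late `onComplete` after a cancel does not record the request a second time.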

Example metric document sent from a local node to APM (heavily redacted):

```json
{
  "_index": ".ds-metrics-apm.app.elasticsearch-default-2024.10.25-000001",
  "_source": {
    "data_stream": {
      "dataset": "apm.app.elasticsearch",
      "namespace": "default",
      "type": "metrics"
    },
    "es": {
      "inference": {
        "requests": {
          "time": {
            "values": [
              6992.5
            ],
            "counts": [
              1
            ]
          }
        }
      }
    },
    "labels": {
      "model_id": "gpt-4o-mini",
      "otel_instrumentation_scope_name": "elasticsearch",
      "service": "openai",
      "task_type": "completion"
    },
    "numeric_labels": {
      "status_code": 200
    },
    ...
  }
}
```

prwhelan, Oct 29 '24

Hi @prwhelan, I've created a changelog YAML for you.

elasticsearchmachine, Oct 29 '24

@elasticmachine update branch

prwhelan, Oct 30 '24

Pinging @elastic/ml-core (Team:ML)

elasticsearchmachine, Oct 31 '24

@elasticmachine update branch

prwhelan, Nov 05 '24

💚 All backports created successfully

| Status | Branch | Result |
|---|---|---|
| ✅ | 8.x | |

Questions? Please refer to the Backport tool documentation.

jonathan-buttner, Dec 13 '24