elasticsearch
[ML] Inference duration and error metrics
Add es.inference.requests.time metric around infer API.
As recommended by the OTel spec, errors are indicated by the presence of the error.type attribute on the metric; on success the attribute is absent. "error.type" is set to the HTTP status code (as a string) when one is available, otherwise to the name of the exception (e.g. NullPointerException).
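A minimal sketch of that rule (not the actual Elasticsearch code; the class and method names here are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

public class ErrorTypeAttribute {
    // Returns the attributes to attach to the metric sample. A successful
    // request carries no "error.type" key; a failed request carries either
    // the HTTP status code (as a string) or the exception's class name.
    static Map<String, Object> attributes(Integer httpStatus, Throwable failure) {
        Map<String, Object> attrs = new HashMap<>();
        if (failure == null) {
            return attrs; // success: absence of error.type marks the request as OK
        }
        if (httpStatus != null) {
            attrs.put("error.type", String.valueOf(httpStatus));
        } else {
            attrs.put("error.type", failure.getClass().getSimpleName());
        }
        return attrs;
    }
}
```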
Additional notes:
- ApmInferenceStats has been merged into InferenceStats. We originally planned to have multiple implementations, but only the APM one is used now.
- Request count is now always recorded, even when there are failures loading the endpoint configuration.
- Added a hook in streaming for cancel messages, so we can close the metrics when a user cancels the stream.
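The notes above can be sketched roughly as follows. This is an illustrative stand-in, not the plugin's real API: the timer records a duration sample (with its attributes) whether the call succeeds or throws, which is how the count stays accurate even when loading the endpoint configuration fails.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

public class InferenceTimer {
    record Sample(double millis, Map<String, Object> attributes) {}

    // Stand-in for the APM meter's histogram; the real code reports to APM.
    static final List<Sample> RECORDED = new ArrayList<>();

    // Times the infer call and records the duration in the finally block,
    // so a sample is emitted on success, failure, or stream cancellation.
    static <T> T timed(Supplier<T> infer, Map<String, Object> attrs) {
        long start = System.nanoTime();
        try {
            return infer.get();
        } finally {
            double millis = (System.nanoTime() - start) / 1_000_000.0;
            RECORDED.add(new Sample(millis, attrs));
        }
    }
}
```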
Example from local node to APM (redacted a bunch):
```json
{
  "_index": ".ds-metrics-apm.app.elasticsearch-default-2024.10.25-000001",
  "data_stream": {
    "dataset": "apm.app.elasticsearch",
    "namespace": "default",
    "type": "metrics"
  },
  "es": {
    "inference": {
      "requests": {
        "time": {
          "values": [
            6992.5
          ],
          "counts": [
            1
          ]
        }
      }
    }
  },
  "labels": {
    "model_id": "gpt-4o-mini",
    "otel_instrumentation_scope_name": "elasticsearch",
    "service": "openai",
    "task_type": "completion"
  },
  "numeric_labels": {
    "status_code": 200
  },
  ...
}
}
```
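The `time` field in the document above is a histogram-style metric: parallel `values` and `counts` arrays, where each count says how many requests fell at that duration. A small sketch of deriving the mean duration from those arrays (a reader-side calculation, not part of the PR):

```java
public class HistogramMean {
    // Mean = sum(value_i * count_i) / sum(count_i); 0.0 if there are no samples.
    static double mean(double[] values, long[] counts) {
        double total = 0.0;
        long n = 0;
        for (int i = 0; i < values.length; i++) {
            total += values[i] * counts[i];
            n += counts[i];
        }
        return n == 0 ? 0.0 : total / n;
    }
}
```

For the single sample above (`values: [6992.5]`, `counts: [1]`) the mean is simply 6992.5 ms.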
Hi @prwhelan, I've created a changelog YAML for you.
@elasticmachine update branch
Pinging @elastic/ml-core (Team:ML)
💚 All backports created successfully
| Status | Branch | Result |
|---|---|---|
| ✅ | 8.x | |
Questions? Please refer to the Backport tool documentation.