opentelemetry-demo

Prometheus out of order sample from remote write

Open flands opened this issue 2 weeks ago • 0 comments

Bug Report

Which version of the demo you are using? 1.10.0

Symptom

Prometheus logs out-of-order sample errors, and Prometheus itself sometimes restarts.

What is the expected behavior?

No further log messages after ts=2024-06-22T13:59:41.447Z caller=manager.go:163 level=info component="rule manager" msg="Starting rule manager...", and Prometheus does not restart.

What is the actual behavior?

After a few minutes, out-of-order sample errors appear in the logs, and in some cases Prometheus restarts. For example:

ts=2024-06-22T14:06:42.763Z caller=write_handler.go:134 level=error component=web msg="Out of order sample from remote write" err="out of order sample" series="{__name__=\"target_info\", container_id=\"f2c9465e88d12e42c419403ef8aab2027b18337e74bf7d9610e9576420d2db10\", docker_cli_cobra_command_path=\"docker%20compose\", host_name=\"f2c9465e88d1\", job=\"cartservice\", telemetry_sdk_language=\"dotnet\", telemetry_sdk_name=\"opentelemetry\", telemetry_sdk_version=\"1.9.0\"}" timestamp=1719065202193

While cartservice (.NET) appears to be the most common source, the error occurs across other services and languages as well.

Reproduce

Download version 1.10.0 and run make start. Once started, follow the Prometheus container's logs (docker logs -f <ID>) and wait.

Additional Context

Consider enabling out-of-order sample support, available since Prometheus 2.39: https://github.com/prometheus/prometheus/pull/11075
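
As a rough sketch of what that could look like, out-of-order ingestion is controlled by the out_of_order_time_window setting in the storage section of the Prometheus configuration file. The 30m value below is only an illustrative assumption, not a tested recommendation for the demo, and where this fragment belongs in the demo's Prometheus config is not covered here:

```yaml
# Fragment of a Prometheus configuration file (Prometheus 2.39 or later).
# Assumption: 30m is an arbitrary example window; tune it to how late
# remote-write samples actually arrive.
storage:
  tsdb:
    # Accept samples that are up to this much older than the newest
    # sample already stored for the same series.
    out_of_order_time_window: 30m
```

With a non-zero window, samples whose timestamps fall inside it should be ingested instead of being rejected with "out of order sample". Samples older than the window would still be rejected.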

flands • Jun 22 '24 14:06