Consider clarifying the "Out of order sample from remote write" error (remote-write-receiver feature enabled)
What did you do?
We just enabled the remote-write-receiver feature flag and set a remoteWrite URL (with some additional target_label(s)) in order to write to the /api/v1/write endpoint from another Prometheus:
prometheus:
  prometheusSpec:
    ## ref: https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#remotewritespec
    remoteWrite:
      - url: "http://10.XXX.0.XXX:30090/api/v1/write"
      - target_label: "foo"
        replacement: "bar"
$ helm upgrade --reuse-values --version 15.2.1 --set operator.logLevel=debug -f override.yaml prometheus-operator prometheus-community/kube-prometheus-stack -n monitoring
What did you expect to see?
.
What did you see instead? Under which circumstances?
{"caller":"write_handler.go:53","component":"web","err":"out of order sample","level":"error","msg":"Out of order sample from remote write","ts":"2021-08-18T09:13:36.495Z"}
Environment
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.3", GitCommit:"ca643a4d1f7bfe34773c74f79527be4afd95bf39", GitTreeState:"clean", BuildDate:"2021-07-15T20:58:09Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.7", GitCommit:"132a687512d7fb058d0f5890f07d4121b3f0a2e2", GitTreeState:"clean", BuildDate:"2021-05-12T12:32:49Z", GoVersion:"go1.15.12", Compiler:"gc", Platform:"linux/amd64"}
- System information:
$ uname -a
Linux node-monitor-1 5.8.0-50-generic #56~20.04.1-Ubuntu SMP Mon Apr 12 21:46:35 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
In both clusters:
$ timedatectl show
Timezone=Europe/Istanbul
LocalRTC=no
CanNTP=yes
NTP=yes
NTPSynchronized=yes
- Prometheus version:
Deployed via Helm
v2.29.1
- config-reloader version:
prometheus-config-reloader:v0.47.0
- thanos-sidecar version:
v0.13.0
- Prometheus configuration file:
Default helm chart:
$ helm list
NAME                 NAMESPACE   REVISION  UPDATED                               STATUS    CHART                         APP VERSION
prometheus-operator  monitoring  11        2021-08-17 16:11:18.900458 +0300 +03  deployed  kube-prometheus-stack-15.2.1  0.47.0
- override.yaml:
prometheus:
  prometheusSpec:
    enableFeatures: [
      "remote-write-receiver"
    ]
$ helm upgrade --reuse-values --version 15.2.1 --set operator.logLevel=debug -f override.yaml prometheus-operator prometheus-community/kube-prometheus-stack -n monitoring
$ stern prometheus-prometheus-operator-kube-p-prometheus-0
Full logs
...
...
...
prometheus-prometheus-operator-kube-p-prometheus-0 prometheus {"caller":"write_handler.go:53","component":"web","err":"out of order sample","level":"error","msg":"Out of order sample from remote write","ts":"2021-08-18T09:13:36.478Z"}
prometheus-prometheus-operator-kube-p-prometheus-0 prometheus {"caller":"write_handler.go:53","component":"web","err":"out of order sample","level":"error","msg":"Out of order sample from remote write","ts":"2021-08-18T09:13:36.480Z"}
prometheus-prometheus-operator-kube-p-prometheus-0 prometheus {"caller":"write_handler.go:53","component":"web","err":"out of order sample","level":"error","msg":"Out of order sample from remote write","ts":"2021-08-18T09:13:36.483Z"}
prometheus-prometheus-operator-kube-p-prometheus-0 prometheus {"caller":"write_handler.go:53","component":"web","err":"out of order sample","level":"error","msg":"Out of order sample from remote write","ts":"2021-08-18T09:13:36.495Z"}
prometheus-prometheus-operator-kube-p-prometheus-0 prometheus {"caller":"write_handler.go:53","component":"web","err":"out of order sample","level":"error","msg":"Out of order sample from remote write","ts":"2021-08-18T09:13:36.497Z"}
prometheus-prometheus-operator-kube-p-prometheus-0 prometheus {"caller":"write_handler.go:53","component":"web","err":"out of order sample","level":"error","msg":"Out of order sample from remote write","ts":"2021-08-18T09:13:36.538Z"}
prometheus-prometheus-operator-kube-p-prometheus-0 prometheus {"caller":"write_handler.go:53","component":"web","err":"out of order sample","level":"error","msg":"Out of order sample from remote write","ts":"2021-08-18T09:13:36.910Z"}
prometheus-prometheus-operator-kube-p-prometheus-0 prometheus {"caller":"write_handler.go:53","component":"web","err":"duplicate sample for timestamp","level":"error","msg":"Out of order sample from remote write","ts":"2021-08-18T09:01:37.839Z"}
prometheus-prometheus-operator-kube-p-prometheus-0 prometheus {"caller":"write_handler.go:53","component":"web","err":"out of order sample","level":"error","msg":"Out of order sample from remote write","ts":"2021-08-18T09:01:37.852Z"}
I think this error is thrown from this line, which is called by this for-loop.
Any ideas why we get this error? I could not find much (e.g., docs, a troubleshooting guide, etc.) about it, so I created this issue to discuss it.
I honestly don't know how the Helm chart deploys Prometheus. If it's deploying an HA pair and you aren't adding external labels to differentiate between those Prometheus instances, then out-of-order errors would be expected on the receiving Prometheus, since all the series from the two sending instances would have exactly the same label sets.
Could you provide us with more info about your setup? Is it HA? Do you have different external labels?
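For illustration only, a minimal sketch of how distinguishing external labels could be set through the chart values (the field names come from the Prometheus Operator's PrometheusSpec; the label name and value are placeholders). Within a single HA pair, the operator's automatically added replica label normally serves this purpose:

prometheus:
  prometheusSpec:
    externalLabels:
      cluster: cluster-a   # placeholder: something unique per sending Prometheus
    # replicaExternalLabelName defaults to "prometheus_replica", which already
    # differs between replicas of an HA pair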
kind ping
Hey! Sorry, I totally forgot about this. Since I'm not a cluster admin, I'm not sure whether it's deploying an HA pair. Unfortunately, I don't have much additional information about the setup.
you aren't adding external labels to differentiate between those Prometheus instances, then out-of-order errors would be expected
Possibly, it's because of this.
The error message could be more helpful here. It would be desirable to have it include the name and labels of the time series that is receiving out-of-order data, to help debug this. I see this happen in one of my deployments with 370K time series, and I'm not really sure how to even start finding the one that's giving me the error.
The error message's details are all from the receiving end, so we can make an improvement here to the error that Prometheus returns when it receives remote write, but other remote-write receivers (such as Cortex, InfluxDB, M3DB, etc.) might not return the same information.
The same problem
I have the same problem sending metrics from a Prometheus agent to a Prometheus instance. The issue happens when the agent restarts and replays the WAL stored in its data directory. If I clear the data directory and restart the agent, everything works again.
I put more details in https://github.com/prometheus/prometheus/issues/9867#issuecomment-1015316199
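For illustration, "clearing the data directory" amounted to something like the following (the service name and storage path are assumptions; use whatever --storage.agent.path your agent actually runs with, and note this discards any samples still buffered in the WAL):

# hypothetical unit name and data path; adjust to your deployment
systemctl stop prometheus-agent
rm -rf /var/lib/prometheus/data-agent/
systemctl start prometheus-agent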
Any news on this?
I am having the same issue. Using Prometheus v2.35.0 installed via kube-prometheus-stack chart. Things I have checked so far:
- I made sure that there are no duplicate samples by adding unique external labels to all instances. They are printed in the logs.
- Cleared all data of all Prometheus instances
- Checked system clocks and made sure that time is synced & correct
- Found that the issue is not tied to one specific scrape target - the errors come from various targets (which are discovered by ServiceMonitors)
A sample log from the observer side (remote-write receiver) is below. Here, the label host=raspberry-pi-4 is an external label defined on the observee instance:
ts=2022-05-02T23:17:09.881Z caller=write_handler.go:102 level=error component=web msg="Out of order sample from remote write" err="out of order sample" series="{__name__=\"instance:node_cpu:ratio\", container=\"node-exporter\", endpoint=\"http-metrics\", host=\"raspberry-pi-4\", instance=\"172.18.0.7:9100\", job=\"node-exporter\", namespace=\"monitoring\", pod=\"kube-prometheus-stack-prometheus-node-exporter-xzhjk\", prometheus=\"monitoring/kube-prometheus-stack-prometheus\", prometheus_replica=\"prometheus-kube-prometheus-stack-prometheus-0\", service=\"kube-prometheus-stack-prometheus-node-exporter\"}" timestamp=1651533423907
And from the observee side:
ts=2022-05-02T23:17:09.899Z caller=dedupe.go:112 component=remote level=error remote_name=54c22e url=http://172.18.0.6:9090/api/v1/write msg="non-recoverable error" count=500 exemplarCount=0 err="server returned HTTP status 400 Bad Request: out of order sample"
@utkuozdemir is this intermittent? on startup? constant? the occasional out of order message is not out of the ordinary or necessarily a problem
@utkuozdemir is this intermittent? on startup? constant? the occasional out of order message is not out of the ordinary or necessarily a problem
It is constant. However, I started to suspect that it might be related to my upload bandwidth. I have 2 Raspberry Pis that do remote_write every 5 seconds from my home internet connection, and my upload speed drops to as low as 5 Mbps during busy hours. I noticed that these 2 Prometheuses pretty much saturate my upload bandwidth and cause network stability issues overall, including for downloads.
I'll try to test them in a more stable connection with higher bandwidth and also reduce the frequency and the size of samples pushed. If it helps, I'll let you know.
I'm having this same issue with Federated prometheus running in GKE clusters. The main operator prometheus has restarted nearly 60 times since the middle of the night last night.
I'm having this same issue with Federated prometheus running in GKE clusters. The main operator prometheus has restarted nearly 60 times since the middle of the night last night.
Out of order messages are common after restarts. I can't remember right now if we've fixed all known cases where out of orders should not be sent after restarts, but I'd suggest looking into your restarts issue first.
@cstyan If out-of-order messages are common on agent restarts (as in my case, specifically), is this something we can "special case" to reduce the "noise" from this specific set of errors (both on the receiver side and the write/agent side)?
I'm tempted to think the on-disk WAL for Prometheus agent(s) probably needs configuration tuning (something in our control, I guess), but it does beg the question of the usefulness of the WAL and its replay if the agent will just receive 400 Bad Request(s) on write attempts for entries that have already been written to the remote receiver...
+1
I'm having the same problem after enabling the remote write in order to send recording rules from Loki Ruler.
@prologic the right path forward is what we refer to as "remote write checkpointing", the ability for remote write to keep track of where within the WAL it has successfully sent samples from, and update that tracker regularly. This would mean that upon restart remote write can start from that point rather than missing data or sending data that may have already been successfully sent. The issue is here.
As for the configuration of the WAL and its usefulness: yes, you can configure some parameters; in particular, the checkpoint and truncation timing of the WAL may be of interest to you since you're using agent mode (see the flags and the sketch below), though doing so may reduce the useful properties of the WAL. The WAL, in the case of the agent, serves as the buffer for remote write. Many mistakenly think that because their agent is happily chugging along sending metrics they have no need for the WAL, but its purpose is to protect you from losing metrics in the event of network congestion or an actual network disruption between Prometheus/the agent and the remote write endpoint. Be aware that changing various remote write or WAL configuration options could mean you lose more data in the previously mentioned situations.
--storage.agent.retention.min-time=STORAGE.AGENT.RETENTION.MIN-TIME
    Minimum age samples may be before being considered for deletion when the
    WAL is truncated. Use with agent mode only.
--storage.agent.retention.max-time=STORAGE.AGENT.RETENTION.MAX-TIME
    Maximum age samples may be before being forcibly deleted when the WAL is
    truncated. Use with agent mode only.
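A minimal sketch of how those flags might be applied when starting the agent (flag values and paths are placeholders, not recommendations; shortening retention trades some of the WAL's buffering safety for less data to replay on restart):

# hypothetical invocation; adjust paths and values to your deployment
prometheus \
  --enable-feature=agent \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.agent.path=/var/lib/prometheus/data-agent \
  --storage.agent.retention.max-time=2h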
@jesusvazquez should we close this one in favor of https://github.com/prometheus/prometheus/pull/11075?
Thanks for the ping @Dentrax
Yes, this can be closed now that Prometheus's TSDB has out-of-order support. Note it will be available starting in v2.39.
Also, out-of-order support will not be enabled by default; it will be flagged as experimental and opt-in for anyone facing out-of-order errors.
We're planning on adding some documentation clarification as part of https://github.com/prometheus/prometheus/issues/11329 and perhaps a hint to enable out-of-order when these errors happen.
I've tried my best to digest this thread, but I'm fairly new to Prometheus and very new to remote write, so forgive me if I'm asking about things you've already answered. Is turning on out-of-order support the best option at this point, even though it's experimental? I've read some things about deleting the WAL, but I'm not sure I want to do that or, if I were to do it, exactly where to do it.
In a pre-prod environment I have one "central" instance and three other instances. The central instance used to federate these instances, but I've enabled remote-write and have been experimenting with different queue configs today. One thing I've noticed is constant failures related to "out of order" and "duplicate timestamp." The numbers go up after a restart and then settle back down to a constant. In Grafana my dashboard panel that uses prometheus_remote_storage_samples_failed_total looks like this (brief drops then spikes towards the end are when I restarted the central instance):
Note the "normal" amount is 1000-1800 per second!
Logs from one of the remote-write senders look like this:
ts=2022-11-03T22:01:54.211Z caller=dedupe.go:112 component=remote level=error remote_name=95eccb url=https://x.net/api/v1/write msg="non-recoverable error" count=500 exemplarCount=0 err="server returned HTTP status 400 Bad Request: duplicate sample for timestamp"
ts=2022-11-03T22:02:34.099Z caller=dedupe.go:112 component=remote level=error remote_name=95eccb url=https://x.net/api/v1/write msg="non-recoverable error" count=500 exemplarCount=0 err="server returned HTTP status 400 Bad Request: out of order sample"
Hello @davidpellcb
but is turning on out-of-order support the best option at this point, even though it's experimental?
You can try it out and, if you find any issues, disable it. We're using it in production with clients and millions of series; maybe that can give you a bit of confidence. No bugs found yet, but we still want to be cautious, which is why it is flagged as experimental.
We continue to make changes to it and improve the overall state of the TSDB since it was merged. There are people watching this, so if you find any issue we'd jump right on it.
I've read some things about deleting WAL but not sure I want to do that or, if I were to do it, exactly where to do it.
We did have a bug in the past where the WAL was being discarded, but that was sorted out before out-of-order support was merged.
If you want to enable out-of-order support, you just need to set the out-of-order window to a few minutes. I think a few minutes will be enough for you because you only get issues on restarts.
You can do so in the tsdb section of your YAML configuration file; see https://github.com/prometheus/prometheus/blob/main/docs/configuration/configuration.md#tsdb
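A minimal sketch of that section, assuming Prometheus >= v2.39 on the receiving side (the window value is just an example):

# prometheus.yml on the remote-write receiver
storage:
  tsdb:
    # accept samples up to 10 minutes older than the newest sample of a series
    out_of_order_time_window: 10m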
Thank you so much for the response @jesusvazquez. We install Prometheus via the kube-prometheus-stack Helm chart and unfortunately I don't think they've exposed this option via helm values yet.
I think a few minutes will be enough for you because you only get issues on restarts.
In our case we're getting constant out-of-order and duplicate timestamp errors. They get worse on restarts and then settle back to what seems like a still-high amount. Any idea what I should be looking at to see why that's happening??
In our case we're getting constant out-of-order and duplicate timestamp errors. They get worse on restarts and then settle back to what seems like a still-high amount. Any idea what I should be looking at to see why that's happening??
There is probably a misconfiguration on your side, but I can't really tell what it is; you'd have to debug it more closely. I'd suspect that all your federated servers are sending the same time series, including the same labels, and that leads to a base level of out-of-order errors or new-value-for-timestamp errors.
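For example, one thing worth ruling out is that every sending server carries its own distinguishing external label in its configuration (the label name and value below are placeholders):

# prometheus.yml on each sending/federated server
global:
  external_labels:
    replica: sender-1   # must be different on every sender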
This is however out of the scope of the existing issue.
cc @codesome Since out-of-order is now available and it has been documented how to enable it, I'd recommend closing this issue.
Closing as per the above comment. If you have any more issues, please open another issue with details.