Consider clarifying the "Out of order sample from remote write" error (remote-write-receiver feature enabled)
What did you do?
We just enabled the remote-write-receiver feature flag and set a remoteWrite URL (with some additional target_label(s)) in order to write to the /api/v1/write endpoint from another Prometheus:
prometheus:
  prometheusSpec:
    ## ref: https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#remotewritespec
    remoteWrite:
      - url: "http://10.XXX.0.XXX:30090/api/v1/write"
      - target_label: "foo"
        replacement: "bar"
$ helm upgrade --reuse-values --version 15.2.1 --set operator.logLevel=debug -f override.yaml prometheus-operator prometheus-community/kube-prometheus-stack -n monitoring
What did you expect to see?
.
What did you see instead? Under which circumstances?
{"caller":"write_handler.go:53","component":"web","err":"out of order sample","level":"error","msg":"Out of order sample from remote write","ts":"2021-08-18T09:13:36.495Z"}
Environment
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.3", GitCommit:"ca643a4d1f7bfe34773c74f79527be4afd95bf39", GitTreeState:"clean", BuildDate:"2021-07-15T20:58:09Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.7", GitCommit:"132a687512d7fb058d0f5890f07d4121b3f0a2e2", GitTreeState:"clean", BuildDate:"2021-05-12T12:32:49Z", GoVersion:"go1.15.12", Compiler:"gc", Platform:"linux/amd64"}
- System information:
$ uname -a
Linux node-monitor-1 5.8.0-50-generic #56~20.04.1-Ubuntu SMP Mon Apr 12 21:46:35 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
In both clusters:
$ timedatectl show
Timezone=Europe/Istanbul
LocalRTC=no
CanNTP=yes
NTP=yes
NTPSynchronized=yes
- Prometheus version:
Deployed via Helm
v2.29.1
- config-reloader version:
prometheus-config-reloader:v0.47.0
- thanos-sidecar version:
v0.13.0
- Prometheus configuration file:
Default helm chart:
$ helm list
NAME                 NAMESPACE   REVISION  UPDATED                               STATUS    CHART                         APP VERSION
prometheus-operator  monitoring  11        2021-08-17 16:11:18.900458 +0300 +03  deployed  kube-prometheus-stack-15.2.1  0.47.0
- override.yaml:
prometheus:
  prometheusSpec:
    enableFeatures: [
      "remote-write-receiver"
    ]
$ helm upgrade --reuse-values --version 15.2.1 --set operator.logLevel=debug -f override.yaml prometheus-operator prometheus-community/kube-prometheus-stack -n monitoring
$ stern prometheus-prometheus-operator-kube-p-prometheus-0
Full logs
...
...
...
prometheus-prometheus-operator-kube-p-prometheus-0 prometheus {"caller":"write_handler.go:53","component":"web","err":"out of order sample","level":"error","msg":"Out of order sample from remote write","ts":"2021-08-18T09:13:36.478Z"}
prometheus-prometheus-operator-kube-p-prometheus-0 prometheus {"caller":"write_handler.go:53","component":"web","err":"out of order sample","level":"error","msg":"Out of order sample from remote write","ts":"2021-08-18T09:13:36.480Z"}
prometheus-prometheus-operator-kube-p-prometheus-0 prometheus {"caller":"write_handler.go:53","component":"web","err":"out of order sample","level":"error","msg":"Out of order sample from remote write","ts":"2021-08-18T09:13:36.483Z"}
prometheus-prometheus-operator-kube-p-prometheus-0 prometheus {"caller":"write_handler.go:53","component":"web","err":"out of order sample","level":"error","msg":"Out of order sample from remote write","ts":"2021-08-18T09:13:36.495Z"}
prometheus-prometheus-operator-kube-p-prometheus-0 prometheus {"caller":"write_handler.go:53","component":"web","err":"out of order sample","level":"error","msg":"Out of order sample from remote write","ts":"2021-08-18T09:13:36.497Z"}
prometheus-prometheus-operator-kube-p-prometheus-0 prometheus {"caller":"write_handler.go:53","component":"web","err":"out of order sample","level":"error","msg":"Out of order sample from remote write","ts":"2021-08-18T09:13:36.538Z"}
prometheus-prometheus-operator-kube-p-prometheus-0 prometheus {"caller":"write_handler.go:53","component":"web","err":"out of order sample","level":"error","msg":"Out of order sample from remote write","ts":"2021-08-18T09:13:36.910Z"}
prometheus-prometheus-operator-kube-p-prometheus-0 prometheus {"caller":"write_handler.go:53","component":"web","err":"duplicate sample for timestamp","level":"error","msg":"Out of order sample from remote write","ts":"2021-08-18T09:01:37.839Z"}
prometheus-prometheus-operator-kube-p-prometheus-0 prometheus {"caller":"write_handler.go:53","component":"web","err":"out of order sample","level":"error","msg":"Out of order sample from remote write","ts":"2021-08-18T09:01:37.852Z"}
I think this error is thrown from this line, which is called by this for-loop.
Any ideas why we get this error? I could not find much (e.g., docs, a troubleshooting guide, etc.) about it, so I created this issue to discuss it.
I honestly don't know how the Helm chart deploys Prometheus. If it's deploying an HA pair and you aren't adding external labels to differentiate between those Prometheus instances, then out-of-order errors would be expected on the receiving Prometheus, since all the series from the two sending instances would have exactly the same label sets.
Could you provide us with more info about your setup? Is it HA? Do you have different external labels?
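For illustration only, a minimal sketch of how distinguishing external labels could be set through the chart values (the field names come from the Prometheus Operator's PrometheusSpec; the label name and value are placeholders). Within a single HA pair, the operator's automatically added replica label normally serves this purpose:

prometheus:
  prometheusSpec:
    externalLabels:
      cluster: cluster-a   # placeholder: something unique per sending Prometheus
    # replicaExternalLabelName defaults to "prometheus_replica", which already
    # differs between replicas of an HA pair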
kind ping
Hey! Sorry, I totally forgot about this. Since I'm not a cluster admin, I'm not sure whether it's deploying an HA pair. Unfortunately, I don't have much additional information about the setup.
you aren't adding external labels to differentiate between those Prometheus instances, then out-of-order errors would be expected
Possibly, it's because of this.
The error message could be more helpful here. It would be desirable to have it include the name and labels of the time series that is receiving out-of-order data, to help debug this. I see this happen in one of my deployments with 370K time series, and I'm not really sure how to even start finding the one that's giving me the error.
The error message's details are all from the receiving end, so we can make an improvement here to the error that Prometheus returns when it receives remote write, but other remote-write receivers (such as Cortex, InfluxDB, M3DB, etc.) might not return the same information.
The same problem
I have the same problem sending metrics from a Prometheus agent to a Prometheus instance. The issue happens when the agent restarts and replays the WAL stored in its data directory. If I clear the data directory and restart the agent, everything works again.
I put more details in https://github.com/prometheus/prometheus/issues/9867#issuecomment-1015316199
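For illustration, "clearing the data directory" amounted to something like the following (the service name and storage path are assumptions; use whatever --storage.agent.path your agent actually runs with, and note this discards any samples still buffered in the WAL):

# hypothetical unit name and data path; adjust to your deployment
systemctl stop prometheus-agent
rm -rf /var/lib/prometheus/data-agent/
systemctl start prometheus-agent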
Any news on this?
I am having the same issue. Using Prometheus v2.35.0 installed via kube-prometheus-stack chart. Things I have checked so far:
- I made sure that there are no duplicate samples by adding unique external labels to all instances. They are printed in the logs.
- Cleared all data of all Prometheus instances
- Checked system clocks and made sure that time is synced & correct
- Found that the issue is not tied to one specific scrape target - the errors come from various targets (which are discovered by ServiceMonitors)
A sample log from the observer side (remote-write receiver) is below. Here, the label host=raspberry-pi-4 is an external label defined on the observee instance:
ts=2022-05-02T23:17:09.881Z caller=write_handler.go:102 level=error component=web msg="Out of order sample from remote write" err="out of order sample" series="{__name__=\"instance:node_cpu:ratio\", container=\"node-exporter\", endpoint=\"http-metrics\", host=\"raspberry-pi-4\", instance=\"172.18.0.7:9100\", job=\"node-exporter\", namespace=\"monitoring\", pod=\"kube-prometheus-stack-prometheus-node-exporter-xzhjk\", prometheus=\"monitoring/kube-prometheus-stack-prometheus\", prometheus_replica=\"prometheus-kube-prometheus-stack-prometheus-0\", service=\"kube-prometheus-stack-prometheus-node-exporter\"}" timestamp=1651533423907
And from the observee side:
ts=2022-05-02T23:17:09.899Z caller=dedupe.go:112 component=remote level=error remote_name=54c22e url=http://172.18.0.6:9090/api/v1/write msg="non-recoverable error" count=500 exemplarCount=0 err="server returned HTTP status 400 Bad Request: out of order sample"
@utkuozdemir is this intermittent? on startup? constant? the occasional out of order message is not out of the ordinary or necessarily a problem
@utkuozdemir is this intermittent? on startup? constant? the occasional out of order message is not out of the ordinary or necessarily a problem
It is constant. However, I started to suspect that it might be related to my upload bandwidth. I have 2 Raspberry Pis that do remote_write every 5 seconds from my home internet connection, and my upload speed drops to as low as 5 Mbps during busy hours. I noticed that these 2 Prometheuses pretty much saturate my upload bandwidth and cause network stability issues overall, including for downloads.
I'll try to test them in a more stable connection with higher bandwidth and also reduce the frequency and the size of samples pushed. If it helps, I'll let you know.
I'm having this same issue with Federated prometheus running in GKE clusters. The main operator prometheus has restarted nearly 60 times since the middle of the night last night.
I'm having this same issue with Federated prometheus running in GKE clusters. The main operator prometheus has restarted nearly 60 times since the middle of the night last night.
Out of order messages are common after restarts. I can't remember right now if we've fixed all known cases where out of orders should not be sent after restarts, but I'd suggest looking into your restarts issue first.
@cstyan If out-of-order messages are common on agent restarts (as in my case, specifically), is this something we can "special case" to reduce the "noise" from this specific set of errors (both on the receiver side and the write/agent side)?
I'm tempted to think the on-disk WAL for Prometheus agent(s) probably needs configuration tuning (something in our control, I guess), but it does beg the question of the usefulness of the WAL and its replay if the agent will just receive 400 Bad Request(s) on write attempts for entries that have already been written to the remote receiver...
+1
I'm having the same problem after enabling the remote write in order to send recording rules from Loki Ruler.
@prologic the right path forward is what we refer to as "remote write checkpointing", the ability for remote write to keep track of where within the WAL it has successfully sent samples from, and update that tracker regularly. This would mean that upon restart remote write can start from that point rather than missing data or sending data that may have already been successfully sent. The issue is here.
As for the configuration of the WAL and its usefulness: yes, you can configure some parameters; in particular, the checkpoint and truncation timing of the WAL may be of interest to you since you're using agent mode (see the flags and the sketch below), though doing so may reduce the useful properties of the WAL. The WAL, in the case of the agent, serves as the buffer for remote write. Many mistakenly think that because their agent is happily chugging along sending metrics they have no need for the WAL, but its purpose is to protect you from losing metrics in the event of network congestion or an actual network disruption between Prometheus/the agent and the remote write endpoint. Be aware that changing various remote write or WAL configuration options could mean you lose more data in the previously mentioned situations.
--storage.agent.retention.min-time=STORAGE.AGENT.RETENTION.MIN-TIME
    Minimum age samples may be before being considered for deletion when the
    WAL is truncated. Use with agent mode only.
--storage.agent.retention.max-time=STORAGE.AGENT.RETENTION.MAX-TIME
    Maximum age samples may be before being forcibly deleted when the WAL is
    truncated. Use with agent mode only.
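A minimal sketch of how those flags might be applied when starting the agent (flag values and paths are placeholders, not recommendations; shortening retention trades some of the WAL's buffering safety for less data to replay on restart):

# hypothetical invocation; adjust paths and values to your deployment
prometheus \
  --enable-feature=agent \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.agent.path=/var/lib/prometheus/data-agent \
  --storage.agent.retention.max-time=2h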
@jesusvazquez should we close this one in favor of https://github.com/prometheus/prometheus/pull/11075?
Thanks for the ping @Dentrax
Yes, this can be closed now that Prometheus's TSDB has out-of-order support. Note it will be available starting in v2.39.
Also, out-of-order support will not be enabled by default; it will be flagged as experimental and opt-in for anyone facing out-of-order errors.
We're planning on adding some documentation clarification as part of https://github.com/prometheus/prometheus/issues/11329 and perhaps a hint to enable out-of-order when these errors happen.
I've tried my best to digest this thread, but I'm fairly new to Prometheus and very new to remote write, so forgive me if I'm asking about things you've already answered. Is turning on out-of-order support the best option at this point, even though it's experimental? I've read some things about deleting the WAL, but I'm not sure I want to do that or, if I were to do it, exactly where to do it.
In a pre-prod environment I have one "central" instance and three other instances. The central instance used to federate these instances, but I've enabled remote-write and have been experimenting with different queue configs today. One thing I've noticed is constant failures related to "out of order" and "duplicate timestamp." The numbers go up after a restart and then settle back down to a constant. In Grafana my dashboard panel that uses prometheus_remote_storage_samples_failed_total looks like this (brief drops then spikes towards the end are when I restarted the central instance):
Note the "normal" amount is 1000-1800 per second!
Logs from one of the remote-write senders look like this:
ts=2022-11-03T22:01:54.211Z caller=dedupe.go:112 component=remote level=error remote_name=95eccb url=https://x.net/api/v1/write msg="non-recoverable error" count=500 exemplarCount=0 err="server returned HTTP status 400 Bad Request: duplicate sample for timestamp"
ts=2022-11-03T22:02:34.099Z caller=dedupe.go:112 component=remote level=error remote_name=95eccb url=https://x.net/api/v1/write msg="non-recoverable error" count=500 exemplarCount=0 err="server returned HTTP status 400 Bad Request: out of order sample"
Hello @davidpellcb
but is turning on out-of-order support the best option at this point, even though it's experimental?
You can try it out and, if you find any issues, disable it. We're using it in production with clients and millions of series; maybe that can give you a bit of confidence. No bugs found yet, but we still want to be cautious, which is why it is flagged as experimental.
We continue to make changes to it and improve the overall state of the TSDB since it was merged. There are people watching this, so if you find any issue we'd jump right on it.
I've read some things about deleting WAL but not sure I want to do that or, if I were to do it, exactly where to do it.
We did have a bug in the past where the WAL was being discarded, but that was sorted out before out-of-order support was merged.
If you want to enable out-of-order support, you just need to set the out-of-order window to a few minutes. I think a few minutes will be enough for you because you only get issues on restarts.
You can do so in the tsdb section of your YAML configuration file; see https://github.com/prometheus/prometheus/blob/main/docs/configuration/configuration.md#tsdb
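A minimal sketch of that section, assuming Prometheus >= v2.39 on the receiving side (the window value is just an example):

# prometheus.yml on the remote-write receiver
storage:
  tsdb:
    # accept samples up to 10 minutes older than the newest sample of a series
    out_of_order_time_window: 10m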
Thank you so much for the response @jesusvazquez. We install Prometheus via the kube-prometheus-stack Helm chart and unfortunately I don't think they've exposed this option via helm values yet.
I think a few minutes will be enough for you because you only get issues on restarts.
In our case we're getting constant out-of-order and duplicate timestamp errors. They get worse on restarts and then settle back to what seems like a still-high amount. Any idea what I should be looking at to see why that's happening??
In our case we're getting constant out-of-order and duplicate timestamp errors. They get worse on restarts and then settle back to what seems like a still-high amount. Any idea what I should be looking at to see why that's happening??
There is probably a misconfiguration on your side, but I can't really tell what it is; you'd have to debug it more closely. I'd suspect that all your federated servers are sending the same time series, including the same labels, and that leads to a base level of out-of-order errors or new-value-for-timestamp errors.
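For example, one thing worth ruling out is that every sending server carries its own distinguishing external label in its configuration (the label name and value below are placeholders):

# prometheus.yml on each sending/federated server
global:
  external_labels:
    replica: sender-1   # must be different on every sender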
This is however out of the scope of the existing issue.
cc @codesome Since out-of-order is now available and it has been documented how to enable it, I'd recommend closing this issue.
Closing as per the above comment. If you have any more issues, please open another issue with details.