encode_opentelemetry: add cutoff for OTel payloads for Prometheus Mimir
This issue is reported in https://github.com/fluent/fluent-bit/issues/9400.
This happens because Prometheus Mimir restricts the timestamps of metrics in the same batch to a 5-minute window: https://github.com/grafana/mimir/blob/main/pkg/distributor/distributor.go#L1010-L1020
What is the side effect of this for other endpoints/users? Is it OK to remove metrics for everybody?
As far as I have investigated, fluent-bit keeps resending (until it is restarted) metrics from devices or mounts that no longer exist:
| Sep 27, 2024 @ 10:37:02.140 | user=anonymous: the sample has been rejected because its timestamp is too old (err-mimir-sample-timestamp-too-old). The affected sample has timestamp 2024-09-24T11:50:15.946Z and is from series node_filesystem_free_bytes{device="tmpfs", fstype="tmpfs", host_name="petra6.vrit.dev", metrics_agent="fluent-bit", metrics_source="host-metrics", mountpoint="/run/user/2137"} (sampled 1/10)
| Sep 27, 2024 @ 10:36:48.274 | user=anonymous: the sample has been rejected because its timestamp is too old (err-mimir-sample-timestamp-too-old). The affected sample has timestamp 2024-09-24T11:49:17.062Z and is from series node_filesystem_free_bytes{device="tmpfs", fstype="tmpfs", host_name="petra5.vrit.dev", metrics_agent="fluent-bit", metrics_source="host-metrics", mountpoint="/run/user/2137"} (sampled 1/10)
| Sep 27, 2024 @ 10:36:41.445 | user=anonymous: the sample has been rejected because its timestamp is too old (err-mimir-sample-timestamp-too-old). The affected sample has timestamp 2024-09-24T11:44:55.162Z and is from series node_filesystem_free_bytes{device="tmpfs", fstype="tmpfs", host_name="petra2.vrit.dev", metrics_agent="fluent-bit", metrics_source="host-metrics", mountpoint="/run/user/2137"} (sampled 1/10)
| Sep 27, 2024 @ 10:36:32.213 | user=anonymous: the sample has been rejected because its timestamp is too old (err-mimir-sample-timestamp-too-old). The affected sample has timestamp 2024-09-24T11:40:47.164Z and is from series node_filesystem_device_error{device="tmpfs", fstype="tmpfs", host_name="petra1.vrit.dev", metrics_agent="fluent-bit", metrics_source="host-metrics", mountpoint="/run/user/2137"} (sampled 1/10)
| Sep 27, 2024 @ 10:36:18.366 | user=anonymous: the sample has been rejected because its timestamp is too old (err-mimir-sample-timestamp-too-old). The affected sample has timestamp 2024-09-24T11:40:47.164Z and is from series node_filesystem_size_bytes{device="tmpfs", fstype="tmpfs", host_name="petra1.vrit.dev", metrics_agent="fluent-bit", metrics_source="host-metrics", mountpoint="/run/user/2137"} (sampled 1/10)
| Sep 27, 2024 @ 10:36:17.153 | user=anonymous: the sample has been rejected because its timestamp is too old (err-mimir-sample-timestamp-too-old). The affected sample has timestamp 2024-09-24T11:50:15.946Z and is from series node_filesystem_free_bytes{device="tmpfs", fstype="tmpfs", host_name="petra6.vrit.dev", metrics_agent="fluent-bit", metrics_source="host-metrics", mountpoint="/run/user/2137"} (sampled 1/10)
| Sep 27, 2024 @ 10:36:03.301 | user=anonymous: the sample has been rejected because its timestamp is too old (err-mimir-sample-timestamp-too-old). The affected sample has timestamp 2024-09-24T11:48:17.259Z and is from series node_filesystem_avail_bytes{device="tmpfs", fstype="tmpfs", host_name="petra4.vrit.dev", metrics_agent="fluent-bit", metrics_source="host-metrics", mountpoint="/run/user/2137"} (sampled 1/10)
| Sep 27, 2024 @ 10:35:53.855 | user=anonymous: the sample has been rejected because its timestamp is too old (err-mimir-sample-timestamp-too-old). The affected sample has timestamp 2024-09-24T11:51:37.82Z and is from series node_filesystem_free_bytes{device="tmpfs", fstype="tmpfs", host_name="petra7.vrit.dev", metrics_agent="fluent-bit", metrics_source="host-metrics", mountpoint="/run/user/2137"} (sampled 1/10)
| Sep 27, 2024 @ 10:35:48.239 | user=anonymous: the sample has been rejected because its timestamp is too old (err-mimir-sample-timestamp-too-old). The affected sample has timestamp 2024-09-24T11:49:17.062Z and is from series node_filesystem_size_bytes{device="tmpfs", fstype="tmpfs", host_name="petra5.vrit.dev", metrics_agent="fluent-bit", metrics_source="host-metrics", mountpoint="/run/user/2137"} (sampled 1/10)
It is trying to push metrics from 3 days ago (a tmpfs filesystem left over after a user session). I don't think anyone can benefit from this.
Regards Rafał
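For reference, the age of the first rejected sample in the log above can be worked out directly. This is a minimal sketch, assuming the log time in the left column is UTC (the log itself does not state a timezone); `sample_age` is a helper name introduced only for this illustration.

```python
from datetime import datetime, timedelta, timezone


def sample_age(logged_at: datetime, sample_ts: datetime) -> timedelta:
    """Return how old a sample was at the moment its rejection was logged."""
    return logged_at - sample_ts


# Values copied from the first log line above.
# Assumption: the log time (Sep 27, 2024 @ 10:37:02.140) is UTC.
logged_at = datetime(2024, 9, 27, 10, 37, 2, 140000, tzinfo=timezone.utc)
sample_ts = datetime(2024, 9, 24, 11, 50, 15, 946000, tzinfo=timezone.utc)

age = sample_age(logged_at, sample_ts)
print(f"{age.days} days, {age.seconds // 3600} hours old")
```

Under that assumption the sample is a little under 3 days old, which is far outside a 5-minute window.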
Just to confirm: was this log produced with this patch applied or not?
Ah sorry, it's a standard 3.1.2 version. I can try to compile from this branch and confirm.
Regards Rafał
I added APIs to specify cutoff options. This should avoid breaking changes for users who are already using OTel encoding.
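To illustrate the idea of an opt-in cutoff, here is a minimal sketch, written in Python for brevity rather than in the C used by fluent-bit. `CutoffOptions`, `max_age_ms`, and `apply_cutoff` are hypothetical names chosen for this example and are not the API added by this patch. The sketch assumes the cutoff is disabled unless a caller sets it, so existing users keep the current behavior.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# (series name, timestamp in milliseconds, value); a simplified stand-in
# for a metric sample, not the real data structure.
Sample = Tuple[str, int, float]


@dataclass
class CutoffOptions:
    # Hypothetical option: None means "no cutoff", i.e. nothing is dropped.
    max_age_ms: Optional[int] = None


def apply_cutoff(samples: List[Sample], now_ms: int,
                 opts: CutoffOptions) -> List[Sample]:
    """Drop samples older than opts.max_age_ms relative to now_ms."""
    if opts.max_age_ms is None:
        # Cutoff not requested: keep every sample unchanged.
        return list(samples)
    threshold = now_ms - opts.max_age_ms
    return [s for s in samples if s[1] >= threshold]


now_ms = 1_000_000_000
samples = [
    ("node_filesystem_free_bytes", now_ms - 1_000, 1.0),                    # 1 s old
    ("node_filesystem_size_bytes", now_ms - 3 * 24 * 3600 * 1000, 2.0),     # 3 days old
]

print(len(apply_cutoff(samples, now_ms, CutoffOptions())))
print(len(apply_cutoff(samples, now_ms, CutoffOptions(max_age_ms=5 * 60 * 1000))))
```

With the default options both samples are kept; with a 5-minute cutoff only the recent one remains.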
Is this planned for a release soon? Is any other testing etc. needed?
I believe so. But even once it is merged into the fluent-bit tree, more work is needed to implement the cutoff-related parameters in out_opentelemetry.