loki
Promtail encounters a corrupted positions.yml file
Describe the bug
Potentially after rolling out promtail, I have found that the positions.yml files have been corrupted.
To Reproduce
Unsure how to reproduce. This is on two different AKS clusters where I updated the promtail version from k98 to k99 only today.
Expected behavior
Not having a yml file full of \0 bytes
Environment:
- Promtail: Certainly in k98+
Screenshots, Promtail config, or terminal output
level=error ts=2022-05-30T12:11:21.229252137Z caller=main.go:117 msg="error creating promtail" error="invalid yaml positions file [/var/log/positions.yml]: yaml: control characters are not allowed"
- Positions file examples:
00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
[...]
00000070: 0000 0000 ....
00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
[...]
00000b80: 0000 0000 0000 0000 00 .........
Same bug on version 1.3.0
Hi! This issue has been automatically marked as stale because it has not had any activity in the past 30 days.
We use a stalebot among other tools to help manage the state of issues in this project. A stalebot can be very useful in closing issues in a number of cases; the most common is closing issues or PRs where the original reporter has not responded.
Stalebots are also emotionless and cruel and can close issues which are still very relevant.
If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.
We regularly sort for closed issues which have a stale label sorted by thumbs up.
We may also:
- Mark issues as revivable if we think it's a valid issue but isn't something we are likely to prioritize in the future (the issue will still remain closed).
- Add a keepalive label to silence the stalebot if the issue is very common/popular/important.
We are doing our best to respond, organize, and prioritize all issues but it can be a challenging task, our sincere apologies if you find yourself at the mercy of the stalebot.
I'm seeing the same issue for Promtail running version 2.2.1 on Windows.
Got this same issue for Promtail version 2.5.0 on Windows. Had to delete the positions.yaml file and restart promtail.
I have hit the same issue on version 2.7.4 on Windows; on Linux, however, the issue does not occur with the same version of Promtail.
This bug seems to reproduce consistently on Windows after an abrupt shutdown (due to a power outage).
This is still an issue with promtail. Are there any plans on fixing this issue?
This happens way too often on Windows. I think one way to try to remediate this could be to reduce the frequency of writes and to write to a temp file before removing the old one and renaming the new one (a kind of copy-on-write).
Still happens on 3.1.0, and I'm pretty sure that in some cases the hosts were only shut down through the Start menu. Every week there are at least 20 broken positions files. Also, this is just bad software design. Not every PC that runs promtail has a UPS with auto-shutdown, a disabled write-ahead drive cache, a drive controller with a battery/supercapacitors, and a backup server to gracefully back up the positions file every n hours.
Besides lowering the write frequency, promtail could at least keep a second file as a backup for such cases, detect a corrupted positions file on load, and auto-fix it from the backup. Each write would then consist of:
- Copying the current positions file to positions.yaml.bak, waiting for the operation to finish, and (probably overkill) quick-checking that the file content isn't just all null bytes;
- Overwriting the "Current" positions file.
Same problem on version 3.1.0 and Windows Server 2022.