promdump
Found unsequential head chunk files
I am running promdump in a test environment, and occasionally I cannot run the meta command because I get the following error message:
# kubectl promdump meta -n openshift-monitoring -p prometheus-k8s-0 -c prometheus -d /prometheus
time=2022-05-19T15:49:05Z caller=level.go:63 level=error error="found unsequential head chunk files chunks_head/000010 (index: 10) and chunks_head/000012 (index: 12)"
failed to exec command: command terminated with exit code 1
Is it possible to make promdump tolerant of the reported condition of unsequential head chunk files?
Although we can update meta to ignore the error, the restore will likely fail anyway. I added a section on this under the FAQ (search for "Q: The promdump meta and promdump restore subcommands are failing with this error:"). The best bet is to remove chunks_head/000010 from the dump file. It will cause some "head chunks" to be lost, but the majority of the data in the persistent blocks will still be preserved.
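For readers who want to check a chunks_head listing before running meta or restore, here is a minimal Go sketch of the kind of check the error message describes. It is not promdump's or Prometheus's actual code; the names headChunkGap and findFirstGap are hypothetical, and it assumes head chunk file names are plain zero-padded decimal indices such as 000010.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// headChunkGap (hypothetical type) records two neighbouring head chunk
// indices that are not consecutive.
type headChunkGap struct {
	Prev, Next int
}

// findFirstGap (hypothetical helper) parses file names such as "000010" as
// decimal integers, sorts them, and returns the first neighbouring pair whose
// indices differ by more than one. ok is false when the sequence has no gap.
func findFirstGap(names []string) (gap headChunkGap, ok bool, err error) {
	indices := make([]int, 0, len(names))
	for _, name := range names {
		idx, convErr := strconv.Atoi(name)
		if convErr != nil {
			return headChunkGap{}, false, fmt.Errorf("unexpected file name %q: %w", name, convErr)
		}
		indices = append(indices, idx)
	}
	sort.Ints(indices)
	for i := 1; i < len(indices); i++ {
		if indices[i] != indices[i-1]+1 {
			return headChunkGap{Prev: indices[i-1], Next: indices[i]}, true, nil
		}
	}
	return headChunkGap{}, false, nil
}
```

Given the two file names from the error above, 000010 and 000012, findFirstGap reports a gap between index 10 and index 12.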
I normally only see this in OpenShift. Is that what you are using?
Yes this is OpenShift.
Is there something that causes unsequential head chunk files in the first place?
I never quite figured that out. I suspect it has something to do with parts of CMO that I don't understand. FWIW, I also haven't had a chance to try promdump with Thanos, Cortex, etc. To date, I haven't seen disjoint chunk files like that in prom.
hmm, perhaps we could just add a command/option to trim/remove unsequential head chunks:
# oc -n openshift-monitoring rsh prometheus-k8s-1 ls -l /prometheus/chunks_head/
total 376836
-rw-r--r--. 1 nobody nobody 123971101 Jun 8 13:00 000010
-rw-r--r--. 1 nobody nobody 113242593 Jun 8 14:30 000011
-rw-r--r--. 1 nobody nobody 8 Jun 8 14:39 000012
# oc -n openshift-monitoring rsh prometheus-k8s-0 ls -l /prometheus/chunks_head/
total 409604
-rw-rw-rw-. 1 nobody nobody 8 Jun 8 04:39 000006
-rw-r--r--. 1 nobody nobody 123533546 Jun 8 13:00 000010
-rw-r--r--. 1 nobody nobody 121632593 Jun 8 14:33 000011
# kubectl promdump meta -n openshift-monitoring -p prometheus-k8s-0 -c prometheus -d /prometheus
time=2022-06-08T14:40:03Z caller=level.go:63 level=error error="found unsequential head chunk files chunks_head/000006 (index: 6) and chunks_head/000010 (index: 10)"
failed to exec command: command terminated with exit code 1
# kubectl promdump meta -n openshift-monitoring -p prometheus-k8s-1 -c prometheus -d /prometheus
Head Block Metadata
------------------------
Minimum time (UTC): | 2022-06-08 12:00:00
Maximum time (UTC): | 2022-06-08 14:40:28
Number of series | 623197
Persistent Blocks Metadata
----------------------------
Minimum time (UTC): | 2022-06-07 19:58:43
Maximum time (UTC): | 2022-06-08 12:00:00
Total number of blocks | 4
Total number of samples | 902497366
Total number of series | 2582714
Total size | 1286064951
My guess is that this may be related to the fact that there are two running Prometheus instances?
After trimming the file:
# oc -n openshift-monitoring rsh prometheus-k8s-0
sh-4.4$ ls -l
total 20
drwxr-sr-x. 3 nobody nobody 68 Jun 8 09:00 01G518NHTWV39BT234Q25DMFCB
drwxr-sr-x. 3 nobody nobody 68 Jun 8 11:00 01G51FH92WH3E7BW3W57JW7ZNG
drwxr-sr-x. 3 nobody nobody 68 Jun 8 11:00 01G51FJA7S7FXK0NAB9JYP3KHG
drwxr-sr-x. 3 nobody nobody 68 Jun 8 13:00 01G51PD0AV8X4PW2469Q6RG84Q
drwxr-sr-x. 2 nobody nobody 48 Jun 8 13:00 chunks_head
-rw-r--r--. 1 nobody nobody 0 Jun 7 19:58 lock
-rw-r--r--. 1 nobody nobody 20001 Jun 8 14:40 queries.active
drwxr-sr-x. 3 nobody nobody 145 Jun 8 14:27 wal
sh-4.4$ ls -l chunks_head
total 409604
-rw-rw-rw-. 1 nobody nobody 8 Jun 8 04:39 000006
-rw-r--r--. 1 nobody nobody 123533546 Jun 8 13:00 000010
-rw-r--r--. 1 nobody nobody 121632593 Jun 8 14:33 000011
sh-4.4$ rm -rf chunks_head/000006
sh-4.4$ exit
exit
# kubectl promdump meta -n openshift-monitoring -p prometheus-k8s-0 -c prometheus -d /prometheus
Head Block Metadata
------------------------
Minimum time (UTC): | 2022-06-08 12:00:00
Maximum time (UTC): | 2022-06-08 14:41:53
Number of series | 623215
Persistent Blocks Metadata
----------------------------
Minimum time (UTC): | 2022-06-07 19:58:43
Maximum time (UTC): | 2022-06-08 12:00:00
Total number of blocks | 4
Total number of samples | 902569130
Total number of series | 2588592
Total size | 1254129156
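The manual trim above could be automated roughly as follows. This is only a sketch of one possible heuristic, not an existing promdump feature: it assumes that the files before the last gap in the index sequence are the stale ones, which matches both cases in this thread but is not confirmed as a general rule. The function name staleHeadChunks is hypothetical, and it only reports candidates without deleting anything.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// staleHeadChunks (hypothetical helper) returns the file names that sit
// before the last gap in the index sequence, i.e. everything except the
// final consecutive run. It only lists candidates and never touches disk.
func staleHeadChunks(names []string) ([]string, error) {
	type entry struct {
		name string
		idx  int
	}
	entries := make([]entry, 0, len(names))
	for _, name := range names {
		idx, err := strconv.Atoi(name)
		if err != nil {
			return nil, fmt.Errorf("unexpected file name %q: %w", name, err)
		}
		entries = append(entries, entry{name: name, idx: idx})
	}
	sort.Slice(entries, func(i, j int) bool { return entries[i].idx < entries[j].idx })

	// Walk forward and remember where the last consecutive run starts.
	runStart := 0
	for i := 1; i < len(entries); i++ {
		if entries[i].idx != entries[i-1].idx+1 {
			runStart = i
		}
	}

	// Everything before that run is a candidate for trimming.
	stale := make([]string, 0, runStart)
	for _, e := range entries[:runStart] {
		stale = append(stale, e.name)
	}
	return stale, nil
}
```

For the prometheus-k8s-0 listing above (000006, 000010, 000011) this returns only 000006, the file that was removed by hand. Keeping it as a dry-run list leaves the decision to delete with the operator, since removing head chunks loses data.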
Not a bad idea. I probably won't have time for it for the next few weeks. Would you be interested in putting a PR together?
I would like to make a contribution here, just not sure when I can carve out the time. I will keep you posted.