
[kube-prometheus-stack] livenessProbe not restarting Prometheus

Open reefland opened this issue 1 year ago • 1 comment

Describe the bug

Background: a possible network glitch causes an issue with the PVC, resulting in Prometheus crashing with `fatal error: fault`. At that point Prometheus is down for several hours until I manually restart the Pod.

The issue is that the livenessProbe already defined on the pod does not detect that Prometheus is down and restart the pod. It is not clear whether the way the livenessProbe is defined is an issue in this chart or an upstream one.
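
For reference, as far as I can tell the operator-generated probes on the prometheus container are plain HTTP checks against Prometheus' own health endpoints, roughly like the sketch below (an approximation, not copied from my cluster; exact thresholds vary by operator version):

    # Approximate shape of the operator-generated probes (assumption).
    # /-/healthy only verifies that the web server responds, so it can keep
    # returning 200 even after the /prometheus mount has gone away.
    livenessProbe:
      httpGet:
        path: /-/healthy
        port: web
    readinessProbe:
      httpGet:
        path: /-/ready
        port: web

If that is accurate, a storage failure alone would not necessarily trip the liveness probe, since the web endpoint can keep answering.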

I observe in the Prometheus container that /prometheus is no longer mounted. I'd like to add an additional livenessProbe (and maybe a startupProbe as well) that checks that path, but I don't see a way to make that happen via values.yaml. Something such as:

    livenessProbe:
      exec:
        command:
        - cat
        - /prometheus/lock
      initialDelaySeconds: 30
      periodSeconds: 60
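
One possible avenue (an untested sketch, assuming the Prometheus CRD's `containers` field strategic-merge-patches operator-generated containers that share the same name) would be something like this in values.yaml, though I'm not sure the merge would cleanly replace the existing httpGet handler:

    prometheus:
      prometheusSpec:
        # Assumption: an entry named "prometheus" is merged into the
        # operator-generated prometheus container via strategic merge patch.
        containers:
          - name: prometheus
            livenessProbe:
              exec:
                command:
                  - cat
                  - /prometheus/lock
              initialDelaySeconds: 30
              periodSeconds: 60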

What's your helm version?

version.BuildInfo{Version:"v3.9.2", GitCommit:"1addefbfe665c350f4daf868a9adc5600cc064fd", GitTreeState:"clean", GoVersion:"go1.17.12"}

What's your kubectl version?

Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.3+k3s1", GitCommit:"990ba0e88c90f8ed8b50e0ccd375937b841b176e", GitTreeState:"clean", BuildDate:"2022-07-19T01:10:03Z", GoVersion:"go1.18.1", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.3+k3s1", GitCommit:"990ba0e88c90f8ed8b50e0ccd375937b841b176e", GitTreeState:"clean", BuildDate:"2022-07-19T01:10:03Z", GoVersion:"go1.18.1", Compiler:"gc", Platform:"linux/amd64"}

Which chart?

kube-prometheus-stack

What's the chart version?

38.0.3

What happened?

Pod runs great for several days:

ts=2022-07-29T03:00:31.630Z caller=db.go:1292 level=info component=tsdb msg="Deleting obsolete block" block=01G939QXET9D0NZB0PCZ0W4ZSF
ts=2022-07-29T03:00:31.664Z caller=db.go:1292 level=info component=tsdb msg="Deleting obsolete block" block=01G93QFB4R0N6Y75H1T3QCE9SQ
ts=2022-07-29T03:00:31.704Z caller=db.go:1292 level=info component=tsdb msg="Deleting obsolete block" block=01G93GKMPSSFK57DZA4ZWRQ3B2
ts=2022-07-29T05:00:11.138Z caller=compact.go:519 level=info component=tsdb msg="write block" mint=1659060000016 maxt=1659067200000 ulid=01G9456TESQ9KV4MJ625R2G4HR duration=8.745389154s
ts=2022-07-29T05:00:11.554Z caller=head.go:840 level=info component=tsdb msg="Head GC completed" duration=412.505174ms
ts=2022-07-29T05:00:11.572Z caller=checkpoint.go:97 level=info component=tsdb msg="Creating checkpoint" from_segment=219 to_segment=222 mint=1659067200000
ts=2022-07-29T05:00:16.135Z caller=head.go:1009 level=info component=tsdb msg="WAL checkpoint complete" first=219 last=222 duration=4.563792422s
ts=2022-07-29T07:00:11.270Z caller=compact.go:519 level=info component=tsdb msg="write block" mint=1659067200048 maxt=1659074400000 ulid=01G94C2HPSWX0VQCA56CQHP6FK duration=8.877545992s
ts=2022-07-29T07:00:11.743Z caller=head.go:840 level=info component=tsdb msg="Head GC completed" duration=468.27928ms
ts=2022-07-29T07:00:11.757Z caller=checkpoint.go:97 level=info component=tsdb msg="Creating checkpoint" from_segment=223 to_segment=226 mint=1659074400000
ts=2022-07-29T07:00:16.163Z caller=head.go:1009 level=info component=tsdb msg="WAL checkpoint complete" first=223 last=226 duration=4.40617176s
ts=2022-07-29T09:00:11.112Z caller=compact.go:519 level=info component=tsdb msg="write block" mint=1659074400117 maxt=1659081600000 ulid=01G94JY8YRYQTY0951M56MM7S9 duration=8.720217942s
ts=2022-07-29T09:00:11.599Z caller=head.go:840 level=info component=tsdb msg="Head GC completed" duration=482.392016ms
ts=2022-07-29T09:00:11.619Z caller=checkpoint.go:97 level=info component=tsdb msg="Creating checkpoint" from_segment=227 to_segment=230 mint=1659081600000
ts=2022-07-29T09:00:15.481Z caller=head.go:1009 level=info component=tsdb msg="WAL checkpoint complete" first=227 last=230 duration=3.861726711s

It's unclear what the glitch is, but after this error the PVC is no longer mounted in the Pod:

ts=2022-07-29T09:07:17.539Z caller=compact.go:558 level=error component=tsdb msg="removed tmp folder after failed compaction" err="open /prometheus: input/output error"
ts=2022-07-29T09:07:17.540Z caller=block.go:227 level=error component=tsdb msg="remove tmp file" err="open /prometheus/01G939RB4M0YWH335QM9HXWT1W: input/output error"
ts=2022-07-29T09:07:17.540Z caller=db.go:829 level=error component=tsdb msg="compaction failed" err="compact [/prometheus/01G939RB4M0YWH335QM9HXWT1W /prometheus/01G93YBH299WK61Y8N8CE22FE3 /prometheus/01G93YB36RJKRTABDBAWE3937R /prometheus/01G9456TESQ9KV4MJ625R2G4HR /prometheus/01G94C2HPSWX0VQCA56CQHP6FK]: 7 errors: sync /prometheus/01G94JYNQS08SP1A54NR271TBZ.tmp-for-creation/chunks/000002: input/output error; sync /prometheus/01G94JYNQS08SP1A54NR271TBZ.tmp-for-creation/index_tmp_p: input/output error; setting compaction failed for block: /prometheus/01G939RB4M0YWH335QM9HXWT1W: open /prometheus/01G939RB4M0YWH335QM9HXWT1W/meta.json.tmp: input/output error; setting compaction failed for block: /prometheus/01G93YBH299WK61Y8N8CE22FE3: open /prometheus/01G93YBH299WK61Y8N8CE22FE3/meta.json.tmp: input/output error; setting compaction failed for block: /prometheus/01G93YB36RJKRTABDBAWE3937R: open /prometheus/01G93YB36RJKRTABDBAWE3937R/meta.json.tmp: input/output error; setting compaction failed for block: /prometheus/01G9456TESQ9KV4MJ625R2G4HR: open /prometheus/01G9456TESQ9KV4MJ625R2G4HR/meta.json.tmp: input/output error; setting compaction failed for block: /prometheus/01G94C2HPSWX0VQCA56CQHP6FK: open /prometheus/01G94C2HPSWX0VQCA56CQHP6FK/meta.json.tmp: input/output error"
ts=2022-07-29T09:07:17.540Z caller=manager.go:659 level=warn component="rule manager" file=/etc/prometheus/rules/prometheus-prometheus-prometheus-rulefiles-0/monitoring-prometheus-k8s.rules-4a1996ef-c203-4131-aed4-e20400bc4aec.yaml group=k8s.rules name=cluster:namespace:pod_memory:active:kube_pod_container_resource_requests index=5 msg="Rule sample appending failed" err="write to WAL: log samples: write /prometheus/wal/00000234: input/output error"
unexpected fault address 0x7f6728448400
fatal error: fault
[signal SIGBUS: bus error code=0x2 addr=0x7f6728448400 pc=0x46c5ae]

The pod then endlessly spits out Go stack traces and is not restarted.

goroutine 1697 [running]:
runtime.throw({0x2f1e844?, 0x27?})
	/usr/local/go/src/runtime/panic.go:992 +0x71 fp=0xc0020aec10 sp=0xc0020aebe0 pc=0x4384b1
runtime.sigpanic()
	/usr/local/go/src/runtime/signal_unix.go:815 +0x125 fp=0xc0020aec60 sp=0xc0020aec10 pc=0x44e525
runtime.memmove()
	/usr/local/go/src/runtime/memmove_amd64.s:372 +0x42e fp=0xc0020aec68 sp=0xc0020aec60 pc=0x46c5ae
github.com/prometheus/prometheus/promql.ActiveQueryTracker.Insert({{0x7f6728448000, 0x4e21, 0x4e21}, 0xc000553a00, {0x388c300, 0xc00021d310}, 0x14}, {0x38abf48, 0xc0409b67b0}, {0xc05596c280, ...})
	/app/promql/query_logger.go:193 +0x190 fp=0xc0020aed10 sp=0xc0020aec68 pc=0x23543b0
github.com/prometheus/prometheus/promql.(*ActiveQueryTracker).Insert(0x38abf48?, {0x38abf48?, 0xc0409b67b0?}, {0xc05596c280?, 0xc04add8550?})
	<autogenerated>:1 +0xbf fp=0xc0020aedb0 sp=0xc0020aed10 pc=0x2358e3f
github.com/prometheus/prometheus/promql.(*Engine).exec(0xc00064a900, {0x38abf48, 0xc0409b67b0}, 0xc0005e2690)
	/app/promql/engine.go:573 +0x30f fp=0xc0020af088 sp=0xc0020aedb0 pc=0x2330b8f
github.com/prometheus/prometheus/promql.(*query).Exec(0xc0005e2690, {0x38abf48, 0xc0409b65a0})
	/app/promql/engine.go:197 +0x1f5 fp=0xc0020af1b8 sp=0xc0020af088 pc=0x232eb95
github.com/prometheus/prometheus/rules.EngineQueryFunc.func1({0x38abf48, 0xc0409b65a0}, {0xc05596c280?, 0x0?}, {0x0?, 0xee06ab345b799de1?, 0x0?})
	/app/rules/manager.go:192 +0x7c fp=0xc0020af228 sp=0xc0020af1b8 pc=0x2365fbc
github.com/prometheus/prometheus/rules.(*RecordingRule).Eval(0xc03f076f80, {0x38abf48, 0xc0409b65a0}, {0x3883984?, 0x16?, 0x0?}, 0xc0006ba1e0, 0x0?, 0x0)
	/app/rules/recording.go:75 +0xc4 fp=0xc0020af380 sp=0xc0020af228 pc=0x2370804
github.com/prometheus/prometheus/rules.(*Group).Eval.func1({0x38abf48, 0xc04ed47020}, 0xc03eaab0e0, {0xc04a091740?, 0xc04a091740?, 0x0?}, 0xc0020afc68, 0x6, {0x38bc840, 0xc03f076f80})
	/app/rules/manager.go:624 +0x457 fp=0xc0020afc10 sp=0xc0020af380 pc=0x2369e57
github.com/prometheus/prometheus/rules.(*Group).Eval(0xc03eaab0e0, {0x38abf48, 0xc04ed47020}, {0x3883984?, 0x4faeaa0?, 0x0?})
	/app/rules/manager.go:707 +0x20d fp=0xc0020afcc8 sp=0xc0020afc10 pc=0x236994d
github.com/prometheus/prometheus/rules.(*Group).run.func1()
	/app/rules/manager.go:362 +0x11b fp=0xc0020afd60 sp=0xc0020afcc8 pc=0x2367b3b
github.com/prometheus/prometheus/rules.(*Group).run(0xc03eaab0e0, {0x38abed8, 0xc0000c0000})
	/app/rules/manager.go:440 +0x829 fp=0xc0020aff80 sp=0xc0020afd60 pc=0x23671a9
github.com/prometheus/prometheus/rules.(*Manager).Update.func1(0xc001c695f0?)
	/app/rules/manager.go:1006 +0xaf fp=0xc0020affc8 sp=0xc0020aff80 pc=0x236f0af
github.com/prometheus/prometheus/rules.(*Manager).Update.func4()
	/app/rules/manager.go:1007 +0x2a fp=0xc0020affe0 sp=0xc0020affc8 pc=0x236efca
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc0020affe8 sp=0xc0020affe0 pc=0x46b301
created by github.com/prometheus/prometheus/rules.(*Manager).Update
	/app/rules/manager.go:996 +0x3ad

goroutine 1 [chan receive, 489 minutes]:
github.com/oklog/run.(*Group).Run(0xc00097baa0)
	/go/pkg/mod/github.com/oklog/[email protected]/group.go:43 +0x7c
main.main()
	/app/cmd/prometheus/main.go:1111 +0x9545

goroutine 85 [select]:
github.com/prometheus/prometheus/discovery/legacymanager.(*Manager).sender(0xc0006417c0)
	/app/discovery/legacymanager/manager.go:235 +0xd7
created by github.com/prometheus/prometheus/discovery/legacymanager.(*Manager).Run
	/app/discovery/legacymanager/manager.go:142 +0x5a

goroutine 50 [select, 2 minutes]:
go.opencensus.io/stats/view.(*worker).start(0xc0001a8000)
	/go/pkg/mod/[email protected]/stats/view/worker.go:276 +0xad
created by go.opencensus.io/stats/view.init.0
	/go/pkg/mod/[email protected]/stats/view/worker.go:34 +0x8d

goroutine 110 [select, 2 minutes]:
github.com/prometheus/prometheus/util/logging.(*Deduper).run(0xc0004dd900)
	/app/util/logging/dedupe.go:75 +0xe8
created by github.com/prometheus/prometheus/util/logging.Dedupe
	/app/util/logging/dedupe.go:61 +0x10a

goroutine 117 [select, 2 minutes]:
github.com/prometheus/prometheus/storage/remote.(*WriteStorage).run(0xc000641680)
	/app/storage/remote/write.go:107 +0xd7
created by github.com/prometheus/prometheus/storage/remote.NewWriteStorage
	/app/storage/remote/write.go:99 +0x485

goroutine 257 [syscall, 489 minutes]:
os/signal.signal_recv()
	/usr/local/go/src/runtime/sigqueue.go:151 +0x2f
os/signal.loop()
	/usr/local/go/src/os/signal/signal_unix.go:23 +0x19
created by os/signal.Notify.func1.1
	/usr/local/go/src/os/signal/signal.go:151 +0x2a

goroutine 246 [select, 489 minutes]:
main.main.func7()
	/app/cmd/prometheus/main.go:805 +0xa8
github.com/oklog/run.(*Group).Run.func1({0xc000b7bf00?, 0xc000885d28?})
	/go/pkg/mod/github.com/oklog/[email protected]/group.go:38 +0x2f
created by github.com/oklog/run.(*Group).Run
	/go/pkg/mod/github.com/oklog/[email protected]/group.go:37 +0x22a

goroutine 247 [chan receive, 489 minutes]:
github.com/prometheus/prometheus/discovery/legacymanager.(*Manager).Run(0xc000641720)
	/app/discovery/legacymanager/manager.go:143 +0x77
main.main.func9()
	/app/cmd/prometheus/main.go:826 +0x38
github.com/oklog/run.(*Group).Run.func1({0xc000954cf0?, 0xc000952320?})
	/go/pkg/mod/github.com/oklog/[email protected]/group.go:38 +0x2f
created by github.com/oklog/run.(*Group).Run
	/go/pkg/mod/github.com/oklog/[email protected]/group.go:37 +0x22a

goroutine 248 [chan receive, 489 minutes]:
github.com/prometheus/prometheus/discovery/legacymanager.(*Manager).Run(0xc0006417c0)
	/app/discovery/legacymanager/manager.go:143 +0x77
main.main.func11()
	/app/cmd/prometheus/main.go:840 +0x38
github.com/oklog/run.(*Group).Run.func1({0xc000954d20?, 0xc000952360?})
	/go/pkg/mod/github.com/oklog/[email protected]/group.go:38 +0x2f
created by github.com/oklog/run.(*Group).Run
	/go/pkg/mod/github.com/oklog/[email protected]/group.go:37 +0x22a

I restart the Pod manually by changing replicas to zero and back to one (a rough kubectl equivalent is sketched after the log below). The pod restarts successfully, the PVC mounts, and Prometheus resolves whatever data issues it has:

ts=2022-07-29T12:27:32.854Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1658682813102 maxt=1658685600000 ulid=01G8RXFXZHTA6GD33DMGH3C8HP
ts=2022-07-29T12:27:32.861Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1658685600000 maxt=1658750400000 ulid=01G8TXZ02RCN79K5T9H2RM9KT5
ts=2022-07-29T12:27:32.862Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1658750400031 maxt=1658815200000 ulid=01G8X2MNB5CQDSACP1SQJ24DB2
ts=2022-07-29T12:27:32.862Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1658815200030 maxt=1658880000000 ulid=01G8Z0E7J00QS895CPDK82FSX2
ts=2022-07-29T12:27:32.866Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1658880000202 maxt=1658944800000 ulid=01G90Y7PJBS32HYYNC474XPTK9
ts=2022-07-29T12:27:32.867Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1658944800003 maxt=1659009600000 ulid=01G92W18AW2MFC6BNT1DY9MTKD
ts=2022-07-29T12:27:32.867Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1659009600000 maxt=1659031200000 ulid=01G939RB4M0YWH335QM9HXWT1W
ts=2022-07-29T12:27:32.892Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1659052800030 maxt=1659060000000 ulid=01G93YB36RJKRTABDBAWE3937R
ts=2022-07-29T12:27:32.893Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1659031200000 maxt=1659052800000 ulid=01G93YBH299WK61Y8N8CE22FE3
ts=2022-07-29T12:27:32.893Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1659060000016 maxt=1659067200000 ulid=01G9456TESQ9KV4MJ625R2G4HR
ts=2022-07-29T12:27:32.904Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1659067200048 maxt=1659074400000 ulid=01G94C2HPSWX0VQCA56CQHP6FK
ts=2022-07-29T12:27:32.926Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1659074400117 maxt=1659081600000 ulid=01G94JY8YRYQTY0951M56MM7S9
ts=2022-07-29T12:27:32.957Z caller=db.go:777 level=info component=tsdb msg="Found and deleted tmp block dir" dir=/prometheus/01G94JYNQS08SP1A54NR271TBZ.tmp-for-creation
ts=2022-07-29T12:27:32.957Z caller=dir_locker.go:77 level=warn component=tsdb msg="A lockfile from a previous execution already existed. It was replaced" file=/prometheus/lock
ts=2022-07-29T12:27:34.395Z caller=head.go:493 level=info component=tsdb msg="Replaying on-disk memory mappable chunks if any"
ts=2022-07-29T12:27:35.109Z caller=head.go:520 level=error component=tsdb msg="Loading on-disk chunks failed" err="iterate on on-disk chunks: corruption in head chunk file /prometheus/chunks_head/000064: head chunk file doesn't include enough bytes to read the chunk header - required:16777226, available:16777216, file:64"
ts=2022-07-29T12:27:35.111Z caller=head.go:689 level=info component=tsdb msg="Deleting mmapped chunk files"
ts=2022-07-29T12:27:35.112Z caller=head.go:699 level=info component=tsdb msg="Deletion of mmap chunk files successful, reattempting m-mapping the on-disk chunks"
ts=2022-07-29T12:27:35.348Z caller=head.go:536 level=info component=tsdb msg="On-disk memory mappable chunks replay completed" duration=952.705267ms
ts=2022-07-29T12:27:35.348Z caller=head.go:542 level=info component=tsdb msg="Replaying WAL, this may take a while"
ts=2022-07-29T12:28:17.874Z caller=head.go:578 level=info component=tsdb msg="WAL checkpoint loaded"
ts=2022-07-29T12:28:19.694Z caller=head.go:613 level=info component=tsdb msg="WAL segment loaded" segment=231 maxSegment=235
ts=2022-07-29T12:28:22.135Z caller=head.go:613 level=info component=tsdb msg="WAL segment loaded" segment=232 maxSegment=235
ts=2022-07-29T12:28:23.572Z caller=head.go:613 level=info component=tsdb msg="WAL segment loaded" segment=233 maxSegment=235
ts=2022-07-29T12:28:24.901Z caller=db.go:752 level=warn component=tsdb msg="Encountered WAL read error, attempting repair" err="read records: corruption in segment /prometheus/wal/00000234 at 12517376: unexpected checksum 4b5b350, expected 9db6b9d7"
ts=2022-07-29T12:28:24.901Z caller=wal.go:363 level=warn component=tsdb msg="Starting corruption repair" segment=234 offset=12517376
ts=2022-07-29T12:28:24.901Z caller=wal.go:371 level=warn component=tsdb msg="Deleting all segments newer than corrupted segment" segment=234
ts=2022-07-29T12:28:24.901Z caller=wal.go:393 level=warn component=tsdb msg="Rewrite corrupted segment" segment=234
ts=2022-07-29T12:28:25.071Z caller=main.go:993 level=info fs_type=XFS_SUPER_MAGIC
ts=2022-07-29T12:28:25.071Z caller=main.go:996 level=info msg="TSDB started"
ts=2022-07-29T12:28:25.071Z caller=main.go:1177 level=info msg="Loading configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
ts=2022-07-29T12:28:25.088Z caller=kubernetes.go:325 level=info component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
ts=2022-07-29T12:28:25.095Z caller=kubernetes.go:325 level=info component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
ts=2022-07-29T12:28:25.095Z caller=kubernetes.go:325 level=info component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
ts=2022-07-29T12:28:25.095Z caller=kubernetes.go:325 level=info component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
ts=2022-07-29T12:28:25.096Z caller=kubernetes.go:325 level=info component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
ts=2022-07-29T12:28:25.096Z caller=kubernetes.go:325 level=info component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
ts=2022-07-29T12:28:25.096Z caller=kubernetes.go:325 level=info component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
ts=2022-07-29T12:28:25.096Z caller=kubernetes.go:325 level=info component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
ts=2022-07-29T12:28:25.097Z caller=kubernetes.go:325 level=info component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
ts=2022-07-29T12:28:25.097Z caller=kubernetes.go:325 level=info component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
ts=2022-07-29T12:28:25.097Z caller=kubernetes.go:325 level=info component="discovery manager notify" discovery=kubernetes msg="Using pod service account via in-cluster config"
ts=2022-07-29T12:28:25.203Z caller=main.go:1214 level=info msg="Completed loading of configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml totalDuration=131.90944ms db_storage=2.495µs remote_storage=24.316µs web_handler=1.232µs query_engine=1.212µs scrape=8.822757ms scrape_sd=9.670066ms notify=46.638µs notify_sd=203.632µs rules=106.013283ms tracing=23.414µs
ts=2022-07-29T12:28:25.203Z caller=main.go:957 level=info msg="Server is ready to receive web requests."
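
For completeness, the manual restart described above is roughly equivalent to patching the Prometheus custom resource; the namespace and resource name below are placeholders, not the exact values from my cluster:

    # Placeholders: substitute the actual namespace and Prometheus CR name.
    kubectl -n <namespace> patch prometheus <prometheus-name> --type merge -p '{"spec":{"replicas":0}}'
    # Wait for the pod to terminate, then scale back up:
    kubectl -n <namespace> patch prometheus <prometheus-name> --type merge -p '{"spec":{"replicas":1}}'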

What you expected to happen?

I expect the error condition to be detected by the defined livenessProbe(s) and the Pod to be restarted per the restart policy.

How to reproduce it?

I'm unable to reproduce on demand.

Enter the changed values of values.yaml?

    # Prometheus values
    prometheus:
      enabled: true
      prometheusSpec:
        ## Prometheus StorageSpec for persistent data
        ## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/storage.md
        storageSpec:
          volumeClaimTemplate:
            spec:
              storageClassName: freenas-iscsi-csi
              accessModes: 
                - ReadWriteOnce
              resources:
                requests:
                  storage: 50Gi
        retention: 21d
        retentionSize: 40GB
        externalUrl: /prometheus

Enter the command that you execute that is failing/misfunctioning.

Deployed via ArgoCD

Anything else we need to know?

I understand that the underlying issue with the PVC needs to be resolved. My question is not about that; it is specifically about why the Pod is not restarted when such a failure happens.

reefland avatar Jul 29 '22 15:07 reefland

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar Sep 20 '22 17:09 stale[bot]

This issue is being automatically closed due to inactivity.

stale[bot] avatar Oct 12 '22 10:10 stale[bot]