Prometheus can fail to start if it didn't exit cleanly

Open cryslith opened this issue 6 years ago • 2 comments

See https://github.com/prometheus/tsdb/issues/178

We could either pass the --storage.tsdb.no-lockfile flag to disable locking entirely (which could lead to data corruption if we somehow ran two Prometheus instances at once on the same supervisor node), or just resolve to remove the lockfile manually whenever this occurs.

cryslith, Nov 29 '19 04:11

For reference, the lock file to remove is /var/lib/prometheus/data/lock.
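As a rough illustration of the manual cleanup, here is a minimal Python sketch that deletes the lock file only when Prometheus is not running. The helper names (`remove_stale_lock`, `unit_is_active`) and the systemd unit name `prometheus` are assumptions made for this example, not something taken from the homeworld setup.

```python
import subprocess
from pathlib import Path
from typing import Callable

# Lock file path as given in this thread; adjust if the data directory differs.
LOCK_PATH = Path("/var/lib/prometheus/data/lock")


def unit_is_active(unit: str = "prometheus") -> bool:
    """Return True if systemd reports the unit as active.

    The unit name "prometheus" is an assumption for this sketch.
    """
    result = subprocess.run(["systemctl", "is-active", "--quiet", unit])
    return result.returncode == 0


def remove_stale_lock(lock_path: Path, prometheus_running: Callable[[], bool]) -> bool:
    """Delete lock_path if Prometheus is not running.

    Returns True if a file was removed, False otherwise.
    """
    if prometheus_running():
        # A live instance may legitimately hold the lock; leave it alone.
        return False
    try:
        lock_path.unlink()
    except FileNotFoundError:
        # Nothing to clean up.
        return False
    return True


if __name__ == "__main__":
    if remove_stale_lock(LOCK_PATH, unit_is_active):
        print(f"removed stale lock file {LOCK_PATH}")
    else:
        print("no stale lock file removed")
```

Checking that the service is stopped before deleting the file keeps the one protection the lock is meant to provide, which is avoiding two instances writing to the same data directory.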

This seems to happen pretty often when I reboot the cluster. Is there any reason why we might not want to disable the lockfile? (If prometheus is entirely managed by systemd, there shouldn't be a case where two instances run at the same time, right?)

krawthekrow, Jan 29 '20 21:01

I can't think of a reason that we would end up having two instances at the same time, so go for it.

celskeggs, Jan 29 '20 21:01