alertmanager
Alertmanager possible deadlock
Description
We recently noticed a deadlock in our running Alertmanager instances. References:
- dispatcher: https://github.com/prometheus/alertmanager/blob/main/dispatch/dispatch.go#L143
- mem.Alerts.Subscribe: https://github.com/prometheus/alertmanager/blob/1da134aa30c81e656c5156df1499a77d5df92269/provider/mem/mem.go#L151
- claim store.Alerts.mtx: https://github.com/prometheus/alertmanager/blob/1da134aa30c81e656c5156df1499a77d5df92269/provider/mem/mem.go#L157 https://github.com/prometheus/alertmanager/blob/1da134aa30c81e656c5156df1499a77d5df92269/store/store.go#L119
- gc: https://github.com/prometheus/alertmanager/blob/1da134aa30c81e656c5156df1499a77d5df92269/store/store.go#L72
- mem.gc.callback: https://github.com/prometheus/alertmanager/blob/1da134aa30c81e656c5156df1499a77d5df92269/provider/mem/mem.go#L112
Proposed fix
- Have the Subscribe function (https://github.com/prometheus/alertmanager/blob/1da134aa30c81e656c5156df1499a77d5df92269/provider/mem/mem.go#L151) acquire store.Alerts.mux before acquiring mem.Alerts.mux, so that Subscribe and the gc callback take the two locks in the same order.
Is this something we can close now with #3715 getting merged?