
Add support for updating existing Slack messages

Open · kruchkov-alexandr opened this issue 1 month ago · 5 comments

Slack: Update existing messages instead of creating new ones

TL;DR

Before / after screenshots attached.

Add an update_message config option to the Slack notifier. When enabled, it updates existing messages in place instead of creating a new one for each alert status change.

Current behavior creates multiple messages per alert group:

#alerts channel:
[10:00] 🔥 Alert: HighCPU - FIRING
[10:05] 🔥 Alert: HighCPU - FIRING  
[10:10] ✅ Alert: HighCPU - RESOLVED

3 messages → clutters channel

With this PR:

#alerts channel:
[10:00, edited 10:10] ✅ Alert: HighCPU - RESOLVED

1 message → clean channel

How

Implementation:

  • New MetadataStore tracks message_ts and channel_id per alert group (in-memory)
  • Slack notifier checks store before sending
  • Auto-switches between chat.postMessage (new) and chat.update (existing)
  • Stores channel_id from first response (required by Slack API for updates)
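
A minimal sketch of such an in-memory store (names here are illustrative, not the PR's actual identifiers):

package slack

import "sync"

// slackMessageRef records where an alert group's message lives in Slack.
type slackMessageRef struct {
	MessageTS string // message_ts returned by chat.postMessage
	ChannelID string // channel ID from the same response, needed by chat.update
}

// metadataStore is an in-memory map from alert group key to message reference.
// Being in-memory, its contents are lost on restart (see Limitations below).
type metadataStore struct {
	mu   sync.RWMutex
	refs map[string]slackMessageRef
}

func newMetadataStore() *metadataStore {
	return &metadataStore{refs: make(map[string]slackMessageRef)}
}

func (s *metadataStore) get(groupKey string) (slackMessageRef, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	ref, ok := s.refs[groupKey]
	return ref, ok
}

func (s *metadataStore) set(groupKey string, ref slackMessageRef) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.refs[groupKey] = ref
}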

Flow:

Alert → Check MetadataStore → Found? → chat.update
                           → Not Found? → chat.postMessage → Store ts & channel_id
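
The endpoint choice itself is small; a sketch (the URLs are the public Slack Web API methods, everything else is illustrative):

package slack

// chooseEndpoint mirrors the flow above: given the message timestamp stored
// for the group (empty string if none), pick the Slack Web API method to call.
func chooseEndpoint(storedTS string) (url string, isUpdate bool) {
	if storedTS != "" {
		// Found in the metadata store: edit the existing message in place.
		return "https://slack.com/api/chat.update", true
	}
	// Not found: post a new message, then remember its ts and channel ID.
	return "https://slack.com/api/chat.postMessage", false
}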

Configuration

receivers:
  - name: 'slack-team'
    slack_configs:
      - send_resolved: true              # Required!
        update_message: true              # New option (default: false)
        api_url: 'https://slack.com/api/chat.postMessage'
        http_config:
          authorization:
            credentials_file: '/etc/alertmanager/slack-token'
            # OR
            # credentials: 'xoxb-your-bot-token'
        channel: '#alerts'
        title: '{{ .GroupLabels.alertname }} - {{ .Status | toUpper }}'
        text: |
          {{ if eq .Status "firing" }}🔥{{ else }}✅{{ end }} {{ .Alerts | len }} alert(s)
          {{ range .Alerts }}• {{ .Annotations.summary }}{{ end }}

Requirements:

  • Bot token (not webhook URL) with chat:write scope
  • send_resolved: true must be set
  • update_message: true must be set
  • Bot invited to target channel

Testing

# Build
make build

# Run
./alertmanager --log.level=debug --config.file=examples/slack-update-messages.yml

# Fire alert
curl -X POST http://localhost:9093/api/v2/alerts -H 'Content-Type: application/json' -d '[{
  "labels": {"alertname": "TestAlert", "severity": "warning"},
  "annotations": {"summary": "Test alert"}
}]'

# Wait 10-15s, check Slack → NEW message appears

# Resolve alert
curl -X POST http://localhost:9093/api/v2/alerts -H 'Content-Type: application/json' -d '[{
  "labels": {"alertname": "TestAlert", "severity": "warning"},
  "annotations": {"summary": "Test alert"},
  "endsAt": "'$(date -u +%Y-%m-%dT%H:%M:%SZ)'"
}]'

# Wait 10-15s, check Slack → SAME message updates (not new one!)

Expected logs:

# First notification:
msg="no existing message found - will create NEW"
msg="saved Slack message_ts for future updates" message_ts="..." channel_id="C01234567"

# Second notification:
msg="FOUND existing Slack message - will UPDATE" message_ts="..."
msg="using chat.update endpoint for message update"

Limitations

Current implementation:

  • Webhook URLs don't work - Slack App (bot token) only!
  • In-memory storage (lost on restart) - acceptable for v1, persistence can be added later
  • No HA sync yet - each instance has own cache
  • Protobuf not regenerated - using separate store instead

By design:

  • Requires bot token (webhook URLs don't support updates - Slack API limitation)
  • Channel ID required for updates (extracted from first response)
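
For context, a successful chat.postMessage reply carries ok, channel and ts fields, which is where the stored channel ID and timestamp come from. A sketch of the decoding (the JSON field names match the Slack API; the Go types are illustrative):

package slack

import (
	"encoding/json"
	"fmt"
	"io"
)

// slackPostResponse holds the parts of a chat.postMessage reply that the
// update flow needs; Slack returns more fields, which are ignored here.
type slackPostResponse struct {
	OK      bool   `json:"ok"`
	Channel string `json:"channel"` // resolved channel ID, e.g. "C01234567"
	TS      string `json:"ts"`      // message timestamp, used as the message ID
	Error   string `json:"error"`   // set by Slack when OK is false
}

func parseSlackResponse(body io.Reader) (slackPostResponse, error) {
	var resp slackPostResponse
	if err := json.NewDecoder(body).Decode(&resp); err != nil {
		return resp, err
	}
	if !resp.OK {
		return resp, fmt.Errorf("slack API error: %s", resp.Error)
	}
	return resp, nil
}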

Backward Compatibility

✅ Opt-in feature, defaults to false
✅ No changes to existing configs
✅ No breaking changes

kruchkov-alexandr · Nov 05 '25 12:11

I'm not intending to block this change, but I do want to mention these limitations are the reasons @gotjosh and I didn't implement this in the past (although discussed):

  • In-memory storage (lost on restart) - acceptable for v1, persistence can be added later
  • No HA sync yet - each instance has own cache

We felt these limitations were too severe for the feature to make stable because it makes the feature behave too unpredictably.

However, it sounds like you may also have a plan about how to address those limitations going forward?

grobinson-grafana · Nov 05 '25 13:11

Thank you for the review! You're absolutely right about these limitations.

I actually have a solution ready for both issues. But I want to ask what you'd prefer:

Option A: Ship this as v1, iterate later
Keep the current in-memory approach. Yes, it has limitations (no persistence, no HA sync), but it works well for single-instance setups. Then I'll do a follow-up PR with the full solution.

Option B: Go all the way in this PR
I can integrate metadata directly into nflog - turns out there's already a metadata field in nflog.proto that's perfect for this! Just need to:

  • Regenerate the .pb.go files (proper serialization)
  • Wire it through DedupStage -> context -> Slack -> SetNotifiesStage -> nflog
  • Get persistence and HA sync for free via existing nflog infrastructure
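
Rough sketch of that context plumbing, with placeholder names (not the final API):

package notify

import "context"

// metadataKey is a private context key for carrying per-group notification
// metadata from DedupStage through the notifier to SetNotifiesStage.
type metadataKey struct{}

// Metadata is a placeholder for the per-group key/value data that would be
// serialized into the metadata field in nflog.proto mentioned above.
type Metadata map[string]string

// WithMetadata attaches metadata to the context before the notifier runs.
func WithMetadata(ctx context.Context, md Metadata) context.Context {
	return context.WithValue(ctx, metadataKey{}, md)
}

// MetadataFromContext reads it back, e.g. in the Slack notifier (to find the
// stored message_ts) or in SetNotifiesStage (to persist updated values).
func MetadataFromContext(ctx context.Context) (Metadata, bool) {
	md, ok := ctx.Value(metadataKey{}).(Metadata)
	return md, ok
}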

I've tested Option B locally - metadata survives restarts and updates work correctly after restart. The changes are pretty clean.

What's your preference? I'm happy to go either way - just want to align with how you prefer to merge features. Thanks.

kruchkov-alexandr · Nov 05 '25 15:11

Hi! Also not looking to block this, but just a drive-by comment: we have an internal patch that adds a generic key/value store to the nflog as well. We've been running our production cluster with that patch for ~2 years now. We even use it for this exact purpose!

One thing we found is that wiring it through all the notifiers is a little ugly. We ended up changing the signature of Notifier.Notify a little bit.

If there's interest, we'd be happy to upstream that. Our implementation is pretty much compatible with this PR - in the proto it's a string -> string | int64 | double.
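
For illustration, the value union and an extended Notify signature could look roughly like this (a sketch, not the actual internal patch):

package notify

import (
	"context"

	"github.com/prometheus/alertmanager/types"
)

// MetadataValue mirrors the proto union mentioned above: each value is a
// string, an int64 or a double. Exactly one field is expected to be set.
type MetadataValue struct {
	Str *string
	Int *int64
	Num *float64
}

// Notifier shows one possible shape for the changed signature: the metadata
// map is passed in (and mutated) alongside the alerts. Illustrative only.
type Notifier interface {
	Notify(ctx context.Context, meta map[string]MetadataValue, alerts ...*types.Alert) (bool, error)
}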

Spaceman1701 · Nov 05 '25 20:11

I think it would make sense to plan this as a more generic fundamental change so potentially more notifier integrations can benefit from it. @Spaceman1701 no rush but it would be great if you can contribute that feature from your existing implementation.

siavashs · Nov 11 '25 14:11

Agreed, I think a generic implementation is a good place to start from. I can untangle our internal version by next week, I think. It'll probably take a few iterations of review before we have something that we're happy with for the upstream API.

Spaceman1701 · Nov 12 '25 15:11

Related issue: https://github.com/grafana/grafana/issues/79939

benoittgt · Dec 04 '25 13:12