
feat(platform): Add AllQuiet alert integration alongside Discord alerts

Open · ntindle opened this issue 2 months ago • 14 comments

  • Added system_alert method to NotificationManager that sends both Discord and AllQuiet alerts
  • Implemented correlation IDs for all system alerts to prevent duplicate incidents:
    • Late executions: Based on threshold, count, and affected users
    • Block errors: Based on affected blocks and date
    • Balance alerts: Based on user ID
    • Retry failures: Based on function, context, and error type
  • Updated all alert locations to use NotificationManager.system_alert() method
  • Added AllQuiet webhook URL configuration in settings
  • Maintained backward compatibility with existing Discord alerts
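The correlation-ID idea in the list above can be illustrated with a minimal Python sketch. It assumes that an ID should depend only on what identifies the condition, so the same set of blocks on the same day (or the same user) always yields the same ID. The helper names, the `low_balance` prefix and the SHA-256 hashing scheme are hypothetical and not taken from the PR.

```python
import hashlib
from datetime import date


def make_correlation_id(kind: str, *parts: str) -> str:
    """Return a stable ID: the same kind and set of parts give the same ID."""
    # Trim, lowercase and sort the parts so that incidental whitespace,
    # case or argument order cannot produce a different ID.
    canonical = "|".join(sorted(p.strip().lower() for p in parts))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"{kind}_{digest}"


def block_error_correlation_id(block_ids: list[str], day: date) -> str:
    # Block errors: keyed on the affected blocks plus the calendar date,
    # so each set of failing blocks opens at most one incident per day.
    return make_correlation_id(f"block_errors_{day.isoformat()}", *block_ids)


def balance_correlation_id(user_id: str) -> str:
    # Balance alerts: keyed on the user ID alone.
    return make_correlation_id("low_balance", user_id)
```

Hashing keeps IDs short when many blocks are involved; the trade-off is that the ID itself is no longer human-readable, so the readable details belong in the alert's extra attributes.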

AllQuiet alerts are only sent when correlation_id is provided, ensuring controlled rollout. Severity levels (critical/warning/minor) and extra attributes provide better incident management and debugging context.
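A minimal sketch of the dispatch rule described above: Discord receives every alert, while AllQuiet is called only when a correlation ID is supplied. The `AlertSinks` container, the injected sender callables and the payload keys are hypothetical stand-ins for the PR's webhook code; they are not AllQuiet's documented webhook schema.

```python
import asyncio
from dataclasses import dataclass
from enum import Enum
from typing import Awaitable, Callable, Optional


class Severity(str, Enum):
    CRITICAL = "critical"
    WARNING = "warning"
    MINOR = "minor"


@dataclass
class AlertSinks:
    # Hypothetical async senders standing in for the real webhook calls.
    discord: Callable[[str], Awaitable[None]]
    allquiet: Callable[[dict], Awaitable[None]]


async def system_alert(
    sinks: AlertSinks,
    content: str,
    correlation_id: Optional[str] = None,
    severity: Severity = Severity.WARNING,
    extra_attributes: Optional[dict] = None,
) -> None:
    # Discord keeps receiving every alert (backward compatibility).
    await sinks.discord(content)
    # AllQuiet is opt-in: only alerts carrying a correlation ID are sent,
    # which is what allows a controlled rollout.
    if correlation_id is None:
        return
    await sinks.allquiet(
        {
            "correlation_id": correlation_id,
            "severity": severity.value,
            "description": content,
            "attributes": extra_attributes or {},
        }
    )


if __name__ == "__main__":
    async def _print_discord(msg: str) -> None:
        print("discord:", msg)

    async def _print_allquiet(payload: dict) -> None:
        print("allquiet:", payload)

    sinks = AlertSinks(_print_discord, _print_allquiet)
    asyncio.run(system_alert(sinks, "Discord only"))
    asyncio.run(
        system_alert(sinks, "Both", correlation_id="demo", severity=Severity.CRITICAL)
    )
```

Injecting the senders keeps the gating rule testable without network access.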

🤖 Generated with Claude Code

Co-Authored-By: Claude [email protected]

Changes πŸ—οΈ

Checklist 📋

For code changes:

  • [ ] I have clearly listed my changes in the PR description
  • [ ] I have made a test plan
  • [ ] I have tested my changes according to the test plan:
    • [ ] ...
Example test plan
  • [ ] Create from scratch and execute an agent with at least 3 blocks
  • [ ] Import an agent from file upload, and confirm it executes correctly
  • [ ] Upload agent to marketplace
  • [ ] Import an agent from marketplace and confirm it executes correctly
  • [ ] Edit an agent from monitor, and confirm it executes correctly

For configuration changes:

  • [ ] .env.default is updated or already compatible with my changes
  • [ ] docker-compose.yml is updated or already compatible with my changes
  • [ ] I have included a list of my configuration changes in the PR description (under Changes)
Examples of configuration changes
  • Changing ports
  • Adding new services that need to communicate with each other
  • Secrets or environment variable changes
  • New or infrastructure changes such as databases

ntindle • Oct 21 '25 20:10

Deploy Preview for auto-gpt-docs-dev canceled.

Latest commit: 66c2260256cb0fd01fa3a656a7f197ac0a0ce9da
Latest deploy log: https://app.netlify.com/projects/auto-gpt-docs-dev/deploys/692748288c45a7000837f73b

netlify[bot] • Oct 21 '25 20:10

Deploy Preview for auto-gpt-docs canceled.

Latest commit: 66c2260256cb0fd01fa3a656a7f197ac0a0ce9da
Latest deploy log: https://app.netlify.com/projects/auto-gpt-docs/deploys/692748283798770008e45662

netlify[bot] • Oct 21 '25 20:10

This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.

github-actions[bot] • Oct 21 '25 20:10

[!IMPORTANT]

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

[!NOTE]

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.


coderabbitai[bot] • Oct 21 '25 20:10

Thanks for this PR adding AllQuiet alert integration alongside Discord alerts. The implementation looks solid with correlation IDs to prevent duplicate incidents, and the code changes are well-structured.

However, I need to flag that the PR checklist is completely unchecked. Before we can merge this PR, please complete the checklist in the PR description:

  • Check that you've clearly listed your changes (which you have in the PR description)
  • Confirm you have a test plan
  • Verify you've tested according to the plan
  • For the configuration changes (adding allquiet_webhook_url), make sure .env.default is updated and that you've listed this configuration change in the Changes section

Once you've addressed the checklist items, this PR should be ready for approval.

AutoGPT-Agent • Oct 21 '25 20:10

Here's the code health analysis summary for commits bdb94a3..66c2260. View details on DeepSource ↗.

Analysis Summary

| Analyzer | Status | Summary | Link |
| --- | --- | --- | --- |
| JavaScript | ✅ | Success | View Check ↗ |
| Python | ✅ | Success | View Check ↗ |

💡 If you’re a repository administrator, you can configure the quality gates from the settings.

deepsource-io[bot] • Oct 21 '25 20:10

Conflicts have been resolved! 🎉 A maintainer will review the pull request shortly.

github-actions[bot] • Oct 22 '25 18:10

Thank you for implementing the AllQuiet alert integration alongside Discord alerts! The changes look well-structured and consistent across the codebase.

Your implementation of correlation IDs for different alert types is thorough, and I appreciate the backward compatibility with existing Discord alerts.

Before this can be merged:

  • Please complete the PR checklist. Currently, none of the items are checked off.

    • At minimum, the code changes checklist items need to be completed
    • Please include a test plan showing how you've verified the AllQuiet integration works
  • You've added the allquiet_webhook_url configuration item correctly in settings.py, but this should be noted in the configuration changes section of your PR description.

The code implementation itself looks good, but the PR process requirements need to be addressed before this can be merged. Please update the PR description with the completed checklist and any additional configuration changes information.

AutoGPT-Agent • Oct 22 '25 18:10

Thanks for implementing this AllQuiet alert integration! This will definitely help improve our incident management capabilities.

However, there are a few things that need to be addressed before we can merge this PR:

  1. The PR checklist isn't filled out. Please complete the checklist items, particularly:

    • Clearly listing your changes in the PR description (the "Changes" section is currently empty)
    • Making and documenting a test plan for how you've verified this functionality
  2. Since you're adding a new configuration option (allquiet_webhook_url), please:

    • Confirm if .env.default needs updating
    • Mention the configuration changes in your PR description
  3. It would be helpful to include some details about:

    • How you've tested this integration
    • Any considerations for rolling this out in production
    • Whether existing alerts might be affected

The code changes themselves look good! The implementation with correlation IDs for deduplication and the extra context attributes will be very helpful for incident management.

AutoGPT-Agent • Oct 22 '25 18:10

Thank you for your PR implementing AllQuiet alerts alongside Discord alerts. The implementation looks well-structured with correlation IDs and severity levels to improve incident management.

However, there are a couple of items that need to be addressed before this can be merged:

  1. The PR checklist is incomplete - none of the checkboxes have been checked. Please fill out the checklist to confirm you've tested your changes according to a test plan.

  2. You've added a new configuration field allquiet_webhook_url to the Secrets class, but there's no mention of this in the Changes section of your PR description. Please:

    • Update the PR description to list this configuration change
    • Confirm that .env.default is compatible with this change
    • Confirm that docker-compose.yml is compatible with this change

Once you've addressed these items, the PR should be ready for another review.

AutoGPT-Agent • Oct 22 '25 18:10

Thank you for adding the AllQuiet alert integration! This looks like a valuable addition to provide better incident management alongside the existing Discord alerts.

However, before we can approve this PR:

  1. Please complete the PR checklist by checking all the applicable boxes. For code changes, you need to:

    • Make sure you have clearly listed your changes in the PR description (which you've done well)
    • Create a test plan
    • Confirm you've tested your changes according to the test plan
  2. For configuration changes, since you're adding a new allquiet_webhook_url setting, please:

    • Confirm that .env.default and docker-compose.yml are updated or compatible
    • Include the configuration changes in the PR description under the 'Changes' section

The code implementation looks thorough, with correlation IDs for different alert types and proper severity levels. Once you complete the checklist items, this PR should be ready for approval.

AutoGPT-Agent • Oct 22 '25 20:10

Thank you for this PR implementing AllQuiet alert integration alongside Discord alerts! The implementation looks comprehensive with correlation IDs for different alert types, severity levels, and maintaining backward compatibility.

However, there's one important issue that needs to be addressed:

  • The PR checklist is completely empty. Please either:
    • Complete the checklist items (especially the test plan section), or
    • If certain sections don't apply, explicitly note that in the PR description

Otherwise, the implementation appears solid with good additions like:

  • The new system_alert method that handles both Discord and AllQuiet notifications
  • Structured correlation IDs to prevent duplicate incidents
  • Severity levels for better incident management
  • Added AllQuiet webhook URL configuration

Once the checklist is properly addressed, this PR should be ready for review.

AutoGPT-Agent • Oct 28 '25 15:10

Thank you for this PR implementing the AllQuiet alert integration alongside Discord alerts! The changes look well-structured and the description clearly explains what you've added.

However, before this can be merged, please complete the checklist in your PR description. Since this PR contains code changes, you need to:

  1. Check the box indicating you've listed your changes (which you have done well in the description)
  2. Create a test plan and check that box
  3. Check the box indicating you've tested your changes according to the plan

Once you've completed the checklist, this PR should be ready for another review. The code changes themselves look appropriate and match what's described in the PR title and description.

AutoGPT-Agent • Oct 28 '25 17:10

Thank you for your PR implementing AllQuiet alert integration! The code changes look well structured and maintain backward compatibility as described.

Before this PR can be approved, please address these items:

  1. Complete the PR checklist: Currently, none of the checklist items are checked off. Please complete the relevant sections or remove them if not applicable.

  2. Document configuration changes: You've added a new configuration setting allquiet_webhook_url but haven't explicitly listed this in the "Changes" section of your PR description. Please update the description to include this configuration addition.

  3. Test plan: Add a test plan that demonstrates how you verified the AllQuiet integration works correctly, particularly how you tested the correlation ID functionality to prevent duplicate incidents.

The code implementation itself looks comprehensive and thorough, with good attention to backward compatibility and error handling. The correlation ID approach to prevent duplicate incidents is especially valuable.

AutoGPT-Agent • Nov 03 '25 16:11

Thank you for your PR adding AllQuiet alert integration alongside Discord alerts! The implementation looks comprehensive, with correlation IDs for system alerts and maintenance of backward compatibility with Discord alerts.

However, before we can merge this PR, please complete the checklist in the PR description:

  1. Check the boxes for items you've completed
  2. Add a test plan describing how you've tested these changes
  3. For the configuration changes (adding AllQuiet webhook URL), please check the configuration checklist items and ensure .env.default is updated appropriately

Once you've addressed these items, we'll be happy to review your PR again for merging.

AutoGPT-Agent • Nov 26 '25 18:11

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚑ Recommended focus areas for review

Title Extraction

The AllQuiet alert title is derived by stripping a limited set of emoji/formatting tokens from the first line, so any other Markdown or emoji can still leak into the title. Consider a more robust sanitizer (e.g., a regex that strips all Markdown and emoji) and ensure consistent truncation and encoding.

```python
lines = content.split("\n")
title = lines[0] if lines else content[:100]
# Remove Discord formatting from title
title = (
    title.replace("**", "")
    .replace("🚨", "")
    .replace("⚠️", "")
    .replace("❌", "")
    .replace("✅", "")
    .replace("📊", "")
    .strip()
)

alert = AllQuietAlert(
    severity=severity,
    status=status,
    title=title[:100],  # Limit title length
    description=content,
    # ... remaining arguments truncated in the review excerpt
)
```
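One way to follow the reviewer's suggestion is sketched below. It is an assumption, not code from the PR: it takes the first non-empty line, removes common Discord Markdown markers with a regex, drops emoji by filtering out characters in the Unicode categories `So` (other symbol) and `Cf` (format) plus variation selectors, and then truncates. Underscores are deliberately left alone so identifiers such as `job_1` survive.

```python
import re
import unicodedata

# Discord Markdown markers for bold, strikethrough, inline code and italics.
_MARKDOWN = re.compile(r"\*\*|~~|`|\*")
# Variation selectors (e.g. U+FE0F after "⚠") are category Mn, not So.
_VARIATION_SELECTORS = re.compile("[\ufe00-\ufe0f]")
_WHITESPACE = re.compile(r"\s+")


def extract_title(content: str, max_len: int = 100) -> str:
    """Take the first non-empty line, drop Markdown and emoji, truncate."""
    first = next((ln for ln in content.splitlines() if ln.strip()), "")
    first = _MARKDOWN.sub("", first)
    first = _VARIATION_SELECTORS.sub("", first)
    # "So" covers most emoji but also symbols such as © or °; "Cf" covers
    # invisible format characters like the zero-width joiner.
    first = "".join(
        ch for ch in first if unicodedata.category(ch) not in ("So", "Cf")
    )
    first = _WHITESPACE.sub(" ", first).strip()
    return first[:max_len].rstrip()
```

Unlike a fixed list of `.replace()` calls, the category filter does not need updating whenever a new emoji is added to an alert message.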
Missing Timeout/Retry

Posting to the AllQuiet webhook lacks explicit timeout/retry/backoff and error handling beyond a missing-URL check. A slow or failing endpoint could hang or flood logs. Consider adding timeouts, limited retries, and structured error handling/metrics for failures.

```python
async def send_allquiet_alert(alert: AllQuietAlert):
    hook_url = settings.secrets.allquiet_webhook_url

    if not hook_url:
        logging.warning("AllQuiet webhook URL not configured")
        return

    from backend.util.request import Requests

    await Requests().post(hook_url, json=alert.model_dump())
```
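A minimal sketch of the timeout/retry handling the reviewer asks for, using only `asyncio`. The `post_with_retry` helper and its defaults (3 attempts, 5-second timeout, exponential backoff starting at 0.5 s) are hypothetical choices; the existing webhook call would be passed in as the `post` callable.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger(__name__)


async def post_with_retry(
    post: Callable[[], Awaitable[object]],
    *,
    attempts: int = 3,
    timeout_s: float = 5.0,
    base_delay_s: float = 0.5,
) -> bool:
    """Run `post` with a per-attempt timeout and exponential backoff.

    Returns True on success and False once every attempt has failed. It
    never raises ordinary exceptions, so a broken alerting endpoint cannot
    take down the caller (cancellation still propagates).
    """
    for attempt in range(1, attempts + 1):
        try:
            await asyncio.wait_for(post(), timeout=timeout_s)
            return True
        except Exception as exc:  # includes asyncio.TimeoutError
            logger.warning(
                "AllQuiet post failed (attempt %d/%d): %r", attempt, attempts, exc
            )
            if attempt < attempts:
                # Delays: base, 2*base, 4*base, ...
                await asyncio.sleep(base_delay_s * 2 ** (attempt - 1))
    return False
```

In `send_allquiet_alert` this could be wired up as `await post_with_retry(lambda: Requests().post(hook_url, json=alert.model_dump()))`, assuming `Requests().post` returns an awaitable as the excerpt implies. The lambda creates a fresh coroutine for each attempt, which a retry loop requires.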

Correlation ID Stability

Correlation IDs are built from raw context/exception strings and function names; small variations (whitespace, punctuation, dynamic values) may fragment incidents. Normalize inputs (lowercase, strip/slugify, truncate) to improve deduplication.


```python
# Create correlation ID based on context, function name, and error type
correlation_id = f"retry_failure_{context}_{func_name}_{error_type}".replace(
    " ", "_"
).replace(":", "")

if send_rate_limited_discord_alert(
    func_name,
    exception,
    # ... remaining arguments truncated in the review excerpt
):
    ...
```
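A sketch of the normalization the reviewer suggests, with hypothetical helper names. Each part is Unicode-normalized and lowercased, has volatile tokens (standalone numbers and UUIDs) masked, is reduced to `[a-z0-9_]`, and is truncated, so trivially different messages for the same failure share one ID. Which tokens count as volatile is an assumption to tune against real error messages.

```python
import re
import unicodedata

# Volatile tokens that vary between occurrences of the same failure:
# UUIDs and standalone numbers (e.g. execution counters, ports).
_VOLATILE = re.compile(r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}|\b\d+\b")
_NON_SLUG = re.compile(r"[^a-z0-9]+")


def slugify(value: str, max_len: int = 40) -> str:
    """Reduce free text to a short, stable [a-z0-9_] token."""
    value = unicodedata.normalize("NFKD", value).lower()
    value = _VOLATILE.sub("n", value)
    value = _NON_SLUG.sub("_", value).strip("_")
    return value[:max_len].rstrip("_")


def retry_failure_correlation_id(context: str, func_name: str, error_type: str) -> str:
    parts = (slugify(context), slugify(func_name), slugify(error_type))
    # Use a placeholder so an empty part cannot silently merge two fields.
    return "retry_failure_" + "_".join(p or "unknown" for p in parts)
```

Masking numbers trades some precision for stability: two genuinely different failures that differ only in a number will share an incident, which is usually the desired behavior for retry alerts.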

qodo-code-review[bot] • Nov 26 '25 18:11

Thank you for adding the AllQuiet alert integration alongside Discord alerts. The implementation looks solid with:

  • Good correlation ID implementation for different alert types
  • Proper severity levels (critical/warning/minor)
  • Backward compatibility with existing Discord alerts
  • Extra attributes for better incident management context

However, I can't approve this PR because none of the checklist items are checked off. Please complete the checklist in the PR description, particularly:

  • Confirm you've clearly listed your changes
  • Indicate you have a test plan
  • Fill out the test plan section with relevant test cases for this integration
  • Complete the configuration section since you're adding a new environment variable (allquiet_webhook_url)

Once you've checked off the appropriate items and included your test plan, we can proceed with the review. The code changes themselves look good!

AutoGPT-Agent • Nov 26 '25 18:11

@claude Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid.

Location: autogpt_platform/backend/backend/monitoring/late_execution_monitor.py#L105

Potential issue: The correlation_id for late execution monitoring is dynamically generated using num_total_late and num_users. These variables are derived from the current check run and change with each execution, even for the same underlying condition. This prevents incident management systems, such as AllQuiet, from effectively deduplicating alerts, leading to alert spam where a single persistent issue generates multiple distinct incidents.
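If the ID is built as described, the concern holds: with `num_total_late` and `num_users` in the ID, each check of one ongoing problem yields a new ID and therefore a new incident. Below is a minimal sketch of one possible fix. It assumes the lateness threshold is what identifies the condition and that the `status` field seen in the `AllQuietAlert` excerpt can be used to close an incident; the function name, the ID format and the `"open"`/`"resolved"` strings are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LateExecutionAlert:
    correlation_id: str
    status: str  # placeholder values: "open" or "resolved"
    attributes: dict = field(default_factory=dict)


def build_late_execution_alert(
    threshold_s: int, num_total_late: int, num_users: int
) -> LateExecutionAlert:
    # Key the incident only on what identifies the condition (the lateness
    # threshold), not on per-run counts, so every check of the same ongoing
    # problem reuses one correlation ID.
    correlation_id = f"late_executions_threshold_{threshold_s}s"
    # A later check that finds nothing late reuses the same ID to resolve
    # the incident instead of opening a new one.
    status = "open" if num_total_late > 0 else "resolved"
    # The volatile counts are still reported, but only as context.
    return LateExecutionAlert(
        correlation_id=correlation_id,
        status=status,
        attributes={"num_total_late": num_total_late, "num_users": num_users},
    )
```

This assumes the monitor also emits an alert when nothing is late; if it only runs the alert path on failures, the `resolved` branch would need a separate hook.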

ntindle • Nov 26 '25 19:11

Thank you for this PR to add AllQuiet alert integration! The implementation looks good with consistent handling of both Discord and AllQuiet alerts, and I appreciate the addition of correlation IDs to prevent duplicate incidents.

However, before this PR can be merged, please complete the test plan checklist in the PR description. The test plan currently has only one item checked off and appears incomplete (there's a blank second bullet point with no text).

Please:

  1. Complete your test plan with all the tests you've conducted
  2. Make sure all checkboxes in the "For code changes" section are properly checked

The implementation itself looks solid with good abstractions for the alert system and proper handling of correlation IDs. Once you complete the checklist, this PR should be ready for approval.

AutoGPT-Agent • Nov 26 '25 19:11

Thank you for this PR adding AllQuiet alert integration alongside Discord alerts. The implementation looks comprehensive with correlation IDs and severity levels to improve incident management.

Before this PR can be merged, please complete the test plan checklist in your PR description. You've outlined a good test plan, but the checkboxes are currently unchecked, suggesting the testing hasn't been completed yet:

- [ ] Send a test alert by triggering the code via an admin page (code to trigger not in pr) and confirm it creates an all quiet alert
- [ ]  

Please check these boxes once you've verified your changes work as expected. Also, the second test item is empty - please either complete it with a meaningful test or remove it.

Otherwise, the PR looks good - the changes match the scope in the title, the implementation is thorough, and you've included the necessary configuration changes in .env.default.

AutoGPT-Agent • Nov 26 '25 19:11

Thank you for your well-structured PR! The AllQuiet integration looks good, and I appreciate the backward compatibility with Discord alerts.

A couple of observations:

  1. The test plan items are not checked off, indicating you haven't completed testing yet. Please complete the testing according to your plan and check off those items before this PR can be merged.

  2. The PR mentions adding a new secret configuration variable to the infra repo. Make sure this has been coordinated with the team managing the infrastructure.

Overall, this is a well-structured change that will enhance the platform's alert capabilities. Once you've completed the testing, this should be ready for merging.

AutoGPT-Agent • Nov 26 '25 19:11

Thank you for this PR implementing AllQuiet alerts alongside Discord alerts. The implementation looks solid with good correlation ID management to prevent duplicate incidents and appropriate severity levels.

However, there are a few issues that need to be addressed before this can be merged:

  1. Incomplete test plan: Your checklist indicates that you haven't completed testing of your changes. Please complete the test plan items and check them off.

  2. Missing detailed changes section: While you have a good summary in the PR description, the "Changes" section is empty. Please fill this out with a concise list of the changes made in this PR.

  3. Configuration documentation: You've correctly marked the configuration checklist items, but it would be helpful to explicitly list the new allquiet_webhook_url configuration variable in the "Changes" section.

Once you've addressed these items, particularly the testing, this PR will be ready for review again. The code changes themselves look well-structured and appropriate for the task at hand.

AutoGPT-Agent • Nov 26 '25 19:11