docs: add RFC for service webhook auto-repair mechanism
What this PR does / Why we need it:
This PR proposes an auto-repair mechanism for service webhooks that have been manually deleted from code repositories (GitLab/GitHub/Gerrit/Gitee).
Current Problem:
- When webhooks are manually deleted from repositories (e.g., cleanup, migration, accidental deletion), the system cannot automatically recover them
- Code changes can no longer trigger workflows after webhook deletion
- Users must manually delete and recreate services to restore webhook functionality
Solution: During service updates, the system will automatically:
- Verify webhook existence using the stored HookID from the webhook collection
- Detect if the webhook has been deleted from the repository
- Automatically recreate the webhook if missing
This eliminates manual intervention and ensures continuous webhook availability.
What is changed and how it works?
Design Document: community/rfc/2025-01-10-service-webhook-auto-repair.md
Key Changes:
- Add webhook verification interface for all Git platform clients:
  - GitLabClient.WebHookExists(owner, repo, hookID)
  - GitHubClient.WebHookExists(owner, repo, hookID)
  - GerritClient.WebHookExists(repo, remoteName, hookID)
  - GiteeClient.WebHookExists(owner, repo, hookID) (public/private)
- Modify ProcessServiceWebhook to add verification logic:
  - Only verify during service updates (not during creation)
  - Execute asynchronously in a goroutine to avoid blocking
  - Verification failures do not affect the main workflow
- Implement verification and repair functions:
  - VerifyAndRepairWebhook(): Query webhook record, verify existence, recreate if missing
  - checkWebhookExists(): Platform-specific webhook verification
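
As a rough illustration of the verify-then-recreate step, here is a minimal sketch. It is not the implementation from this PR: the `HookClient` interface, the `CreateWebHook` method, the string-typed `hookID`, the return values, and the in-memory `fakeClient` are all assumptions made so the example is self-contained.

```go
package main

import "fmt"

// HookClient is a hypothetical interface: WebHookExists mirrors the method
// name proposed in this PR; CreateWebHook is an assumed repair call.
type HookClient interface {
	WebHookExists(owner, repo, hookID string) (bool, error)
	CreateWebHook(owner, repo string) (string, error)
}

// fakeClient is an in-memory stand-in for a Git platform client.
type fakeClient struct {
	hooks  map[string]bool
	nextID int
}

func key(owner, repo, hookID string) string { return owner + "/" + repo + "#" + hookID }

func (c *fakeClient) WebHookExists(owner, repo, hookID string) (bool, error) {
	return c.hooks[key(owner, repo, hookID)], nil
}

func (c *fakeClient) CreateWebHook(owner, repo string) (string, error) {
	c.nextID++
	id := fmt.Sprintf("hook-%d", c.nextID)
	c.hooks[key(owner, repo, id)] = true
	return id, nil
}

// VerifyAndRepairWebhook checks the stored hook ID against the platform and
// recreates the webhook if it is gone. It returns the hook ID that is valid
// afterwards and whether a repair took place.
func VerifyAndRepairWebhook(c HookClient, owner, repo, hookID string) (string, bool, error) {
	exists, err := c.WebHookExists(owner, repo, hookID)
	if err != nil {
		return hookID, false, fmt.Errorf("verify webhook %s: %w", hookID, err)
	}
	if exists {
		return hookID, false, nil // nothing to repair
	}
	newID, err := c.CreateWebHook(owner, repo)
	if err != nil {
		return hookID, false, fmt.Errorf("recreate webhook: %w", err)
	}
	return newID, true, nil
}

func main() {
	c := &fakeClient{hooks: map[string]bool{}}
	// "hook-0" was never registered, so the sketch treats it as deleted.
	id, repaired, err := VerifyAndRepairWebhook(c, "acme", "app", "hook-0")
	fmt.Println(id, repaired, err)
}
```

In a real client the stale DB record would also need to be replaced with the new hook ID. The sketch only returns the new ID and leaves persistence to the caller.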
Workflow:
    Service Update
        ↓
    ProcessServiceWebhook
        ├─ ProcessWebhook (existing: compare DB records, add/remove webhooks)
        └─ [NEW] Verify and repair webhook (async)
            ├─ Query webhook record from DB (get HookID)
            ├─ Call Git platform API to verify webhook existence
            └─ If not found: delete old record → recreate webhook
Multi-platform Support:
- GitHub (github)
- GitLab (gitlab)
- Gerrit (gerrit)
- Gitee Public Cloud (gitee)
- Gitee Private Deployment (gitee-enterprise)
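
One way checkWebhookExists() might dispatch across these platforms is sketched below. The source strings come from the list above. The `repoRef` and `checkers` types and the function-typed client hooks are hypothetical and exist only to keep the example self-contained. Gerrit is handled separately because its proposed signature takes `(repo, remoteName, hookID)` rather than `(owner, repo, hookID)`.

```go
package main

import "fmt"

// repoRef is a hypothetical description of the repository a webhook belongs to.
type repoRef struct {
	Source, Owner, Repo, RemoteName, HookID string
}

// Function types standing in for the per-platform WebHookExists methods.
type ownerRepoChecker func(owner, repo, hookID string) (bool, error)
type gerritChecker func(repo, remoteName, hookID string) (bool, error)

// checkers bundles the available platform clients (assumed layout).
type checkers struct {
	byOwnerRepo map[string]ownerRepoChecker // github, gitlab, gitee, gitee-enterprise
	gerrit      gerritChecker
}

// checkWebhookExists routes the check to the client matching r.Source.
func checkWebhookExists(c checkers, r repoRef) (bool, error) {
	switch r.Source {
	case "github", "gitlab", "gitee", "gitee-enterprise":
		check, ok := c.byOwnerRepo[r.Source]
		if !ok {
			return false, fmt.Errorf("no client configured for %q", r.Source)
		}
		return check(r.Owner, r.Repo, r.HookID)
	case "gerrit":
		if c.gerrit == nil {
			return false, fmt.Errorf("no client configured for %q", r.Source)
		}
		return c.gerrit(r.Repo, r.RemoteName, r.HookID)
	default:
		return false, fmt.Errorf("unsupported code host %q", r.Source)
	}
}

func main() {
	c := checkers{
		byOwnerRepo: map[string]ownerRepoChecker{
			"github": func(owner, repo, hookID string) (bool, error) { return hookID == "42", nil },
		},
		gerrit: func(repo, remoteName, hookID string) (bool, error) { return false, nil },
	}
	ok, err := checkWebhookExists(c, repoRef{Source: "github", Owner: "acme", Repo: "app", HookID: "42"})
	fmt.Println(ok, err)
}
```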
Key Constraints:
- Non-invasive: failures do not block service updates
- Async execution: verification runs in a goroutine, so it adds no latency to the service update request
- Backward compatible: leverages existing webhook collection and HookID field
Does this PR introduce a user-facing change?
User-Facing Impact:
- Improved availability: Webhooks are automatically repaired without manual intervention
- No breaking changes: Completely backward compatible, no API or database modifications required
- Transparent to users: Auto-repair happens in the background during service updates
- Fixes existing issue: Resolves the problem where manually deleted webhooks cannot be recovered
- [ ] API change
- [ ] database schema change
- [ ] upgrade assistant change
- [ ] change in non-functional attributes such as efficiency or availability
- [x] fix of a previous issue