duplicate hook numbers sending concurrent webhooks
Description
There is a known issue that spamming webhooks can produce an error with duplicate build numbers for the server:
duplicate key value violates unique constraint "builds_repo_id_number_key"
To minimize the impact of this issue, we introduced some retry logic in the webhook workflow:
https://github.com/go-vela/community/issues/213#issuecomment-781655122
NOTE: The retry logic did not remove the behavior completely but it did lessen the occurrences of it.
However, recently we started noticing a slightly different flavor of this error specific to hooks:
{"error":"unable to create webhook OrgName/RepoName/BuildNumber: ERROR: duplicate key value violates unique constraint \"hooks_repo_id_number_key\" (SQLSTATE 23505)"}
We believe this is the same kind of underlying problem: concurrent webhooks for a repository are processed at the same time, and each attempts to insert a hook into the database with the same number. After some exploration, we didn't find any evidence that the retry logic we set up for the builds table is also in place for the hooks table:
https://github.com/go-vela/server/blob/main/api/webhook/post.go#L261-L271
We should add additional logic to account for this problem for the hooks table like we did for the builds table.
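As a rough illustration of the kind of retry logic being proposed, here is a minimal sketch. It is not the actual Vela implementation: `createWithRetry`, `isDuplicateKey`, and `sampleDuplicateErr` are hypothetical names, the two callbacks stand in for whatever database calls the server uses, and matching on the error text is an assumption made to keep the example free of driver dependencies.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// sampleDuplicateErr mirrors the error text from the report above.
// It exists only so the sketch can be exercised without a database.
var sampleDuplicateErr = errors.New(`ERROR: duplicate key value violates unique constraint "hooks_repo_id_number_key" (SQLSTATE 23505)`)

// isDuplicateKey reports whether err looks like a PostgreSQL unique
// constraint violation. Matching on the message text is an assumption
// for this sketch; a database driver may expose a typed error instead.
func isDuplicateKey(err error) bool {
	return err != nil && strings.Contains(err.Error(), "SQLSTATE 23505")
}

// createWithRetry re-reads the last hook number and retries the insert
// whenever the insert fails with a duplicate key error.
// lastNumber and create are hypothetical stand-ins for database calls.
func createWithRetry(
	lastNumber func() (int, error),
	create func(number int) error,
	attempts int,
	delay time.Duration,
) (int, error) {
	if attempts < 1 {
		attempts = 1
	}

	var err error
	for i := 0; i < attempts; i++ {
		var last int
		last, err = lastNumber()
		if err != nil {
			return 0, err
		}

		number := last + 1
		err = create(number)
		if err == nil {
			return number, nil
		}

		// Any error other than a duplicate key is not worth retrying.
		if !isDuplicateKey(err) {
			return 0, err
		}

		// Give the competing webhook time to finish its insert.
		time.Sleep(delay)
	}

	return 0, fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}
```

A caller would pass a function that fetches the last hook number for the repository and a function that inserts the hook with the given number. As noted above for builds, retrying like this would likely reduce how often the error occurs rather than remove it entirely.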
Value
- users following the standard webhook workflow no longer face issues with Vela not triggering builds
Useful Information
- What is the output of vela --version?
v0.23.4
- What operating system is being used?
https://www.flatcar.org/
- Any other important details?
We've received only a few reports so far, so we're still attempting to gather additional data and information. The reports share a common pattern: "~10% of the time we'll notice that when we merge a pull request the push does not get processed by Vela". After some exploration, we've identified that those repositories have automatic branch deletion configured:
https://docs.github.com/en/[email protected]/repositories/configuring-branches-and-merges-in-your-repository/configuring-pull-request-merges/managing-the-automatic-deletion-of-branches
On the GitHub side, we can confirm that this behavior causes two push events to be fired at the exact same time:
- a push to the target branch for the PR
- a push for the deletion event of the source branch from the PR
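To make the suspected race concrete, the following sketch simulates webhooks that arrive together and each compute the next hook number before any of them has inserted. This is an illustration under assumptions, not Vela code: `hookTable`, `errDuplicateNumber`, and `simulateConcurrentPushes` are hypothetical, and the in-memory map stands in for the unique constraint on the hooks table.

```go
package main

import (
	"errors"
	"sync"
)

// errDuplicateNumber stands in for the database's unique constraint violation.
var errDuplicateNumber = errors.New("duplicate key value violates unique constraint")

// hookTable is a hypothetical in-memory stand-in for the hooks table of a
// single repository, where each hook number may only be used once.
type hookTable struct {
	mu      sync.Mutex
	numbers map[int]bool
	last    int
}

// lastNumber returns the highest hook number inserted so far.
func (t *hookTable) lastNumber() int {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.last
}

// insert records a hook number, rejecting numbers that already exist.
func (t *hookTable) insert(number int) error {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.numbers[number] {
		return errDuplicateNumber
	}
	t.numbers[number] = true
	if number > t.last {
		t.last = number
	}
	return nil
}

// simulateConcurrentPushes processes n webhooks at once. Every handler
// reads the last number before any handler inserts, which is the
// interleaving suspected here. It returns the insert outcomes.
func simulateConcurrentPushes(n int) (succeeded, failed int) {
	table := &hookTable{numbers: map[int]bool{}}
	results := make(chan error, n)

	var read, done sync.WaitGroup
	read.Add(n)
	done.Add(n)

	for i := 0; i < n; i++ {
		go func() {
			defer done.Done()
			next := table.lastNumber() + 1
			read.Done()
			read.Wait() // wait until every handler has read the same number
			results <- table.insert(next)
		}()
	}

	done.Wait()
	close(results)

	for err := range results {
		if err == nil {
			succeeded++
		} else {
			failed++
		}
	}
	return succeeded, failed
}
```

With two simulated push events, one insert succeeds and the other is rejected as a duplicate, which matches the symptom of one of the two pushes not being processed.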
Also, these reports are from a more recent time window of "about the last month or so". Unfortunately, we're not able to completely verify exactly when this happened because GitHub doesn't store webhook deliveries that far back.