500 errors in Packit
Reported by @majamassarini in https://github.com/fedora-copr/copr/pull/3329#issuecomment-2288099204
I was thinking that we should probably open an issue for this. I hoped it was somehow related to the above fix, but sadly it isn't. I looked in our logs, and what caught my attention is that over the last 7 days we got 500 errors from COPR in just a handful of projects, with the exceptions scattered across the whole period; so I would say it does not depend on a high volume of requests or high load on the COPR service.
2024-08-13T13:42:03 https://github.com/containers/podman/pull/23601
2024-08-10T09:39:55 https://github.com/containers/podman/pull/23569
2024-08-12T13:29:25 https://github.com/containers/podman/pull/23581
2024-08-12T15:38:41 https://github.com/containers/podman/pull/23587
2024-08-07T14:38:59 https://github.com/containers/podman/pull/23537
2024-08-13T16:26:09 https://github.com/containers/common/pull/2124
2024-08-09T23:14:02 https://github.com/containers/common/pull/2119
2024-08-08T00:58:49 https://github.com/containers/crun/pull/1513
2024-08-12T17:50:45 https://github.com/containers/crun/pull/1519
2024-08-12T21:16:54 https://github.com/containers/crun/pull/1520
2024-08-10T00:08:19 https://github.com/containers/buildah/pull/5680
2024-08-12T19:43:26 https://github.com/containers/buildah/pull/5681
2024-08-12T20:15:48 https://github.com/containers/buildah/pull/5682
2024-08-11T14:02:53 https://github.com/containers/netavark/pull/1052
2024-08-13T10:50:34 https://github.com/rpm-software-management/dnf5/pull/1625
2024-08-08T11:29:19 https://github.com/cockpit-project/cockpit-machines/pull/1760
2024-08-11T19:30:25 https://github.com/cockpit-project/cockpit-machines/pull/1761
2024-08-12T03:47:41 https://github.com/cockpit-project/cockpit-machines/pull/1762
The containers projects and the cockpit-machines project both use the packages key. With the packages key I would expect more requests from Packit to COPR in a short period of time compared to other Packit projects. I could be wrong, but to me it looks like a race condition on the COPR side, also because it does not always happen on the same PR, so it is probably not the data we submit to COPR.
The dnf5 project, on the other hand, has the simplest Packit config we could find, and it has nevertheless been hit by this problem. Again, I can only explain that with some kind of race condition...
I can't spot anything else interesting in our logs, but let us know if we can help you debug this in any way.
Nothing suspicious in the last 6 hours. One of the events mentioned above produced this traceback: log.txt
[Tue Aug 13 10:50:32.041388 2024] [wsgi:error] [pid 3866555:tid 3866782] [remote 107.20.230.14:21570] psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "copr_name_for_user_uniq"
[Tue Aug 13 10:50:32.041400 2024] [wsgi:error] [pid 3866555:tid 3866782] [remote 107.20.230.14:21570] DETAIL: Key (user_id, name)=(5576, rpm-software-management-dnf5-1625) already exists.
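For illustration only, here is a minimal, self-contained Python sketch (not actual copr-frontend code) of the check-then-insert race that would produce exactly this kind of copr_name_for_user_uniq violation: two concurrent "create project" requests both pass the existence check, and the later INSERT then hits the unique constraint, which bubbles up as a 500 unless the integrity error is caught and turned into an orderly "already exists" response.

```python
import sqlite3
import threading
import time

# Shared in-memory DB standing in for the frontend database.
db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute("CREATE TABLE copr (user_id INTEGER, name TEXT, UNIQUE (user_id, name))")
db_lock = threading.Lock()  # only serializes access to the shared sqlite handle


def create_project(user_id, name):
    # 1) check whether the project already exists
    with db_lock:
        exists = db.execute(
            "SELECT 1 FROM copr WHERE user_id = ? AND name = ?", (user_id, name)
        ).fetchone()
    if exists:
        print("already exists -> orderly 400 response")
        return
    time.sleep(0.1)  # window in which the concurrent request can insert first
    # 2) insert it -- the second thread to get here violates the constraint
    try:
        with db_lock:
            db.execute("INSERT INTO copr (user_id, name) VALUES (?, ?)", (user_id, name))
        print("project created")
    except sqlite3.IntegrityError:
        # Without this handler the duplicate key propagates as a 500,
        # analogous to the copr_name_for_user_uniq traceback above.
        print("IntegrityError: duplicate key -> should become 400/409, not 500")


threads = [
    threading.Thread(target=create_project, args=(5576, "rpm-software-management-dnf5-1625"))
    for _ in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Catching the violation around the INSERT and answering with the same "already exists" error that a plain duplicate request gets is the kind of handling the sketch hints at.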
Trying to open https://download.copr.fedorainfracloud.org/results/mcrha or https://download.copr.fedorainfracloud.org/results/rpmsoftwaremanagement/ leads to:
504 ERROR
The request could not be satisfied.
CloudFront attempted to establish a connection with the origin, but either the attempt failed or the origin closed the connection. We can't connect to the server for this app or website at this time. There might be too much traffic or a configuration error. Try again later, or contact the app or website owner.
If you provide content to customers through CloudFront, you can find steps to troubleshoot and help prevent this error by reviewing the CloudFront documentation.
Generated by cloudfront (CloudFront)
Request ID: qb8C5Vsx7ySyHQaTIpUS5n-x2e_Q4qHumkAWoTo2f1ApQVW_IWDKig==
Is this in any way related to this issue, or should I file a new one, please?
@mcrha thank you for reporting that! Yes, we had copr-backend issues yesterday, sorry for the inconvenience (it should be working OK now). The problem discussed here is in copr-frontend.rpm (a different VM).
Aha, I see, a different thing then. I'm sorry for the noise. You are right, it cured itself an hour or so after I wrote the note here.
@FrostyX, @praiskup I was quickly checking the last occurrences of this exception on the Packit side, and I saw that it last happened on August 22nd around 10 AM. I don't know if you have done something that could have solved the problem? Or maybe the projects that trigger this exception are just on vacation ^_^. I don't think anything changed on the Packit side on Thursday the 22nd (we release packit-service on Tuesdays).
If it can be of any help, I checked the Packit logs again; here are the latest exceptions we collected:
2024-09-16T06:00:25.410799561+00:00 https://github.com/rpm-software-management/mock/pull/1452
2024-09-13T11:58:14.606678447+00:00 https://github.com/rpm-software-management/dnf5/pull/1696
2024-09-15T16:55:24.320991349+00:00 https://github.com/containers/podman/pull/23958
2024-09-16T19:05:21.306196435+00:00 https://github.com/containers/podman/pull/23970
2024-09-13T17:29:00.908865675+00:00 https://github.com/rpm-software-management/dnf5/pull/1699
2024-09-14T03:55:13.329540281+00:00 https://github.com/rpm-software-management/mock/pull/1451
2024-09-18T14:24:55.040294306+00:00 https://github.com/containers/podman/pull/23999
2024-09-12T15:37:14.705051873+00:00 https://github.com/containers/buildah/pull/5734
2024-09-12T16:52:49.475155359+00:00 https://github.com/containers/conmon/pull/528
2024-09-17T11:40:13.308385515+00:00 https://github.com/containers/podman/pull/23979
2024-09-17T04:03:46.018696225+00:00 https://gitlab.com/packit-service/hello-world/-/merge_requests/1127
2024-09-16T14:41:02.828280764+00:00 https://github.com/containers/container-selinux/pull/329
2024-09-17T12:15:29.408973412+00:00 https://github.com/containers/container-selinux/pull/330
Hello, has the fix been deployed to copr? This issue occurred again 8 hours ago: https://github.com/containers/ramalama/pull/185#issuecomment-2372560227
Not yet; the planned ETA for the release is next Thursday (if everything goes OK). Is that OK, or do you want us to hot-fix this in production?
@praiskup this morning Packit logged a new exception (for the ramalama project):
Cannot create a new Copr project (owner=packit project=containers-ramalama-182 chroots=['fedora-41-x86_64', 'fedora-rawhide-x86_64', 'fedora-40-x86_64', 'fedora-39-x86_64']): Copr: 'packit/containers-ramalama-182' already exists. Copr HTTP response is 400 BAD REQUEST.
I thought we should handle it silently on our side. Or is there something more to be deployed, after which we shouldn't expect this to happen anymore?
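As a side note, here is a hedged sketch, not actual Packit code, of how the 400 "already exists" answer could be handled silently on the client side with the python-copr v3 API. The ensure_project helper is hypothetical, and matching on the error text is an assumption based on the message quoted above; the owner, project name, and chroots are taken from that same message.

```python
from copr.v3 import Client
from copr.v3.exceptions import CoprRequestException

# Assumes a valid Copr API token config (e.g. ~/.config/copr) is present.
client = Client.create_from_config_file()


def ensure_project(owner, project, chroots):
    """Hypothetical helper: create the project, or reuse it if a
    concurrent request (or an earlier run) already created it."""
    try:
        return client.project_proxy.add(
            ownername=owner, projectname=project, chroots=chroots
        )
    except CoprRequestException as exc:
        # Copr answers 400 BAD REQUEST with "... already exists" in that
        # case; treating the error text as the signal is an assumption.
        if "already exists" in str(exc):
            return client.project_proxy.get(ownername=owner, projectname=project)
        raise


project = ensure_project(
    "packit",
    "containers-ramalama-182",
    ["fedora-41-x86_64", "fedora-rawhide-x86_64", "fedora-40-x86_64", "fedora-39-x86_64"],
)
```

Whether to reuse the existing project like this or fail loudly is, of course, a Packit-side decision.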
@praiskup just to double check, next Thursday is tomorrow or Thursday of next week?
It would be great if you could hotfix this. Otherwise I'll just ask people to wait a bit longer.
Oh, yes - I meant "next week Thursday" rather than "this week Thursday". But I can try to hotfix tomorrow, seems pretty easy to rollback if problems appear at least.
great. Thanks @praiskup
I applied the patch now, I am sorry it took several days... got quite busy elsewhere.
Thanks @praiskup. I'll watch out for any further occurrences.
@praiskup still seeing it unfortunately https://github.com/containers/buildah/pull/5765#issuecomment-2388590070
Thank you for the report. I'm locked in a meeting room, but this seems like a different issue, not sure if related: #3443.
Looks like they started running without anyone from the team restarting them. So maybe it works, but some temporary failure messages need to be silenced?
@lsm5 I'm unsure how/when Packit re-creates the projects; perhaps some people from Packit reacted. Anyway, the remaining typo triggering the 500 was fixed in #3443. I haven't seen this problem since Thursday's service upgrade (scheduled outage).
ack, thanks @praiskup. I'll watch out for further occurrences.