f39: Giving up waiting for copr_base repository

Open stsp opened this issue 2 years ago • 14 comments

My project's builds fail for f39, whereas all other builds are fine. Here's the log: https://download.copr.fedorainfracloud.org/results/stsp/dosemu2/fedora-39-x86_64/06679713-dosemu2/backend.log.gz It's always the same. What does this mean?

stsp avatar Nov 22 '23 16:11 stsp

Hi, from what I can see in the project, F39 never worked in your repo. That's odd... it seems the repo for F39 wasn't generated properly (maybe there was a short outage at that time, I don't know...). Can you please try to regenerate the repos for your project as suggested in the logs?

Or alternatively there is a button in the GUI - Regenerate Repositories at the project homepage. Try to use it, hopefully it will help. [Screenshot_20231122_184002]

nikromen avatar Nov 22 '23 17:11 nikromen

Yes, that seems to work now, thanks. I've noticed that the packages that were marked as successful have the status "forked" for f39. So you might be right that it never worked at all, even though some builds did not fail.

stsp avatar Nov 22 '23 19:11 stsp

Or alternatively there is a button in the GUI - Regenerate Repositories at the project homepage.

This is a frequent issue, I've given this advice dozens of times. @praiskup, @nikromen, do we think it is fixable? Maybe by retrying the createrepo run when it fails, or something like this? If possible, we should maybe prioritize a proper fix for this.

Otherwise, I'd at least write an explanation to our FAQ but at this moment, I don't even know an explanation why exactly this happens.

FrostyX avatar Nov 26 '23 13:11 FrostyX

Well, I haven't seen it for a long time actually.

I mean, there's the residual issue #2272 that we know of, but it is a different issue from this one. Can we find the reasons why createrepo was not run in this #3016 case? Action logs?

FAQ but at this moment I don't even know an explanation why exactly this happens.

+1 we could actually admit that we don't know, and ask for help if users have any useful observations

Maybe by retrying the createrepo run when it fails

Or the action is not even run?

Note that Pulp will very likely solve this. In the meantime, @stsp since this is happening for you for the second time - don't you see some pattern in the issue? Isn't this also related to the Rawhide -> 39 branching?

praiskup avatar Nov 26 '23 15:11 praiskup

Note there's an error in the backend.log:

[2023-11-22 13:25:54,679][ ERROR][PID:1764229] Backend process error: Giving up waiting for copr_base repository, please try to manually regenerate the DNF repository (e.g. by 'copr-cli regenerate-repos <project_name>')
[2023-11-22 13:25:54,679][WARNING][PID:1764229] Switching not-finished job state to 'failed'

which becomes hidden by subsequent log entries over time.
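Since the error gets buried, here is a minimal sketch for pulling only the ERROR entries out of a downloaded backend.log.gz. It assumes every entry follows the `[timestamp][LEVEL][PID:n] message` layout of the two lines quoted above; the names `LOG_LINE`, `find_errors` and `errors_in_gz` are made up for this illustration and are not part of Copr.

```python
import gzip
import re

# Assumed layout, taken from the two lines quoted above:
# [2023-11-22 13:25:54,679][ ERROR][PID:1764229] message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[\s*(?P<level>\w+)\s*\]\[PID:(?P<pid>\d+)\]\s(?P<msg>.*)$"
)


def find_errors(lines):
    """Return (timestamp, message) pairs for entries logged at level ERROR."""
    found = []
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match and match.group("level") == "ERROR":
            found.append((match.group("ts"), match.group("msg")))
    return found


def errors_in_gz(path):
    """Same as find_errors, but reads a gzip-compressed log file."""
    with gzip.open(path, "rt", errors="replace") as handle:
        return find_errors(handle)
```

Calling `errors_in_gz("backend.log.gz")` on a locally saved copy of the log would list the ERROR entries regardless of how many later lines follow them.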

praiskup avatar Nov 26 '23 15:11 praiskup

@stsp since this is happening for you for the second time

No, just for the first time. But for x86 and aarch64 builds of f39. I don't have other f39 builds, maybe they'd also fail.

Isn't this also related to the Rawhide -> 39 branching?

I am pretty sure it happened on branching, yes. Now I didn't even have to rebuild the packages that were marked as "branched" - I guess those were inherited from rawhide.

stsp avatar Nov 26 '23 15:11 stsp

  • Action item: We should check in our Fedora branching script whether a createrepo_c failure fails the action or not (i.e. are we able to find the broken repos by querying failed actions?)
  • Action item: Is it possible to repeat the action/failed createrepo_c command?
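
For illustration only, a minimal sketch of what both action items could look like together: re-run a command a few times and report failure to the caller, so the action can be marked as failed rather than passing silently. This is not code from Copr; `run_with_retries`, its parameters, and the commented-out createrepo_c invocation are assumptions made for the example.

```python
import subprocess
import sys
import time


def run_with_retries(cmd, attempts=3, delay=1.0):
    """Run cmd until it exits with 0 or the attempts are exhausted.

    Returns True on success and False if every attempt failed, so the
    caller can mark the whole action as failed.
    """
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return True
        print(
            f"attempt {attempt}/{attempts} failed with exit code {result.returncode}",
            file=sys.stderr,
        )
        if attempt < attempts:
            # Linear backoff: wait a little longer before each new attempt.
            time.sleep(delay * attempt)
    return False


# Hypothetical usage; the arguments Copr really passes are not shown in this thread:
# ok = run_with_retries(["createrepo_c", "--update", "/path/to/repo"])
```

Returning a boolean instead of raising keeps the decision about how to record the failure with the caller, which is what the first action item asks about.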

FrostyX avatar Nov 29 '23 13:11 FrostyX

Has this actually happened for the F40 branching?

praiskup avatar Feb 21 '24 13:02 praiskup

For me: no.

stsp avatar Feb 21 '24 14:02 stsp

Thank you for the feedback. Reviewing the action items above... I think they are still worth doing, we had ~300 failed actions during the last branching event:

[Screenshot_20240222_103233]

[Screenshot_20240222_103407]

praiskup avatar Feb 22 '24 09:02 praiskup

$ psql -c 'select count(id) from action where action_type = 6 and created_on > 1706751687 and result != 1;'
Line style is unicode.
Border style is 2.
┌───────┐
│ count │
├───────┤
│   296 │
└───────┘
(1 row)
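
To make the `created_on > 1706751687` filter easier to read, here is a small sketch that converts the cutoff into a calendar date. It assumes `created_on` holds Unix epoch seconds; the helper name `epoch_to_utc` is made up for this example.

```python
from datetime import datetime, timezone

CUTOFF = 1706751687  # value copied from the query above


def epoch_to_utc(seconds):
    """Format Unix epoch seconds as a UTC timestamp string."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S UTC")


print(epoch_to_utc(CUTOFF))  # 2024-02-01 01:41:27 UTC
```

Under that assumption, the query counts actions created from early February 2024 onward.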

praiskup avatar Feb 22 '24 09:02 praiskup

Most of them are in the @fedora-review namespace (only 91 not):

[copr-fe@copr-fe ~][PROD]$ psql -c 'select count(id) from action where action_type = 6 and created_on > 1706751687 and result != 1 and data not like '\''%fedora-review%'\'';'
Line style is unicode.
Border style is 2.
┌───────┐
│ count │
├───────┤
│    91 │
└───────┘
(1 row)

praiskup avatar Feb 22 '24 09:02 praiskup

I backed up the corresponding logs on backend:

$ ls -1 /var/lib/copr/public_html/archive/issues/copr-3016
action_dispatcher.log-20240218.gz
actions.log-20240218.gz

praiskup avatar Feb 22 '24 09:02 praiskup

I checked two examples of non-@fedora-review builds, but they still use fedora-review:

https://copr.fedorainfracloud.org/coprs/zzambers/fedora-pkgs/build/6240960/
https://copr.fedorainfracloud.org/coprs/fed500/md4c/build/6222513/

praiskup avatar Feb 22 '24 09:02 praiskup

This has been implemented as a createrepo_c retry mechanism in #3225. We believe it fixes this issue.

praiskup avatar Apr 29 '24 11:04 praiskup