
Don't fail publishing when registry repos are behind the upstream

Open thomashoneyman opened this issue 2 years ago • 3 comments

In https://github.com/purescript/registry/issues/336#issuecomment-1660974501 a package failed to commit and push metadata because the local checkout of the registry repo was behind the upstream by one commit. I don't think this should happen — we should be able to pull to re-sync and then try to push again.

The error in this case arose from the implementation of WriteMetadata:

https://github.com/purescript/registry-dev/blob/8177cd30d7b7cf3379b8b728564def0ed438298c/app/src/App/Effect/Registry.purs#L250-L271

...but it applies everywhere we use writeCommitPush, which includes writing metadata, manifests, package sets, and other registry data to the various registry repositories. The window for falling behind the upstream is narrow (you would have to fall behind between the pull and the push in the implementation linked below), but it's certainly possible:

https://github.com/purescript/registry-dev/blob/8177cd30d7b7cf3379b8b728564def0ed438298c/app/src/App/Effect/Registry.purs#L678-L693
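
For context, the flow there is roughly pull, then write, then commit, then push. Here is a paraphrase of that shape, with every git step passed in as a hypothetical action (a sketch only, not the actual writeCommitPush code):

```purescript
module Sketch.WriteCommitPush where

import Prelude

import Data.Either (Either(..))
import Effect.Aff (Aff)

-- A paraphrase of the writeCommitPush flow: the window for falling behind
-- the upstream opens after the pull and closes at the push. All git steps
-- are hypothetical actions supplied by the caller.
writeCommitPushSketch
  :: Aff Unit                 -- pull: sync the checkout with the upstream
  -> Aff Unit                 -- write: modify files in the checkout
  -> Aff (Either String Unit) -- commit: fails if we fell behind meanwhile
  -> Aff Unit                 -- push: publish the new commit upstream
  -> Aff (Either String Unit)
writeCommitPushSketch pull write commit push = do
  pull
  -- Any upstream commit that lands after this point puts us behind...
  write
  result <- commit
  case result of
    -- ...and this is where the pipeline currently bails out.
    Left err -> pure (Left err)
    Right _ -> Right <$> push
```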

To drill down even further, the specific place where this error arises is in the call to Git.gitCommit. If we are behind the upstream then we bail out of the commit.

https://github.com/purescript/registry-dev/blob/8177cd30d7b7cf3379b8b728564def0ed438298c/app/src/App/CLI/Git.purs#L181-L200
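
The check amounts to asking git whether the upstream has commits that the local branch does not. A minimal sketch of such a check, assuming a hypothetical `execGit` helper that runs git in the checkout and returns its stdout (again, a sketch rather than the actual gitCommit implementation):

```purescript
module Sketch.BehindCheck where

import Prelude

import Data.String as String
import Effect.Aff (Aff)

-- `git rev-list --count HEAD..@{upstream}` counts the commits the
-- upstream-tracking branch has that the local branch does not; a nonzero
-- count means we are behind.
isBehindUpstream :: (Array String -> Aff String) -> Aff Boolean
isBehindUpstream execGit = do
  count <- execGit [ "rev-list", "--count", "HEAD..@{upstream}" ]
  pure (String.trim count /= "0")
```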

It's OK that we bail out of the commit, but we shouldn't bail out of the entire pipeline. Instead, we could bail out of the commit, pull again (with --autostash), and then try to commit again. Only after some fixed number n of failed attempts to resync with the upstream would it be reasonable to terminate the process entirely.
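
A minimal sketch of that retry loop, written generically over the resync and commit-and-push steps (this is only a sketch of the idea, not a drop-in implementation; the helper name and signature are mine):

```purescript
module Sketch.RetryCommit where

import Prelude

import Data.Either (Either(..))
import Effect.Aff (Aff)

-- Retry a commit-and-push action that can fall behind the upstream: on
-- failure, run the resync step (e.g. a pull with --autostash) and try
-- again, terminating only after `maxAttempts` consecutive failures.
retryWithResync
  :: forall err a
   . Int                -- maxAttempts: give up after this many failures
  -> Aff Unit           -- resync: e.g. git pull --autostash
  -> Aff (Either err a) -- the commit-and-push attempt
  -> Aff (Either err a)
retryWithResync maxAttempts resync action = go 1
  where
  go attempt = do
    result <- action
    case result of
      Right a -> pure (Right a)
      Left err
        | attempt >= maxAttempts -> pure (Left err)
        | otherwise -> do
            resync
            go (attempt + 1)
```

In writeCommitPush this would be applied with the --autostash pull as the resync step and the existing commit-and-push logic as the action.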

thomashoneyman · Aug 01 '23 20:08

Is this really a problem? I don't think we gain anything by allowing multiple jobs to run concurrently; we should just limit the number of running jobs to a single one.
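
(For illustration, serializing every job through a single lock could be as simple as an AVar used as a mutex. This is a hypothetical sketch, assuming the purescript-avar library, not how the registry is currently structured:)

```purescript
module Sketch.JobLock where

import Prelude

import Effect.Aff (Aff, bracket)
import Effect.Aff.AVar (AVar)
import Effect.Aff.AVar as AVar

-- A full AVar acts as a mutex: `take` blocks until the lock is free, and
-- `bracket` guarantees the `put` releases it even if the job fails.
withJobLock :: forall a. AVar Unit -> Aff a -> Aff a
withJobLock lock job =
  bracket (AVar.take lock) (\_ -> AVar.put unit lock) (\_ -> job)
```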

f-f · Aug 01 '23 21:08

It feels odd to arbitrarily limit jobs such that if someone else is uploading a package then I cannot publish mine. Isn’t it preferable to allow concurrent package uploads?

Concurrent jobs are certainly possible as long as the GitHub pipelines exist. If we want control over concurrency, we'll have to rewrite the GitHub module to send requests to the server instead of executing anything in the action itself. (I know we eventually want to disable the GitHub API altogether.)

If we don’t want any concurrent operations (anything that touches a shared git resource, at least), then the build matrix, the daily importer, and so on cannot run while a package is being published, and vice versa. Isn’t that problematic?

thomashoneyman · Aug 01 '23 22:08

Here's another example of this happening:

https://github.com/purescript/registry/issues/344 was terminated because https://github.com/purescript/registry/issues/345 modified the metadata during its run.

thomashoneyman · Aug 04 '23 13:08