please icon indicating copy to clipboard operation
please copied to clipboard

github_repo will fail confusingly if the given revision downloads a file with a different prefix.

Open cemeceme opened this issue 1 year ago • 3 comments

If a github_repo rule is used for a repo and the revision argument is set to something that doesnt exist, please will still download whatever the main branch is and try to run arcat on it with the -s {expected but different prefix} flag set.

However, this makes arcat fail silently and some subsequent rule will then fail stating the directory that should have been given by arcat was not found as it wasnt created in the first place.

Ideally either arcat should report that the prefix didnt exist, or there should be a check to see if downloaded file name doesnt match what was expected.

cemeceme avatar Dec 19 '23 11:12 cemeceme

I wasn't able to reproduce this with the following MWE:

github_repo(
    name = "repo",
    repo = "thought-machine/please",
    revision = "0123456789abcdef0123456789abcdef01234567",
)

because GitHub (correctly) returns a 404 instead of an archive:

$ plz-out/bin/src/please build //tests:repo
Build stopped after 1.38s. 1 target failed:
    //tests:_repo#download
1 error occurred:
        * Error retrieving https://github.com/thought-machine/please/archive/0123456789abcdef0123456789abcdef01234567.zi
p: 404 Not Found

The same happens when revision is set to something that doesn't look like a commit hash (e.g. a non-existent branch name).

The thing about arcat not erroring if the value of -s isn't present at the front of any of the archive member names is indeed a known problem, though - I ran into it in #2829.

chrisnovakovic avatar Dec 27 '23 00:12 chrisnovakovic

Ok, so I looked into it a bit more. The issue I was facing was with mysql-connector-cpp. I had incorrectly set please to use the master branch which does not exist (or no longer exists?) as they use trunk instead. So it was trying to fetch https://github.com/mysql/mysql-connector-cpp/archive/master.zip, which appears to fetch the correct trunk branch from github instead, everything named with the trunk prefix. For the record, the 'actual' url is supposed to be https://github.com/mysql/mysql-connector-cpp/archive/trunk.zip as expected, which works fine in please. I also checked some other common branch names, such as main, which shows the correct 404 response as you said.

I would still say that please should confirm if the downloaded file actually matches, but this seems to be more related with some (intentional?) inconsistencies with github.

cemeceme avatar Jan 03 '24 22:01 cemeceme

Right - I suspect what happened here is that the owner(s) of mysql-connector-cpp renamed the default branch, which - as you suggested - will set up a redirection of requests for the original branch name to the new branch name (although this happens for all renamed branches, not just the default branch).

The downloading of the archive in github_repo is ultimately handed off to remote_file, so this is really outside the scope of the subrepo rules. Perhaps it makes sense to add a new follow_redirects parameter to remote_file, defaulting to True for backwards compatibility but allowing for problems like this to be surfaced in a more obvious way by setting it to False. I think this is a better solution than trying to hack in GitHub-specific workarounds in github_repo (especially because any hashes specified on the github_repo target will no longer be valid for the downloaded archive after the branch is renamed, which IMO dooms any attempt to handle the branch rename automatically and transparently). I'll have a chat with the Please folks tomorrow and see what they think.

chrisnovakovic avatar Jan 03 '24 23:01 chrisnovakovic