fix: safely add follow redirects feature to remote_file
disabled by default, only active for github_subrepo
I think we need this to fix #2249.
I have found another issue with testing. It might not be a blocker for most, but for our use case on Codespaces it is. The default token loaded in a Codespaces environment has read access to other repos in our org. However, those permissions do not seem to be enough to use the /archive endpoint, though they are enough to use the API endpoint. I've been testing this and hitting a few issues, something related to the name/location of the download.
@Tatskaari @tiagovtristao could someone help me debug this? My last commit adds a fix that should allow the download with a less privileged token from GitHub. I also did it in a way that maintains backward compatibility, by trying the existing URL and only falling back to the next URL in the list if it fails. There is some issue, though, when downloading the file from the second URL, and it happens on both public and private repos. I made the issue easy to replicate by commenting out the first URL, so that when you run bootstrap.sh the tests that use the GitHub subrepo fail. I'll add a comment directly next to it so it's clear. I've tried debugging: the file gets downloaded but not unzipped. I've compared the files that get downloaded from both URLs and they look identical.
I'd be grateful for the help; I think it would be pretty quick. Just check out my branch and run ./bootstrap.sh. It fails downloading third_party/cc/gtest. Once we debug why it is failing, we can comment both URLs back in and maintain backward compatibility, with the second URL only used if the first one fails.
There seems to be code added/commented out that shouldn't make it into the final state of this PR, correct? So, trying to narrow it down: is the -L flag/follow-redirects option the only change needed?
> is the -L flag/follow-redirects option the only change needed?
Technically, yes. But if we could make the api.github.com URL work, I think it's an improvement for private repos because it requires less privileged tokens.
I personally want to see this extra improvement, as it helps streamline the entire process of rolling it out to the team.
@tiagovtristao, I updated this PR to remove the comment. This makes the PR valid, and it works as-is. It's really hurting my project's momentum without this, so I want to do what I can to make it mergeable so I can easily access a prerelease build.
This PR specifies two URLs for download, and because the old URL is first, it is backward compatible. The known issue is just that I have to use a PAT with enough access. The Codespaces PAT does not have download access on the first URL, and while the second URL downloads the ZIP file, the plz http_archive/remote_repo code seems to choke due to some difference in how the file is downloaded. To reiterate, that only happens for a private repo and only if the PAT is rejected by the first URL; in other words, it's avoidable. I'd like to fix that, but it's more important that I make this usable first.
If you don't trigger the use of curl for downloads (e.g. by setting headers), Please will handle remote files as a special case: https://github.com/thought-machine/please/blob/master/src/build/build_step.go#L1032
Is that why it's not working? Maybe we can make it use curl when following redirects.
I did notice that as well, and in my testing I was forcing curl by setting the Accept header. I confirmed that a valid zip downloads; it just oddly seems to get hung up in the jarcat uncompress step. I thought maybe it was because there was no file extension, but I tried testing with a forced file extension without luck.
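As an aside, when debugging a "downloads but won't unzip" symptom like this, a quick sanity check on the downloaded file can rule out a corrupt archive. A minimal Python sketch with hypothetical helper names, not part of the PR:

```python
import zipfile

def is_valid_zip(path: str) -> bool:
    # Cheap check that the download is actually a well-formed zip archive.
    return zipfile.is_zipfile(path)

def top_level_prefix(path: str):
    # Peek at the directory prefix the archive will unpack into; useful when
    # the uncompress step chokes on an unexpected layout or strip_prefix.
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
        return names[0].split("/")[0] if names else None
```

If both checks pass on the file fetched from the second URL, the problem is more likely in how the archive's internal directory name is handled than in the download itself.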
Tried this: ./pleasew plz build //test/remote/... Doesn't work.

@afterthought ➜ /workspaces/please (remote-file-follow-redirects ✗) $ ./pleasew plz build //test/remote/...
Can't determine version, will use latest.
Downloading Please 16.19.0 to ./plz-out/please/16.19.0...
Should be good to go now, running plz...
./pleasew: line 97: /workspaces/please/plz-out/please/please: No such file or directory
FYI...
@Tatskaari we can actually merge this as-is! I just tested and confirmed that if I supply my own strip_prefix to the github_repo rule, it works. Keep in mind this is ONLY necessary when the first URL fails, i.e. when the token doesn't work on the first URL. This was the edge-case scenario I wanted to work, and it is now compatible with my Codespace's default token. Put another way: as-is, this change remains completely backward compatible.
For me the workaround looks like this:
github_repo(
    name = "myrepo",
    repo = "org/myrepo",
    revision = "72cc417ffe2aa24c9703377c7f957d0d9ed0d2c9",
    strip_prefix = "org-myrepo-72cc417ffe2aa24c9703377c7f957d0d9ed0d2c9",
    access_token = "GITHUB_TOKEN",
)
I'd be very appreciative if this could make it into a prerelease build sooner rather than later. I've been jumping through hoops to do this, and I can't share it with my team until I get this sorted out.
@Tatskaari @tiagovtristao bump. If you don't think we can merge this anytime soon, maybe you can help me get my own fork/build process set up. I can build it, obviously, but there's no clean way to distribute it. I'm guessing that with a few pointers I could mimic the official release process in a GitHub Action and upload to a private S3 bucket?
This is the script I came up with. Posting it here to help others...
#!/usr/bin/env bash
set -euo pipefail
if [ "$(id -u)" -ne 0 ]; then
    echo 'Script must be run as root. Use sudo, su, or add "USER root" to your Dockerfile before running this script.'
    exit 1
fi
export HOME=${HOME:-/home/vscode}
echo "Custom build of please until PRs get merged:"
echo "https://github.com/thought-machine/please/pull/2315"
echo "https://github.com/thought-machine/please/pull/2275"
cd "$HOME"
if [ ! -d "$HOME/please_build" ]; then
    git clone -b tmpfork https://github.com/afterthought/please please_build
fi
docker build -t plz_build_image:latest "$HOME/please_build/tools/images/ubuntu/"
export PLZ_ARGS="-p --profile ci"
# Pass PLZ_ARGS through to the container so the bootstrap actually sees it.
docker run --rm -e PLZ_ARGS --volume "$HOME/please_build:/tmp/please" -w /tmp/please plz_build_image:latest ./bootstrap.sh
mkdir -p "$HOME/.please"
cp -r "$HOME/please_build/plz-out/please/" "$HOME/.please"
rm -rf "$HOME/please_build"
docker rmi plz_build_image:latest
docker system prune -f
I run this script from my devcontainer build like this:
# Tmp workaround plz PRs
COPY library-scripts/*.sh /tmp/library-scripts/
RUN /bin/bash /tmp/library-scripts/build_please_pr.sh
# TODO: add back once PRs get merged
# RUN curl https://get.please.build | bash
Hi @afterthought, I'm not sure what's going to get merged eventually, but until then you could use an escape hatch for your use case: you could try to override the build rule yourself and preload it.
build_defs/overrides.build_defs
_original_rule = original_rule

def original_rule(*args, **kwargs):
    # Do something here
    return _original_rule(
        # Change how arguments are handled here
        *args, **kwargs,
    )
.plzconfig
[parse]
preloadbuilddefs = build_defs/overrides.build_defs
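To make that pattern concrete, here is a runnable sketch in plain Python, with a stub standing in for the built-in rule; the github_repo name and the access_token defaulting are assumptions for illustration, not the actual change:

```python
# Stand-in for Please's built-in rule so this sketch runs as plain Python;
# inside a real overrides.build_defs the original rule already exists.
def github_repo(**kwargs):
    return kwargs

# The preload override pattern: capture the original, then shadow it.
_original_github_repo = github_repo

def github_repo(name, repo, revision, access_token="GITHUB_TOKEN", **kwargs):
    # Default every call to a token from the environment (variable name
    # assumed here), while letting callers still override it explicitly.
    return _original_github_repo(
        name=name,
        repo=repo,
        revision=revision,
        access_token=access_token,
        **kwargs,
    )
```

Because the wrapper keeps the original rule's name, BUILD files that call github_repo() need no changes once the overrides file is preloaded.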
Something like that might work.
Thanks @tiagovtristao. I do appreciate this project and your efforts. I have referred to this PR as "my edge case", but I was kind of downplaying it just because I think I'm pushing the boundaries a little bit by using Codespaces. I just want to put in my plug for this, though: this PR is the difference between private repo support working seamlessly in Codespaces vs. friction and extra security risks for developers slinging their own PATs around.
I almost forgot this myself: without the curl follow feature in this PR, the private GitHub functionality doesn't work at all. I'm now in the middle of figuring out how best to get my custom build integrated with https://github.com/sagikazarmark/setup-please-action.
Not sure if you're open to this, but we use the semantic-release library extensively. I always configure prerelease branches using dev/* and feature/* wildcards. This gives me automatic prerelease builds for each branch.
bump...
bump...
I still don't think this PR works, unless I'm missing something. The api.github.com version of the request downloads an archive in a different format, so you can't just whack the second URL into the remote file rule and expect it to unzip correctly. It looks like this format depends on the actual commit hash rather than the revision you provide, so it isn't predictable from the arguments we have available.
The curl commands here are getting a bit out of hand, in my opinion. I've implemented the follow-redirects functionality by implementing headers etc. in the built-in fetchRemoteFile() function. This uses the Go HTTP client, which follows redirects by default. You can find that here: #2437
Maybe we can write an alternative rule for fetching from https://api.github.com that can find the repo root in the downloaded archive, and host that on Pleasings?
@Tatskaari I am using this as we speak, and it does work. You are right that the hash isn't predictable, and I have to supply a strip_prefix to correctly handle the download. This isn't ideal, but it's allowing me to function. I take the revision from my other private repo and paste it into both revision and strip_prefix, like so. If you format the strip_prefix as org dash repo dash commit hash, that is the predictable value.
github_repo(
    name = "myrepo",
    repo = "org/myrepo",
    revision = "72cc417ffe2aa24c9703377c7f957d0d9ed0d2c9",
    strip_prefix = "org-myrepo-72cc417ffe2aa24c9703377c7f957d0d9ed0d2c9",
    access_token = "GITHUB_TOKEN",
)
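The "org dash repo dash commit hash" naming rule described above can be written down as a tiny helper; a Python sketch for illustration, with a hypothetical function name. Note it is only predictable when the revision you pass is the full commit SHA, per the earlier comment:

```python
def zipball_strip_prefix(repo: str, revision: str) -> str:
    # The api.github.com zipball unpacks into a directory named
    # <org>-<repo>-<commit hash>, so this is only reliable when the
    # revision argument is the full commit SHA (not a branch or tag).
    org, _, name = repo.partition("/")
    return f"{org}-{name}-{revision}"

print(zipball_strip_prefix("org/myrepo", "72cc417ffe2aa24c9703377c7f957d0d9ed0d2c9"))
# → org-myrepo-72cc417ffe2aa24c9703377c7f957d0d9ed0d2c9
```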
The github_repo rule is important to us, and we have been building a library of private build defs. These obviously get used in CI, but also for developers' inner-loop work. To make the inner loop seamless on Codespaces, it's important that we be able to download from this other URL. I can appreciate the problems you are pointing out, and you know best how to guide the project. As a "user", though, it would be nice to have a "just works" experience with the github rule. With the first URL and a normal PAT, it "just works" as-is. A Codespaces user will stumble and get tripped up here, since the PAT provided by Codespaces did not "just work" in my initial testing. This is/was my primary concern. I agree that a separate rule specifically for this api.github.com URL would be preferable if the strip_prefix logic was hidden away. Something like this:
github_repo2(
    name = "myrepo",
    repo = "org/myrepo",
    commit_hash = "72cc417ffe2aa24c9703377c7f957d0d9ed0d2c9",
    access_token = "GITHUB_TOKEN",
)
I'm a bit at a loss, though, around naming and documentation: how do you explain to users why a second version exists? And how do you funnel Codespaces users who get "stuck" toward the workaround?
I think if the team maintaining Please will not merge this, it would be nice to officially close it so that OUR team can react to the bad news. Otherwise this just leaves us in limbo using a custom fork of Please. If this change does not get merged, we can at least decide how long to maintain our fork vs. deal with the undesirable side effects of not including this change. To repeat, the side effect is using Codespaces in a less secure manner.
No disagreement here that there is a preferable implementation to the curl use. But this PR just uses the existing rules to add support for private GitHub repos. It accomplishes the goal, with the minor downside of needing to be specific with a strip_prefix IF and ONLY IF the token provided doesn't work with the original download URL. For most users, it will "just work". It's only in my edge case, using Codespaces with a token that does not work with the first URL but does work with the API URL, that the strip_prefix workaround becomes required.
@Tatskaari, I spent time reviewing the changes that remove curl, and clearly this PR makes no sense any more.
With the underlying improvements to remote file, it's much easier for me to create a forked build def in my own repo for GitHub (see below). It would be nice to figure out how to incorporate this into the main project, as I assume I won't be the last person to attempt to use a private Please subrepo from Codespaces (and I'll have to duplicate this rule or host it somewhere in a public repo). The only differences in this rule are the URL and the prefix. I have only tested with SHA revisions, not branch/tag names; I'm not sure if/how those would be supported.
def github_api_repo(name:str, repo:str, revision:str, build_file:str=None, labels:list=[],
                    hashes:str|list=None, strip_prefix:str=None, strip_build:bool=False, config:str=None,
                    access_token:str=None, bazel_compat:bool=False):
    """Defines a new subrepo corresponding to a Github project.

    This is a convenience alias to the more general new_http_archive. It knows how to
    construct Github URLs appropriately, which allows a more concise description.

    Args:
      name: Name of the rule.
      repo: Github repo to fetch from (e.g. "thought-machine/please").
      revision: Revision to download. This can be either a release version, commit or branch.
      build_file: The file to use as a BUILD file for this subrepository.
      labels: Labels to apply to this rule.
      hashes: List of hashes to verify the rule with.
      strip_prefix: Prefix to strip from the contents of the zip. Is usually autodetected for you.
      strip_build: True to strip any BUILD files from the archive after download.
      access_token: An environment variable containing a Github personal access token. This can
                    be used to access private repos.
      config: Configuration file to apply to this subrepo.
      bazel_compat: Shorthand to turn on Bazel compatibility. This is equivalent to
                    specifying a config file with `compatibility = true` in the `[bazel]`
                    section.
    """
    org, _, repo = repo.partition('/')
    assert repo, "Must pass a valid Github repo argument, e.g. thought-machine/please"
    # If it looks like semver, remove the leading 'v' because Github formats its
    # zips differently for semver tags.
    if is_semver(revision):
        prefix = revision.removeprefix('v')
    else:
        prefix = revision
    prefix = f'{org}-{repo}-{prefix}'
    headers = {}
    pass_env = []
    if access_token:
        headers = {
            'Authorization': 'token $%s' % access_token,
        }
        pass_env = [access_token]
    return new_http_archive(
        name = name,
        urls = [f'https://api.github.com/repos/{org}/{repo}/zipball/{revision}'],
        strip_prefix = strip_prefix or prefix,
        strip_build = strip_build,
        build_file = build_file,
        labels = labels,
        hashes = hashes,
        config = config,
        headers = headers,
        bazel_compat = bazel_compat,
        pass_env = pass_env,
    )
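The prefix computation in that rule, including the semver special case, can be exercised on its own. A Python sketch for illustration only; is_semver() here is a rough stand-in for the helper the rule assumes exists, not its real implementation:

```python
import re

def is_semver(rev: str) -> bool:
    # Rough stand-in for the is_semver() helper; matches e.g. "v1.2.3" or "1.2.3".
    return re.fullmatch(r"v?\d+\.\d+\.\d+", rev) is not None

def archive_prefix(org: str, repo: str, revision: str) -> str:
    # Github drops a leading 'v' from semver tags when naming the directory
    # inside the zipball, which is why the rule strips it before joining.
    version = revision.removeprefix("v") if is_semver(revision) else revision
    return f"{org}-{repo}-{version}"

print(archive_prefix("thought-machine", "please", "v16.19.0"))
# → thought-machine-please-16.19.0
```

For a plain commit SHA, is_semver() is false and the revision passes through unchanged, matching the strip_prefix workaround shown earlier in the thread.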
Unfortunately it seems I'm stuck with this again, as I intermittently hit a build deadlock similar to what is described here: https://github.com/thought-machine/please/issues/1378. I've tried using preloadbuilddefs and still see the deadlocking.
preloadbuilddefs = ./.build_defs/subrepos/subrepos.build_defs
> It would be nice to figure out how to incorporate this into the main project
We're actually trying to slim down the build definitions available out of the box and make Please more modular, so I think this rule should just be hosted on GitHub. Right now the pleasings repo might be a good place for it. We could create a subrepos plugin to host all the rules related to pulling down subrepos from VCS; see the Java plugin for an example.
> Unfortunately it seems I'm stuck with this again as I intermittently have a build deadlock, similar to what is described here: #1378. I've tried using preloadbuilddefs and still see the deadlocking.
>
> preloadbuilddefs = ./.build_defs/subrepos/subrepos.build_defs
If you switched and you're still seeing deadlocking, then I'm thinking the problem is likely elsewhere. The pre-loaded build definitions work through an entirely different mechanism which is unlikely to cause a deadlock. Have you made any other changes?
This issue has been automatically marked as stale because it has not had any recent activity in the past 90 days. It will be closed if no further activity occurs. If you require additional support, please reply to this message. Thank you for your contributions.