psc-package
psc-package copied to clipboard
Allow revision
This is to further the discussion we were having in #33
I merged in upstream master so this should merge cleanly. You were asking if its possible for git ls-remote
could be used to clean this up. I think that's for phase 2 of what I described in the original issue. The way I see it, this gets psc-package operating as-documented. Right now it says you can specify a tag or SHA. In reality, it lets you specify a branch or tag but not a SHA. It seems like supporting a branch may be a misfeature because it is not a fixed single commit but rather a moving target, yet psc-package basically assumes its a fixed target leading to some confusion.
The correct final solution here is to forbid branches, which involves some sort of git incantation to tell us what type a tree-ish is and giving a nice error message if its a branch explaining that psc-package targets must be immutable. I also suspect that once we do that, we can bring back the "exists" checks to do fewer downloads. Its totally ok if you don't want to do this half-measure, it was just the simplest solution I knew how to implement right now and solved a problem I was having.
Sorry for the delay, it's been a busy week.
This seems like it could decrease performance for repos with many tags. Is there no simpler way to pull a single SHA hash from a remote repo?
Sorry for the delay. That's a valid concern if I understand it correctly. I think we could avoid ever redundantly fetching if we could treat the reference as immutable, which SHAs and tags should be. So I see a few paths forward:
- Document that this only supports SHA and tag. Behavior of branches is undefined. Then we keep the fetch mechanism but if the destination exists, do not fetch. That way you pay the requests once and when you add a new dep or change the SHA/tag.
- Do 1 but also somehow try to protect the user from themselves by identifying that the token they gave is not a SHA or tag but is a branch and throw an error. I do not know how to do this with git or if its even possible. For instance, is it possible for a tag to have the same name as a branch?
@MichaelXavier Why wouldn't git ls-remote
work? It displays data in this format:
12d92467ae6ecb13957bb1ab5ea07670bd985c35 refs/heads/branch1
cb1979308583e8420cead50e6b524d90353e0a84 refs/heads/branch2
17d415ab1e30fb8b9002de4bac4d9e543ebca7e9 refs/heads/branch3
9d9000d30ab9b5a5a51f9d2e38c135d4fd2c75b6 refs/tags/0.5.1
d0944a8ae00ed44dfb0e382b372d8ef82841b496 refs/tags/0.6.0
e4d308e25d9f6752ee4b2b817c69296672e191b5 refs/tags/0.7.0
So you should be able to do something like this:
git ls-remote --refs '<GIT URL HERE>' '<GIT TAG OR COMMIT HERE>'
The above command only prints the branches/tags which exactly match <GIT TAG OR COMMIT HERE>
You can even specify multiple tags/commits if you want.
So all you have to do is verify that there aren't any refs/heads/
, and that there is exactly zero or one refs/tags/
And it also helpfully gives you the commit for the tag, so you can use that for additional robustness (e.g. if the tag changes while downloading the repo it will still checkout the correct commit).
There might be an easier or faster way of checking, but I think using git ls-remote
should work.
As for downloading a specific commit, it seems that's not possible.
The best you can do seems to be:
git clone --no-checkout '<GIT URL HERE>' foo
cd foo
git checkout '<GIT COMMIT HERE>'
So by using git ls-remote
you can tell whether it's a branch/tag/commit, then:
-
Error if it's a branch
-
Use the fast
clone
if it's a tag -
Use the slow
clone
+checkout
(the above code) if it's an SHA
For GitHub in particular you can efficiently download a specific commit by downloading this URL:
https://github.com/<USERNAME>/<PACKAGE>/archive/<COMMIT>.zip
e.g. https://github.com/purescript/psc-package/archive/e9c81dc641ad845714fca792a48f3de2345f62c3.zip
But that only works for GitHub, not Git repos in general.
Oh, you can also use .tar.gz
rather than .zip
For reference, Nix uses the .tag.gz
URL to fetch from GitHub.
Nix does a fetch
+ checkout
to fetch from non-GitHub git repos. But they also do a lot of other complex stuff too.
Personally, I recommend downloading the .tar.gz
from GitHub when the URL is https://github.com
, and otherwise fall back to clone
+ checkout
Yes, I'd definitely like to use that path to pull things from GitHub, but we can deal with that separately. I like the idea of using ls-remote
to solve this issue.
@Pauan thanks for weighing in. I knew someone with a lot of deep git knowledge would have a solution. I am not sure when I'll have time to work on this again, probably wednesday but @Pauan if you have a workable solution and some free time until then, feel free to fixup my patch. Basically I'm just going to be implementing the commends you recommended anyway.
I discovered git archive
last week, which I wish we could use. While GitLab supports it, looks like GitHub and Bitbucket does not. :( - https://www.gilesorr.com/blog/git-archive-github.html
Here is a list of "tree-ish" things that it should accept.
$ git archive --remote=https://github.com/purescript/purescript-prelude.git --output=./dep-version.tar.gz <tree-ish>
$ git archive --remote=https://github.com/purescript/purescript-prelude.git --output=./dep-version.tar.gz v3.1.0
$ git archive --remote=https://github.com/purescript/purescript-prelude.git --output=./dep-version.tar.gz 6d411827970302769895877511f51f7a414ecbb3
Looks good, thanks for the updates.
I do still have a slight concern that this will add a performance overhead, since running ls-remote
on a repo with many tags does seems to take a short while. The compiler repo has many SHA hashes for example:
$ time git ls-remote -q --refs | wc -l
1687
real 0m0.862s
user 0m0.014s
sys 0m0.016s
Perhaps what we can do is to only look at tags and heads (with -t -h
), and assume the input is a SHA if it's not in either of those lists. I assume branches will show up in the output if we use -h
.
@paf31 I didn't realize that ls-remote
pulls in other things like pull requests as well.
Using git ls-remote --heads --tags --refs
sounds like a good idea, even though it's the same speed.
@Pauan I've implemented your suggestion. Thanks!
@paf31 I think I've addressed the all the issues in your review. Let me know if there's any other concerns.
@paf31 just bumping this to see if you needed anything else.
The code looks great, thanks.
I'd also like to try using git clone -b ...
and falling back to a clone and checkout if the clone fails. I'll look into that now.
Another option, which allows us to avoid the costly ls-remote
lookup, and which I actually prefer, would be to allow URLs like sha://0123456789abcdef
or tag://psc-0.11.6
wherever we currently allow a tag. It's explicit, and it doesn't have to be a breaking change.
What do you both think?
Edit: to be clear, I'm suggesting we look for the prefix sha://
before cloning. Then:
- If it's there, clone the whole repo and checkout the named revision. Verify it actually refers to a commit and not a branch/tag.
- If not, use
git clone -b
, and then verify that we didn't clone a branch.
I think it would make more sense to just use sha:
or tag:
without slashes if we are making up some URL schemes, since the stuff which follows isn't hierarchical. See https://tools.ietf.org/html/rfc2718#section-2.1.2
I'm onboard for the URI scheme thing, but it seems kind of weird that we're still falling back to detecting to a slow method. Would we shed that behavior at some point? If not, then we have to maintain two code paths indefinitely, one of which won't be used by most people because it requires specific action to do so.
I also just found out about git describe
which might be useful here.
Sorry for the long delay. I think we should try to pick this back up.
Since we're making breaking changes anyway, I think we should just use the URL approach, and aim to disallow branches completely.
I'm onboard for the URI scheme thing, but it seems kind of weird that we're still falling back to detecting to a slow method.
Which do you think is the slow method? I think what we use right now should still be possible for tags, no? Only sha://
URLs would be slow since we'd need to do a full clone
.
I think we're aligned. I think I can move forward on this when I have time:
- I'll support
sha://
andtag://
. I like the specificity of those labels and the fact that it becomes clear what is and isn't supported. - To be clear, since we're breaking, we're going to require these schemes to be specified?
I would be fine with that (and yes, we'd need to update the package set), but I'd also be fine with just looking for sha://
and assuming it's a tag if we don't see that prefix (in which case, the package set is fine as it is).
@paf31 I threw up a quick implementation, swapping out the function which calls git ls-remote
for one that implements the behavior we described: checks the scheme in the reference for sha
or tag
and failing that, assumes tag
. Do you have any ideas on how this should be tested?
What's the status on this now? Looks like this was in a good place, but I don't think @hdgarrood 's point on the URL format was taken into account - sha:
rather than sha://
. As :
is not permitted in refs this is still unambiguous.
@nwolverson Just pushed an update. I think I glossed over the part about removing the //
before.
I know there were some changes in who is working on this repo. Any chance of getting this merged? I don't know what is going on with appveyor but it looks unrelated to the changes.
If there are no objections, we should really try to merge this soon, especially since the existing behavior leads to unpleasant surprises.
I don't know why it doesn't let me comment on the testing. I'm pretty sure I ran the commands on a terminal when I originally wrote it. Considering that @hdgarrood pointed out what appears to be a text parsing bug and the travis tests passed, I'm wondering if the test suite is insufficient as is. I feel unprepared to really thoroughly test this PR. I'm trying to figure a good point to hand this work off.
@hdgarrood Just confirmed the bug you spotted with the drop, pushed an update fixing it.
Bumping this again. Possibly tagging @justinwoo
Needs some conflict fixing, but otherwise this is good right? Anyone in disagreement?