cargo icon indicating copy to clipboard operation
cargo copied to clipboard

Apply GitHub fast path even for partial hashes

Open dtolnay opened this issue 2 years ago • 4 comments

What does this PR try to resolve?

As flagged in https://github.com/rust-lang/cargo/pull/10079#issuecomment-1170940132, it's not great to assume that git SHAs would always be 40-character hex strings. In the future they will be longer.

Git is on a long-term trajectory to move to SHA256 hashes (current status). I suppose when that becomes available/the default it's possible for a 40-digit hex-encoded hash not to be the full hash. Will this fail for that case?

The implementation from #10079 fails in that situation because it turns dependencies of the form { git = "…", rev = "[…40 hex…]" } into fetches with a refspec +[…40 hex…]:refs/commit/[…40 hex…]. That only works if the 40 hex digits are the full long hash of your commit. If it's really a prefix ("short hash") of a 64-hex-digit SHA-256 commit hash, you'd get a failure that resembles:

error: failed to get `dependency` as a dependency of package `repro v0.0.0`

Caused by:
  failed to load source for dependency `dependency`

Caused by:
  Unable to update https://github.com/rust-lang/dependency?rev=b30694b4d9b29141298870b7993e9aee10940524

Caused by:
  revspec 'b30694b4d9b29141298870b7993e9aee10940524' not found; class=Reference (4); code=NotFound (-3)

This PR updates the implementation so that Cargo will curl GitHub to get a resolved long commit hash even if the rev specified for the git dependency in Cargo.toml already looks like a SHA-1 long hash.

Performance considerations

⛔ This reverses a (questionable, negligible) benefit of #10079 of skipping the curl when rev is a long hash and is not already present in the local clone. These curls take 200-250ms on my machine.

🟰 We retain the much larger benefit of #10079 which comes from being able to precisely fetch a single rev, instead of fetching all branches and tags in the upstream repo and hoping to find the rev somewhere in there. This accounts for the entire performance difference explained in the summary of that PR.

🟰 We still skip the curl when rev is a long hash of a commit that is already previously fetched.

🥳 After this PR, we also curl and hit fast path when rev is a short hash of some upstream commit. For example { git = "https://github.com/rust-lang/cargo", rev = "b30694b4d9" } would previously have done the download-all-branches-and-tags codepath because b30694b4d9 is not a long hash. After this PR, the curl to GitHub informs us that b30694b4d9 resolves to the long hash b30694b4d9b29141298870b7993e9aee10940524, and we download just that commit instead of all-branches-and-tags.

How should we test and review this PR?

I tested with the following dependency specs, using /path/to/target/release/cargo generate-lockfile.

# Before: slow path (fetch all branches and tags; 70K git objects)
# After: fast path (20K git objects)
cargo = { git = "https://github.com/rust-lang/cargo", rev = "b30694b4d9b2914129" }
# Before and after: fast path
cargo = { git = "https://github.com/rust-lang/cargo", rev = "b30694b4d9b29141298870b7993e9aee10940524" }
# Before and after: fast path
cargo = { git = "https://github.com/rust-lang/cargo", rev = "refs/heads/rust-1.14.0" }
# Before and after: same error "revspec 'rust-1.14.0' not found"
# You are supposed to use `branch = "rust-1.14.0"`, this is not considered a `rev`
cargo = { git = "https://github.com/rust-lang/cargo", rev = "rust-1.14.0" }

I made sure these all work both with and without rm -rf ~/.cargo/git/db/cargo-* ~/.cargo/git/checkouts/cargo-* in between each cargo invocation.

FYI @djc

dtolnay avatar Jun 30 '22 21:06 dtolnay

r? @ehuss

(rust-highfive has picked a reviewer for you, use r? to override)

rust-highfive avatar Jun 30 '22 21:06 rust-highfive

I find the questions of longer SHAs confusing and a little premature. What is the timeline for git to support them? GitHub? LibGit2? What is the time pressure on Cargo? Are other aspects of Cargo ready for this change?

However, when looking at the code this seems like a much smaller change. We now read the result of https://api.github.com/repos/{}/{}/commits/{}, which we had previously just been querying for the status code, and use that for the follow-up fast path to GitHub. This maximizes the chance that we can use GitHubs fast path.

Eh2406 avatar Jul 21 '22 16:07 Eh2406

This provides some reading on sha-256 progress in git: https://lwn.net/Articles/898522/

est31 avatar Jul 21 '22 17:07 est31

Yeah this PR was poorly titled. This from the PR description is the actual main benefit, which SHA-256 is just a special case of:

🥳 After this PR, we also curl and hit fast path when rev is a short hash of some upstream commit. For example { git = "https://github.com/rust-lang/cargo", rev = "b30694b4d9" } would previously have done the download-all-branches-and-tags codepath because b30694b4d9 is not a long hash. After this PR, the curl to GitHub informs us that b30694b4d9 resolves to the long hash b30694b4d9b29141298870b7993e9aee10940524, and we download just that commit instead of all-branches-and-tags.

dtolnay avatar Jul 21 '22 17:07 dtolnay

@rfcbot merge

This pull request reverts the benefits of #10079 of skipping curl for 40-characters hex.

OTOH it becomes future-proof when Git switches to SHA256, and also gains the benefit of skipping fetches for partial hashes when local objects already exist.

weihanglo avatar Aug 16 '22 21:08 weihanglo

Team member @weihanglo has proposed to merge this. The next step is review by the rest of the tagged team members:

  • [x] @Eh2406
  • [x] @ehuss
  • [x] @epage
  • [ ] @joshtriplett
  • [x] @weihanglo

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

rfcbot avatar Aug 16 '22 21:08 rfcbot

:bell: This is now entering its final comment period, as per the review above. :bell:

rfcbot avatar Aug 17 '22 01:08 rfcbot

I feel like it's no harm to end the FCP earlier. Going to merge it now.

@bors r+

weihanglo avatar Aug 24 '22 07:08 weihanglo

:pushpin: Commit 36053de4adf85a4177806220ce7c01bd97797124 has been approved by weihanglo

It is now in the queue for this repository.

bors avatar Aug 24 '22 07:08 bors

:hourglass: Testing commit 36053de4adf85a4177806220ce7c01bd97797124 with merge 9ded34a558a900563b0acf3730e223c649cf859d...

bors avatar Aug 24 '22 07:08 bors

:sunny: Test successful - checks-actions Approved by: weihanglo Pushing 9ded34a558a900563b0acf3730e223c649cf859d to master...

bors avatar Aug 24 '22 07:08 bors