pip bazaar: Use lightweight checkouts rather than a full branch clone

bazaar: Use lightweight checkouts rather than a full branch clone

Open jelmer opened this issue 1 year ago • 12 comments

Fixes #5444

For Bazaar itself, this changes the clone time from 2 minutes to 11 seconds for me.

Jul 15 '22 20:07 jelmer

Hi,

I'm personally not familiar with bazaar at all anymore, but I'll note that when these kind of solutions where showing up for git, they were rejected because some use cases rely on having the history available in the cloned repo (e.g. for tools such as setuptools_scm).

Could this be an issue for bazaar users too ?

Jul 16 '22 10:07 sbidoul

Hi,

I'm personally not familiar with bazaar at all anymore, but I'll note that when these kind of solutions where showing up for git, they were rejected because some use cases rely on having the history available in the cloned repo (e.g. for tools such as setuptools_scm).

Could this be an issue for bazaar users too ?

That's not the case here - you can still access the full history in a lightweight checkout, but it might trigger network activity to retrieve the missing data from the main copy. Even if that is necessary, that's probably going to be faster for things like setuptools-scm since they just need the revision graph - not the contents.

Jul 16 '22 10:07 jelmer

Ah, that should be ok then, and similar to what we have done for git with --filter=blob:none. Thanks for the clarification.

Jul 16 '22 10:07 sbidoul

And, are the difference between branch and checkout significant? And between update and pull? Are there any backward compatibility concerns? What if (for instance) there is an existing editable directory that was obtained with a previous pip version with bzr branch. Will a bzr update work on it?

Jul 20 '22 10:07 sbidoul

This is not backwards compatible, "bzr up" is a noop for "bzr branch" type clones. That's easy to address though, by adding a "bzr bind" command to convert from a branch to a checkout. I'll update this branch.

Jul 20 '22 11:07 jelmer

This is not backwards compatible, "bzr up" is a noop for "bzr branch" type clones. That's easy to address though, by adding a "bzr bind" command to convert from a branch to a checkout. I'll update this branch.

Now updated to call "bzr bind" to make it compatible with older branches (and cope with location changes).

Aug 17 '22 20:08 jelmer

Hi @sbidoul , would you perhaps be able to help review this branch after the last changes or suggest a course of action? Thanks!

Sep 03 '22 20:09 jelmer

would you perhaps be able to help review this branch after the last changes or suggest a course of action? Thanks!

Hi, I've had a look at your latest version. It's hard for me to tell as I know next to nothing about bzr. I'm still a bit worried about backward compatibility, though. As checkout and branch don't have the same effect, would it be possible to do something just after checkout so it becomes more or less equivalent to a bzr branch?

Sep 04 '22 20:09 sbidoul

would you perhaps be able to help review this branch after the last changes or suggest a course of action? Thanks!

Hi, I've had a look at your latest version. It's hard for me to tell as I know next to nothing about bzr. I'm still a bit worried about backward compatibility, though. As checkout and branch don't have the same effect, would it be possible to do something just after checkout so it becomes more or less equivalent to a bzr branch?

It seems hard to find somebody in pypa who is fairly with bzr, so I appreciate you taking a look. :-) If it helps, I'm one of the upstream developers for Bazaar.

My changes are backwards compatible. If there is currently an "unbound" branch (i.e. something created by "bzr branch") present, then it will convert it to a bound branch (the same thing "bzr checkout" will create) by calling "bzr bind" first before running "bzr update".

% bzr bind --help
Purpose: Convert the current branch into a checkout of the supplied branch.
Usage:   brz bind [LOCATION]

...

Sep 04 '22 20:09 jelmer

My changes are backwards compatible now

I'm not sure. If you look at the obtain method in versioncontrol.py, it does fetch_new if the directory does not exist yet, and update if it does. With this PR fetch_new now does a bzr checkout where it did a bzr branch before, so the result of the first obtain will be in a different state than before.

That's why I'm asking if something can be done in fetch_new to convert the checkout into a branch.

Also, if I read the help of bzr bind I'm confused since it says Convert the current branch into a checkout of the supplied branch. I would think we rather need to convert the checkout into a branch. But again I don't use bzr and I'm probably missing something.

Sep 04 '22 20:09 sbidoul

My changes are backwards compatible now

I'm not sure. If you look at the obtain method in versioncontrol.py, it does fetch_new if the directory does not exist yet, and update if it does. With this PR fetch_new now does a bzr checkout where it did a bzr branch before, so the result of the first obtain will be in a different state than before.

Yup, that's fine. You'll end up with a checkout after fetch_new now. And if there's already a branch rather than a checkout there (as created by fetch_new in older version of pip), then a subsequent call to update() will convert the standalone branch to a checkout because it's calling "bzr bind".

That's why I'm asking if something can be done in fetch_new to convert the checkout into a branch.

Also, if I read the help of bzr bind I'm confused since it says Convert the current branch into a checkout of the supplied branch. I would think we rather need to convert the checkout into a branch. But again I don't use bzr and I'm probably missing something.

We specifically want to end up with a "checkout" rather than a standalone branch (created by "bzr branch"), since a standalone branch carries full history and is thus slow to create. A checkout is a local thing that is meant to merely mirror a remote branch (tying it to that remote branch is what "bzr bind" does); it will fetch things from its "master branch" when it needs to and therefor it doesn't need to have a full copy of history itself.

Or running these as bzr commands:

# Clone hydrazine, obtaining the second to last revision (so we have something to update to later)
# This is similar to what we'd be left by an older version of pip
/tmp % bzr branch -2 lp:hydrazine
Branched 106 revisions.                                                                                                                                                                    
# It's a standalone branch:
/tmp % bzr info hydrazine
Standalone tree (format: 2a)
Location:
  branch root: hydrazine

Related branches:
  parent branch: bzr+ssh://bazaar.launchpad.net/+branch/hydrazine/

# Bind it to the upstream branch
/tmp % cd hydrazine
/tmp/hydrazine % bzr bind lp:hydrazine
# It's now become a checkout
/tmp/hydrazine % bzr info 
Checkout (format: 2a)
Location:
       checkout root: .
  checkout of branch: bzr+ssh://bazaar.launchpad.net/+branch/hydrazine/
# update to the latest revision:
/tmp/hydrazine % bzr up 
 M  README                                                                                                                                                                                 
 M  bugclient
 M  capture-bug-counts
 M  check-membership
 M  feed-pqm
 M  lp-attach
 M  lp-delete-ppa-packages
 M  lp-promote-ppa
 M  scan-merge-proposals
 M  setup.py
All changes applied successfully.                                                                                                                                                          
Updated to revision 107 of branch bzr+ssh://bazaar.launchpad.net/+branch/hydrazine

Sep 04 '22 21:09 jelmer

I've added a comment in the code explaining why it's calling "bzr bind".

Sep 04 '22 22:09 jelmer

I propose we merge this. @sbidoul, as you previously commented here, are you OK with that?

@jelmer can you look into the CI failure?

Oct 07 '22 17:10 pfmoore

Yup, sorry I had lost track of this.

I still think there might be a minor backward incompatibility concern with pip install -e bzr+https://.... IIUC, what people will get in their src directory will not be the same as before and that might break some workflows.

But that's probably not worth worrying about too much. So :+1: for merging.

Oct 07 '22 18:10 sbidoul

The CI failure is our flaky test_timeout, so unrelated.

Oct 07 '22 18:10 sbidoul

The CI failure is our flaky test_timeout, so unrelated.

Sorry, I misread because github actions highlighted these lines as they have "ERROR" in them.

Error: ib/test_lib.py::TestPipTestEnvironment::test_run__allow_stderr_error[ERROR] 
Error: 36m [ 94%] PASSED tests/lib/test_lib.py::TestPipTestEnvironment::test_run__allow_stderr_error[ERROR]

Oct 07 '22 18:10 pfmoore

I've force-merged, since reverts are easy-enough. :)

Oct 07 '22 18:10 pradyunsg

Beat me to it, @pradyunsg! 🚤

Oct 07 '22 18:10 pfmoore

Heh, good point @pfmoore -- I'll revert the force-merge if it causes main to break.

Oct 07 '22 18:10 pradyunsg

The CI failure is our flaky test_timeout, so unrelated.

Off-topic for here, but is anyone looking at test_timeout, or do we have an issue tracking the problem? Presumably it's this one?

Checking the history, it looks like we originally skipped the test deliberately because it was flaky, but it was "fixed" in https://github.com/pypa/pip/commit/426691dc1471a4d565f4631731a40fb7dea7381f by an over-enthusiastic response to a mypy message (skipif used without an argument).

I propose we revert the marker change, making it skipif(True) to placate mypy, but leaving the test skipped.

Oct 07 '22 18:10 pfmoore

I think we should just remove the test -- it's flaky enough that it's basically rendering our Windows CI useless. :)

Oct 07 '22 18:10 pradyunsg

pip pip copied to clipboard

bazaar: Use lightweight checkouts rather than a full branch clone

pip
pip copied to clipboard