uv
uv copied to clipboard
Conditional Git LFS Support
Summary
Follow up to https://github.com/astral-sh/uv/pull/15563 Closes https://github.com/astral-sh/uv/issues/13485
This is a first-pass at adding support for conditional support for Git LFS between git sources, initial feedback welcome.
e.g.
[tool.uv.sources]
test-lfs-repo = { git = "https://github.com/zanieb/test-lfs-repo.git", lfs = true }
For context previously a user had to set UV_GIT_LFS to have uv fetch lfs objects on git sources. This env var was all or nothing, meaning you must always have it set to get consistent behavior and it applied to all git sources. If you fetched lfs objects at a revision and then turned off lfs (or vice versa), the git db, corresponding checkout lfs artifacts would not be updated properly. Similarly, when git source distributions were built, there would be no distinction between sources with lfs and without lfs. Hence, it could corrupt the git, sdist, and archive caches.
In order to support some sources being LFS enabled and other not, this PR adds a stateful layer roughly similar to how subdirectory works but for lfs since the git database, the checkouts and the corresponding caching layers needed to be LFS aware (requested vs installed). The caches also had to isolated and treated entirely separate when handling LFS sources.
Summary
- Adds
lfs = trueorlfs = falseto git sources in pyproject.toml - Added
lfs=truequery param / fragments to most relevant url structs (not parsed as user input)- In the case of uv add / uv tool,
--lfsis supported instead UV_GIT_LFSenvironment variable support is still functional for non-project entrypoints (e.g. uv pip)
- In the case of uv add / uv tool,
direct-url.jsonnow has an customgit_lfsentry under VcsInfo (note, this is not in the spec currently -- see caveats).- git database and checkouts have an different cache key as the sources should be treated effectively different for the same rev.
- sdists cache also differ in the cache key of a built distribution if it was built using LFS enabled revisions to distinguish between non-LFS same revisions. This ensures the strong assumption for archive-v0 that an unpacked revision "doesn't change sources" stays valid.
Caveats
pylock.tomlimport support has not been added via git_lfs=true, going through the spec it wasn't clear to me it's something we'd support outside of the env var (for now).- direct-url struct was modified by adding a non-standard
git_lfsfield under VcsInfo which may be undersirable although the PEP 610 does sayAdditional fields that would be necessary to support such VCS SHOULD be prefixed with the VCS command namewhich could be interpret this change as ok. - There will be a slight lockfile and cache churn for users that use
UV_GIT_LFSas all git lockfile entries will get alfs=truefragment. The cache version does not need an update, but LFS sources will get their own namespace under git-v0 and sdist-v9/git hence a cache-miss will occur once but this can be sufficient to label this as breaking for workflows always settingUV_GIT_LFS.
Test Plan
Some initial tests were added. More tests likely to follow as we reach consensus on a final approach.
For IT test, we may want to move to use a repo under astral namespace in order to test lfs functionality.
Manual testing was done for common pathological cases like killing LFS fetch mid-way, uninstalling LFS after installing an sdist with it and reinstalling, fetching LFS artifacts in different commits, etc.
PSA: Please ignore the docker build failures as its related to depot OIDC issues.
CodSpeed Performance Report
Merging #16143 will not alter performance
Comparing samypr100:conditional_lfs_support (dc1b6ba) with main (5947fb0)
Summary
✅ 6 untouched
@zanieb This should be good to go. Given the size, how would you like to approach reviewing this best? I can also do a walkthrough of any of the changes and their rationale sync or async if desired. I tried to keep the commits idependently reviewable, although GH doesn't make it easy to review that way.
I can't comment where the tests are run, but we should enable the git-lfs tests where we run the git test features, in cargo-test-macos.
I can't comment where the tests are run, but we should enable the
git-lfstests where we run thegittest features, incargo-test-macos.
Done in e8b48e6
I'm taking a look at this in a little more detail. I really want to avoid churning the cache because that has an impact on non-LFS git users, and I think we can do that safely.
So I would request:
- In the cache key used for
db, do not record LFS-ness at all; keep the cache key the same as it was before this PR.- In the cache key used for
checkouts, record both LFS-ness and non-LFS-ness (as this PR currently does).
I believe I had this before on one of the original iterations to avoid the cache churn. Are we settled this is what we want?
also, thoughts on this other location per your request?
https://github.com/astral-sh/uv/blob/e2b1a8969407f1f7440706be4bcb4d8f79df28d0/crates/uv-distribution-types/src/requirement.rs#L446-L450
edit: moved discussion to https://github.com/astral-sh/uv/pull/16143#discussion_r2525750175
There is one theoretical backwards-incompatibility here: with current uv, you only have to specify UV_GIT_LFS=1 once and the checkout and built wheel gets cached with the LFS artifacts included. This PR (correctly) identifies that as a risk to cache consistency, but I think in practice it may work fine for users with git-lfs installed and configured properly, because the .ok file won't get created if the LFS fetch fails or otherwise leaves you in an inconsistent state, and I am worried that people might be relying on that in practice.
With current uv:
$ UV_CACHE_DIR=/tmp/cache/q2-old UV_GIT_LFS=1 uvx git+https://github.com/astral-sh/lfs-cowsay -v
Updated https://github.com/astral-sh/lfs-cowsay (44b8e65c8ecd376e4482a68b29312227879a226d)
Built lfs-cowsay @ git+https://github.com/astral-sh/lfs-cowsay@44b8e65c8ecd376e4482a68b29312227879a226d
Installed 1 package in 2ms
6.1
$ UV_CACHE_DIR=/tmp/cache/q2-old uvx git+https://github.com/astral-sh/lfs-cowsay -v
6.1
With the code from this PR:
$ UV_CACHE_DIR=/tmp/cache/q2-new UV_GIT_LFS=1 target/debug/uvx git+https://github.com/astral-sh/lfs-cowsay -v
Updated https://github.com/astral-sh/lfs-cowsay (44b8e65c8ecd376e4482a68b29312227879a226d)
Built lfs-cowsay @ git+https://github.com/astral-sh/lfs-cowsay@44b8e65c8ecd376e4482a68b29312227879a226d#lfs=true
Installed 1 package in 19ms
6.1
$ UV_CACHE_DIR=/tmp/cache/q2-new target/debug/uvx git+https://github.com/astral-sh/lfs-cowsay -v
Built lfs-cowsay @ git+https://github.com/astral-sh/lfs-cowsay@44b8e65c8ecd376e4482a68b29312227879a226d
Installed 1 package in 18ms
Traceback (most recent call last):
File "/tmp/cache/q2-new/archive-v0/UHbaC9QcSG9WzgquNtN5V/bin/lfs-cowsay", line 6, in <module>
from lfs_cowsay.__main__ import cli
File "/tmp/cache/q2-new/archive-v0/UHbaC9QcSG9WzgquNtN5V/lib/python3.14/site-packages/lfs_cowsay/__init__.py", line 4, in <module>
from .main import (
...<13 lines>...
)
File "/tmp/cache/q2-new/archive-v0/UHbaC9QcSG9WzgquNtN5V/lib/python3.14/site-packages/lfs_cowsay/main.py", line 4, in <module>
from .characters import CHARS
File "/tmp/cache/q2-new/archive-v0/UHbaC9QcSG9WzgquNtN5V/lib/python3.14/site-packages/lfs_cowsay/characters.py", line 2
oid sha256:4a003717d1e1997c56ef4128f54eb92c72831e44cda4e3d49e93a32ae0cc9222
^
SyntaxError: invalid decimal literal
(This does not affect UV_GIT_LFS=1 uv tool install blah + uv tool upgrade, though - we correctly persist the ?lfs=true in the URL. It only affects two independent parses of the same git repo/commit that do not share persistent state but share the cache.)
Again, I don't think this is a problem in practice / don't think this should block merging.
Thank you! Having worked with LFS, and I wonder how to deal with different LFS endpoints and credentials? For example, at work we have GitHub enterprise and external git LFS server (not github)
Thank you! Having worked with LFS, and I wonder how to deal with different LFS endpoints and credentials? For example, at work we have GitHub enterprise and external git LFS server (not github)
@Red-Eyed That would still be configured as part of your git configuration. Since LFS objects have the originating url on them, as long as git itself can authenticate, fetch, and materialize the objects you shouldn't have issues with uv.