uv icon indicating copy to clipboard operation
uv copied to clipboard

Conditional Git LFS Support

Open samypr100 opened this issue 2 months ago • 7 comments

Summary

Follow up to https://github.com/astral-sh/uv/pull/15563 Closes https://github.com/astral-sh/uv/issues/13485

This is a first-pass at adding support for conditional support for Git LFS between git sources, initial feedback welcome.

e.g.

[tool.uv.sources]
test-lfs-repo = { git = "https://github.com/zanieb/test-lfs-repo.git", lfs = true }

For context previously a user had to set UV_GIT_LFS to have uv fetch lfs objects on git sources. This env var was all or nothing, meaning you must always have it set to get consistent behavior and it applied to all git sources. If you fetched lfs objects at a revision and then turned off lfs (or vice versa), the git db, corresponding checkout lfs artifacts would not be updated properly. Similarly, when git source distributions were built, there would be no distinction between sources with lfs and without lfs. Hence, it could corrupt the git, sdist, and archive caches.

In order to support some sources being LFS enabled and other not, this PR adds a stateful layer roughly similar to how subdirectory works but for lfs since the git database, the checkouts and the corresponding caching layers needed to be LFS aware (requested vs installed). The caches also had to isolated and treated entirely separate when handling LFS sources.

Summary

  • Adds lfs = true or lfs = false to git sources in pyproject.toml
  • Added lfs=true query param / fragments to most relevant url structs (not parsed as user input)
    • In the case of uv add / uv tool, --lfs is supported instead
    • UV_GIT_LFS environment variable support is still functional for non-project entrypoints (e.g. uv pip)
  • direct-url.json now has an custom git_lfs entry under VcsInfo (note, this is not in the spec currently -- see caveats).
  • git database and checkouts have an different cache key as the sources should be treated effectively different for the same rev.
  • sdists cache also differ in the cache key of a built distribution if it was built using LFS enabled revisions to distinguish between non-LFS same revisions. This ensures the strong assumption for archive-v0 that an unpacked revision "doesn't change sources" stays valid.

Caveats

  • pylock.toml import support has not been added via git_lfs=true, going through the spec it wasn't clear to me it's something we'd support outside of the env var (for now).
  • direct-url struct was modified by adding a non-standard git_lfs field under VcsInfo which may be undersirable although the PEP 610 does say Additional fields that would be necessary to support such VCS SHOULD be prefixed with the VCS command name which could be interpret this change as ok.
  • There will be a slight lockfile and cache churn for users that use UV_GIT_LFS as all git lockfile entries will get a lfs=true fragment. The cache version does not need an update, but LFS sources will get their own namespace under git-v0 and sdist-v9/git hence a cache-miss will occur once but this can be sufficient to label this as breaking for workflows always setting UV_GIT_LFS.

Test Plan

Some initial tests were added. More tests likely to follow as we reach consensus on a final approach.

For IT test, we may want to move to use a repo under astral namespace in order to test lfs functionality.

Manual testing was done for common pathological cases like killing LFS fetch mid-way, uninstalling LFS after installing an sdist with it and reinstalling, fetching LFS artifacts in different commits, etc.

PSA: Please ignore the docker build failures as its related to depot OIDC issues.

samypr100 avatar Oct 06 '25 21:10 samypr100

CodSpeed Performance Report

Merging #16143 will not alter performance

Comparing samypr100:conditional_lfs_support (dc1b6ba) with main (5947fb0)

Summary

✅ 6 untouched

codspeed-hq[bot] avatar Oct 17 '25 22:10 codspeed-hq[bot]

@zanieb This should be good to go. Given the size, how would you like to approach reviewing this best? I can also do a walkthrough of any of the changes and their rationale sync or async if desired. I tried to keep the commits idependently reviewable, although GH doesn't make it easy to review that way.

samypr100 avatar Oct 30 '25 19:10 samypr100

I can't comment where the tests are run, but we should enable the git-lfs tests where we run the git test features, in cargo-test-macos.

konstin avatar Nov 10 '25 15:11 konstin

I can't comment where the tests are run, but we should enable the git-lfs tests where we run the git test features, in cargo-test-macos.

Done in e8b48e6

samypr100 avatar Nov 10 '25 19:11 samypr100

I'm taking a look at this in a little more detail. I really want to avoid churning the cache because that has an impact on non-LFS git users, and I think we can do that safely.

geofft avatar Nov 13 '25 12:11 geofft

So I would request:

  • In the cache key used for db, do not record LFS-ness at all; keep the cache key the same as it was before this PR.
  • In the cache key used for checkouts, record both LFS-ness and non-LFS-ness (as this PR currently does).

I believe I had this before on one of the original iterations to avoid the cache churn. Are we settled this is what we want?

also, thoughts on this other location per your request?

https://github.com/astral-sh/uv/blob/e2b1a8969407f1f7440706be4bcb4d8f79df28d0/crates/uv-distribution-types/src/requirement.rs#L446-L450

edit: moved discussion to https://github.com/astral-sh/uv/pull/16143#discussion_r2525750175

samypr100 avatar Nov 13 '25 16:11 samypr100

There is one theoretical backwards-incompatibility here: with current uv, you only have to specify UV_GIT_LFS=1 once and the checkout and built wheel gets cached with the LFS artifacts included. This PR (correctly) identifies that as a risk to cache consistency, but I think in practice it may work fine for users with git-lfs installed and configured properly, because the .ok file won't get created if the LFS fetch fails or otherwise leaves you in an inconsistent state, and I am worried that people might be relying on that in practice.

With current uv:

$ UV_CACHE_DIR=/tmp/cache/q2-old UV_GIT_LFS=1 uvx git+https://github.com/astral-sh/lfs-cowsay -v
    Updated https://github.com/astral-sh/lfs-cowsay (44b8e65c8ecd376e4482a68b29312227879a226d)
      Built lfs-cowsay @ git+https://github.com/astral-sh/lfs-cowsay@44b8e65c8ecd376e4482a68b29312227879a226d
Installed 1 package in 2ms
6.1
$ UV_CACHE_DIR=/tmp/cache/q2-old uvx git+https://github.com/astral-sh/lfs-cowsay -v
6.1

With the code from this PR:

$ UV_CACHE_DIR=/tmp/cache/q2-new UV_GIT_LFS=1 target/debug/uvx git+https://github.com/astral-sh/lfs-cowsay -v
    Updated https://github.com/astral-sh/lfs-cowsay (44b8e65c8ecd376e4482a68b29312227879a226d)
      Built lfs-cowsay @ git+https://github.com/astral-sh/lfs-cowsay@44b8e65c8ecd376e4482a68b29312227879a226d#lfs=true
Installed 1 package in 19ms
6.1
$ UV_CACHE_DIR=/tmp/cache/q2-new target/debug/uvx git+https://github.com/astral-sh/lfs-cowsay -v
      Built lfs-cowsay @ git+https://github.com/astral-sh/lfs-cowsay@44b8e65c8ecd376e4482a68b29312227879a226d
Installed 1 package in 18ms
Traceback (most recent call last):
  File "/tmp/cache/q2-new/archive-v0/UHbaC9QcSG9WzgquNtN5V/bin/lfs-cowsay", line 6, in <module>
    from lfs_cowsay.__main__ import cli
  File "/tmp/cache/q2-new/archive-v0/UHbaC9QcSG9WzgquNtN5V/lib/python3.14/site-packages/lfs_cowsay/__init__.py", line 4, in <module>
    from .main import (
    ...<13 lines>...
    )
  File "/tmp/cache/q2-new/archive-v0/UHbaC9QcSG9WzgquNtN5V/lib/python3.14/site-packages/lfs_cowsay/main.py", line 4, in <module>
    from .characters import CHARS
  File "/tmp/cache/q2-new/archive-v0/UHbaC9QcSG9WzgquNtN5V/lib/python3.14/site-packages/lfs_cowsay/characters.py", line 2
    oid sha256:4a003717d1e1997c56ef4128f54eb92c72831e44cda4e3d49e93a32ae0cc9222
               ^
SyntaxError: invalid decimal literal

(This does not affect UV_GIT_LFS=1 uv tool install blah + uv tool upgrade, though - we correctly persist the ?lfs=true in the URL. It only affects two independent parses of the same git repo/commit that do not share persistent state but share the cache.)

Again, I don't think this is a problem in practice / don't think this should block merging.

geofft avatar Nov 18 '25 19:11 geofft

Thank you! Having worked with LFS, and I wonder how to deal with different LFS endpoints and credentials? For example, at work we have GitHub enterprise and external git LFS server (not github)

Red-Eyed avatar Dec 03 '25 06:12 Red-Eyed

Thank you! Having worked with LFS, and I wonder how to deal with different LFS endpoints and credentials? For example, at work we have GitHub enterprise and external git LFS server (not github)

@Red-Eyed That would still be configured as part of your git configuration. Since LFS objects have the originating url on them, as long as git itself can authenticate, fetch, and materialize the objects you shouldn't have issues with uv.

samypr100 avatar Dec 03 '25 14:12 samypr100