uv
Allow sparse checkouts/path exclusion for git dependencies
We currently have a number of binary files in a tests/data directory that account for 99.9% of the disk usage of a shallow clone (which I believe is already used?) and none of the actual functionality.
If you're installing the contents of this repo into another repo as a git dependency, being able to specify that these paths could be excluded would be really useful.
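For reference, git itself can already express this kind of exclusion with a non-cone sparse checkout (cone mode only includes whole directories and cannot carve one out). Below is a minimal sketch, assuming a hypothetical list of excluded repo paths, of how that list could be turned into sparse-checkout patterns and git commands. The helper names and the idea of a uv-level `exclude` setting are assumptions, not an existing uv API. The git flags (`--filter=blob:none`, `--sparse`, `sparse-checkout set --no-cone`) are real, and partial clone (`--filter`) needs server support, which GitHub provides.

```python
import shlex


def exclusion_patterns(exclude: list[str]) -> list[str]:
    """Non-cone sparse-checkout patterns that keep everything except `exclude`.

    Hypothetical helper: `exclude` holds repo-relative directories such as "tests/data".
    """
    patterns = ["/*"]  # start by including every path in the repository
    for path in exclude:
        # Later patterns override earlier ones, so each negation drops one directory.
        patterns.append("!/" + path.strip("/") + "/")
    return patterns


def sparse_clone_commands(url: str, dest: str, exclude: list[str]) -> list[list[str]]:
    """git commands for a blobless, sparse clone of `url` into `dest` (hypothetical helper)."""
    return [
        # --filter=blob:none defers file contents until they are checked out;
        # --sparse starts with only the top-level files in the working tree.
        ["git", "clone", "--filter=blob:none", "--sparse", url, dest],
        # Non-cone mode accepts gitignore-style patterns, including "!" negations,
        # and updates the working tree to match them.
        ["git", "-C", dest, "sparse-checkout", "set", "--no-cone",
         *exclusion_patterns(exclude)],
    ]


if __name__ == "__main__":
    # Placeholder URL; substitute the real repository.
    for cmd in sparse_clone_commands("https://example.com/org/repo.git", "repo", ["tests/data"]):
        print(shlex.join(cmd))
```

Running the script prints the two commands, shell-quoted and ready to paste into a terminal.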
I think we're blocked on a design for specifying this.
Doesn't this already work?
$ uv pip install "kedro-airflow @ git+https://github.com/kedro-org/kedro-plugins.git@main#subdirectory=kedro-airflow"
Updated https://github.com/kedro-org/kedro-plugins.git (92bf6eb)
Resolved 50 packages in 1.50s
Installed 50 packages in 151ms
...
+ kedro-airflow==0.9.0 (from git+https://github.com/kedro-org/kedro-plugins.git@92bf6ebaa249c9e144fc6fee18a83b76973c13f2#subdirectory=kedro-airflow)
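In that reference, the repository, the revision, and the subdirectory are all packed into a single URL. As a minimal sketch of how the pieces separate (an illustration of the URL shape using only `urllib.parse`, not uv's own parser; `parse_git_reference` is a hypothetical name):

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def parse_git_reference(url: str) -> dict[str, Optional[str]]:
    """Split a `git+<url>@<rev>#subdirectory=<path>` reference into its parts.

    Hypothetical helper for illustration only.
    """
    if not url.startswith("git+"):
        raise ValueError("expected a git+ URL")
    parts = urlsplit(url[len("git+"):])
    # The fragment carries key=value options such as subdirectory=...
    subdirectory = parse_qs(parts.fragment).get("subdirectory", [None])[0]
    # A revision, if present, follows the last "@" in the path (after ".git").
    path, sep, rev = parts.path.rpartition("@")
    if not sep:
        path, rev = parts.path, None
    repo = f"{parts.scheme}://{parts.netloc}{path}"
    return {"repo": repo, "rev": rev, "subdirectory": subdirectory}


print(parse_git_reference(
    "git+https://github.com/kedro-org/kedro-plugins.git@main#subdirectory=kedro-airflow"
))
```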
I think that's more targeted at monorepos containing multiple packages. You might have a monorepo set up as a workspace of packages, work on it in development with a sparse checkout, and later install a subset of those packages somewhere else.
Say you had the following structure:
.
├── otherpackage
│   ├── pyproject.toml
│   ├── src
│   └── tests
└── package
    ├── pyproject.toml
    ├── src
    └── tests
Pointing at the otherpackage subdirectory might (I'm not actually sure whether it does) do a sparse checkout that only grabs ./otherpackage, but you still inherit a nice chunky tests folder in the process.
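With the layout above, a checkout for otherpackage alone would keep `otherpackage/pyproject.toml` and `otherpackage/src` and drop `otherpackage/tests`. As a minimal sketch, assuming git's non-cone sparse-checkout mode (where, as in `.gitignore`, later patterns override earlier ones), those patterns could be generated like this; `subdirectory_patterns` is a hypothetical helper, and excluding `tests` by default is an assumption:

```python
from typing import Sequence


def subdirectory_patterns(subdirectory: str, exclude: Sequence[str] = ("tests",)) -> list[str]:
    """Non-cone sparse-checkout patterns for one package in a monorepo.

    Keeps `subdirectory` and drops each of its `exclude` children (hypothetical helper).
    """
    root = "/" + subdirectory.strip("/") + "/"
    patterns = [root]  # include the package directory recursively
    for child in exclude:
        # Later patterns win, so this negation removes the child from the checkout.
        patterns.append("!" + root + child.strip("/") + "/")
    return patterns


print(subdirectory_patterns("otherpackage"))
```

The resulting list could be passed to `git sparse-checkout set --no-cone`.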
There's also something to consider here in how you cache this. Say you depend on both packages: ideally the repo would be in the cache only once, with only the required paths checked out (excluding both tests directories).
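One way to picture that caching idea: key the checkout by repository URL and commit, and make its set of checked-out paths the union of what each dependent package needs. Below is a minimal sketch under those assumptions. `SparseSpec`, `required_paths`, and `add_to_cache` are hypothetical names, the repo URL and commit are placeholders, and the file list stands in for the output of `git ls-tree -r --name-only <commit>`.

```python
from typing import Iterable, NamedTuple


class SparseSpec(NamedTuple):
    """What one dependency needs from the repository (hypothetical structure)."""
    subdirectory: str
    exclude: tuple[str, ...] = ()


def _under(path: str, directory: str) -> bool:
    """True if `path` is `directory` itself or lies somewhere inside it."""
    directory = directory.strip("/")
    return path == directory or path.startswith(directory + "/")


def required_paths(tracked: Iterable[str], specs: Iterable[SparseSpec]) -> set[str]:
    """Files from `tracked` that at least one spec needs."""
    specs = list(specs)
    needed: set[str] = set()
    for path in tracked:
        for spec in specs:
            root = spec.subdirectory.strip("/")
            excluded = any(_under(path, f"{root}/{ex.strip('/')}") for ex in spec.exclude)
            if _under(path, root) and not excluded:
                needed.add(path)
                break
    return needed


def add_to_cache(cache: dict[tuple[str, str], set[str]], repo: str, commit: str,
                 tracked: list[str], spec: SparseSpec) -> set[str]:
    """Merge `spec` into the single cache entry for (repo, commit)."""
    entry = cache.setdefault((repo, commit), set())
    entry |= required_paths(tracked, [spec])
    return entry


# Stand-in for `git ls-tree -r --name-only <commit>` on the example layout.
tracked = [
    "otherpackage/pyproject.toml",
    "otherpackage/src/otherpackage/__init__.py",
    "otherpackage/tests/test_other.py",
    "package/pyproject.toml",
    "package/src/package/__init__.py",
    "package/tests/test_package.py",
]
cache: dict[tuple[str, str], set[str]] = {}
repo, commit = "https://example.com/org/repo.git", "abc123"  # placeholders
add_to_cache(cache, repo, commit, tracked, SparseSpec("otherpackage", ("tests",)))
add_to_cache(cache, repo, commit, tracked, SparseSpec("package", ("tests",)))
print(len(cache), sorted(cache[(repo, commit)]))
```

Both dependencies land in one cache entry, and neither tests directory is included. A real implementation would still need to turn the merged path set back into sparse-checkout patterns for the shared checkout; that step is left out here.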