hatch icon indicating copy to clipboard operation
hatch copied to clipboard

Supporting non-root VCS ignore

Open sorcio opened this issue 2 years ago • 26 comments

.gitignore files don't have to live at the root of the working directory:

Patterns read from a .gitignore file in the same directory as the path, or in any parent directory (up to the top-level of the working tree), with patterns in the higher level files being overridden by those in lower level files down to the directory containing the file. These patterns match relative to the location of the .gitignore file.

Hatchling assumes that there is at most one .gitignore file, and that it describes paths relative to the project root. This mismatches the behavior of git in a couple of ways:

  1. If .gitignore is found in any of the ancestors of the project root, the paths are interpreted relative to the wrong path. The issue is also propagated to sdists, because the .gitignore file is copied as-is to the source directory.
Example
my-repo-root
├── .gitignore
└── foobar
    ├── LICENSE.txt
    ├── README.md
    ├── ignore_me
    ├── foobar
    │   ├── __about__.py
    │   └── __init__.py
    ├── pyproject.toml
    └── tests
        └── __init__.py

.gitignore might contain the pattern foobar/ignore_me, which would be effective enough for git. But Hatchling would not be able to apply the pattern to exclude ignore_me. If a file with the same name happened to be in the inner foobar directory, it would be accidentally excluded.

  1. Per-directory .gitignore files are ignored.

I don't use Mercurial anymore, but I imagine that 1 would apply there as well. Mercurial restricts .hgignore files to live at the root of the repo, so there can't be multiple ones. But you could have a project in a sub-directory of a hg repo.

It would be nice if Hatch/Hatchling could do the same thing the VCS does.


I don't know if a solution for 1 would involve, at sdist building time, to rewrite .gitignore to make relative patterns (those with / not at the end) relative to the project directory (or write the modified patterns into the sdist pyproject.toml instead of including the .gitignore file).

Per-directory .gitignores complicate things a bit. It implies that when traversing parents you cannot stop at the first .gitignore you find, because there can be a more general one up above. Traversal should stop when you reach the root of the working tree (i.e. when you find .git). But when you unpack an sdist you don't have a .git.

And I don't even want to start thinking about submodules.

sorcio avatar Jul 02 '22 13:07 sorcio

The current behavior also doesn't account for global .gitignore files, i.e. core.excludesFile and/or ~/.gitconfig.

I haven't thought hard enough but perhaps a guess at a solution would be using git ls-files to collect the list of files and let git manage its own behavior? In Rust there's a whole crate dedicated to all the fun involved in ignore files (used by e.g. fd and rg).

Julian avatar Aug 17 '22 08:08 Julian

Flit uses git ls-files and I can confirm that it more closely matches user expectations. :)

pradyunsg avatar Aug 17 '22 12:08 pradyunsg

I can add more ignore file support but will never use git or hg directly unless behind an option. I won't spend time on an option like that myself but will accept a PR.

Please keep in mind that a VCS checkout is not guaranteed and is in fact the rarest of all situations. If a project is only published on PyPI in CI/CD then that assumption is met but then fails as soon as a user installs the sdist or attempts are made to build for other ecosystems like Conda or Fedora that use either the sdist on PyPI or the tarball release artifact on GitHub.

ofek avatar Aug 17 '22 13:08 ofek

Note the issue is not a feature request to strictly mimic the behavior of git. It describes a misbehavior in specific conditions, i.e. when ignore files are not at the root of the project. The problem mostly appears when using sdists so using git in Hatchling is not a general solution.

Maybe Hatch has more context and is more likely to be used from a VCS checkout. So an option could be added there.

Supporting per-directory ignore files would be an additional feature. But it could be added as part of the same solution.

sorcio avatar Aug 20 '22 16:08 sorcio

FWIW, I've almost shipped a package with a .mypy_cache, because git magically ignored it. No, I don't have a global .gitignore anymore and I've stopped using .git/info/exclude. I have no idea why – maybe git got smart!? But I can't rely on git complaining about files lying around and that's spooky.


Would it be possible to maybe statically add certain dirs that never should be shipped like .DS_Store, .tox, .nox, .mypy_cache .pytest_cache, __pycache__? build and dist also come to mind.

I know you're opposed to magic, but this would reduce the amount of package screwups significantly.

hynek avatar Sep 14 '22 08:09 hynek

https://github.com/pypa/hatch/pull/493

build is a setuptools thing

ofek avatar Sep 18 '22 23:09 ofek

Can I somehow convince you to add .mypy_cache and .pytest_cache too?

Because as I wrote: my Git is ignoring it by default and I suspect I'm not the only one so this can lead to a lot of .[mypy|pytest]_caches shipped since there's no feedback except double-checking the sdists and I suspect I'm 1 of 5 people world-wide who do that.

hynek avatar Sep 21 '22 06:09 hynek

Can you find out how Git is ignoring those? It's not for me

ofek avatar Sep 21 '22 07:09 ofek

Yeah I've found out.

Both directories are created with an .gitignore file inside of them that contains * as an entry.

For mypy you may have to delete the directory and run it again because it doesn't seem like it's adding the files if the dir has been created by an older version.


I guess the trick for both would be to evaluate .gitignore files recursively? It seems like this is become a common pattern (and it's kinda a good idea tbh).

hynek avatar Sep 21 '22 07:09 hynek

This was the only Hatch-related pain point / surprise I encountered while migrating a large codebase (https://github.com/aesara-devs/aesara/pull/1384). It would be great to support recursive .gitignore evaluation as per the previous comment.

maresb avatar Jan 06 '23 12:01 maresb

Given that git ls-files won't be supported here, I wonder if it would make sense as an additional feature in hatch-vcs? Setuptools_scm does provide this listing to setuptools, so it seems like a good fit for that plugin. Not sure how complicated that would be, though...

effigies avatar Mar 31 '23 16:03 effigies

I was hit by this (unintended?) behaviour. By the way I noticed because the source dist was not reproducible, due to the cached files.

For now I add

[tool.hatch.build]
exclude = [
  ".*_cache",
] 

in every pyproject.toml...

While the best option would be to honor .gitignore files also in child directories, I was wondering if it make sense to allow global exclude patterns in config.toml?

miccoli avatar May 19 '23 13:05 miccoli

More sensible default behavior would be desirable here. This bit me too, I don't keep a .gitignore in the sourcetree unless it's excluding stuff generated specifically by that project. Excludes for .venv, pyc etc all go in global gitignore.

In https://github.com/pytest-dev/pytest-freezer/issues/9 it was pointed out that the sdist on PyPI had a whole venv inside. Oops, embarrassing.. :)

wimglenn avatar Jun 17 '23 06:06 wimglenn

For everyone running into these problems I’d like to propose https://github.com/hynek/build-and-inspect-python-package for your consideration to build and check your packages in a clean CI with introspection – and then ideally upload from CI using the new PyPI trusted publisher support.

This bug in Hatch has finally pushed me over the edge to take this on a few months ago and it’s basically just adding a workflow to your repo and a bit of configuration on PyPI’s side: https://github.com/python-attrs/attrs/blob/main/.github/workflows/pypi-package.yml

It’s very liberating.

hynek avatar Jun 17 '23 07:06 hynek

@hynek Thank you for that, really nice. Just tried it out and found a couple of interesting things: it didn't work on a windows runner (meh?), and the PKG-INFO inside the sdist didn't get the source timestamp like all the other files (also meh?)

As for hatch, after trying+failing multiple times in a row to configure contents in the .tar.gz and the .whl correctly, I think I'm giving up on it for the moment. I'm all for explicit configuration, but the default configuration should not do things that are obviously wrong.

wimglenn avatar Jun 21 '23 05:06 wimglenn

@wimglenn What is not working with your explicit configuration? The default file inclusion for each target is well documented:

  • https://hatch.pypa.io/latest/plugins/builder/wheel/#default-file-selection
  • https://hatch.pypa.io/latest/plugins/builder/sdist/#default-file-selection

Based on that your project seems to have perfectly fine defaults for wheels. Therefore, you should not be using the global [tool.hatch.build] config but rather [tool.hatch.build.targets.sdist] specifically.

ofek avatar Jun 21 '23 12:06 ofek

I have to disagree here. It is not a perfectly fine default to ship files to PyPI that aren't part of the repo and aren't shown in git status – with or without a global .gitignore / .git/info/exclude. To me this is Hatch's biggest footgun by far.

hynek avatar Jun 21 '23 12:06 hynek

I'm not sure where that response came from? I was talking specifically about that user's project.

ofek avatar Jun 21 '23 13:06 ofek

I'm sorry, maybe I've misunderstood you both.

hynek avatar Jun 21 '23 14:06 hynek

Just because I'm noticing this now,

Given that git ls-files won't be supported here

I'm not sure where this comes from,from what I see the last @ofek opined on the matter was

I can add more ignore file support but will never use git or hg directly unless behind an option. I won't spend time on an option like that myself but will accept a PR.

so AFAICT, @ofek correct me if I'm wrong, but the status of this issue is still "PR welcome to add an option, at least for part of the problem"?

Julian avatar Jun 21 '23 14:06 Julian

FWIW I have a PoC patch against hatch-vcs that uses setuptools_scm to determine the file listing from git/hg: https://github.com/ofek/hatch-vcs/pull/40

It seems to be held up by a setuptools_scm refactor, but it can be used now.

effigies avatar Jun 21 '23 14:06 effigies

so AFAICT, @ofek correct me if I'm wrong, but the status of this issue is still "PR welcome to add an option, at least for part of the problem"?

that is correct

ofek avatar Jun 21 '23 16:06 ofek

@ofek There is nothing wrong with that explicit configuration: when using [tool.hatch.build.targets.sdist] it does the correct thing.

My complaint was that the default configuration is not clever enough to know that packing .venv and pycs inside the sdist is obviously wrong. Then I've fumbled, trying to fix the sdist and inadvertently messed up the wheel and hatch doesn't know that putting site-packages/pyproject.toml and site-packages/README.rst etc in the wheel is also obviously wrong.

User error? Yes. But also perhaps a bit too easy to make mistakes without noticing.

wimglenn avatar Jun 21 '23 21:06 wimglenn

If .gitignore is found in any of the ancestors of the project root, the paths are interpreted relative to the wrong path. The issue is also propagated to sdists, because the .gitignore file is copied as-is to the source directory.

I've been scratching my head at something related to this... I think.

I'm working in a monorepo where subpackages live under a libs/<subpackage> directory until my repo root. At the repo root, I have a .gitignore and one of the subpackages also has a .gitignore. In a different subpackage, I have the following in my pyproject.toml file:

[tool.hatch.build.targs.sdist]
exclude = [
    '.gitignore',
]

but when I run hatch build from the subpackage directory, the top level .gitignore file always shows up in my sdist. Since there's no .gitignore in this subpackage, I didn't understand why the root .gitignore got added to my .tar.gz sdist file.

FTR, I've tried various combinations of exclude patterns and also tried [tool.hatch.build] section but that top level .gitignore always gets added.

Maybe that's intended behavior based on the docs note that .gitignore cannot be excluded, but it was certainly confusing that a file from outside my subpackage directory -- where I explicitly run hatch build from -- gets copied into my sdist.

warsaw avatar Sep 07 '23 21:09 warsaw

when I run hatch build from the subpackage directory, the top level .gitignore file always shows up in my sdist. Since there's no .gitignore in this subpackage, I didn't understand why the root .gitignore got added to my .tar.gz sdist file.

Yes, it's the same issue. It's not literally that the file is copied (so it doesn't matter if it's excluded). Hatch reads the rules, and applies them to a newly-generated .gitignore file. The effect is the same. There are at least a couple issues with this, discussed above: the behavior is surprising; and rules meant for a different path are applied without correction, so it's also wrong.


Although I originally reported this, I'm unlikely to work on this issue because I'm not using Hatch anymore.

sorcio avatar Sep 08 '23 08:09 sorcio