black icon indicating copy to clipboard operation
black copied to clipboard

Incorrectly ignored files when `subfolder/.gitignore` contains */*

Open aaossa opened this issue 3 years ago • 4 comments

Describe the bug

When a subfolder contains a .gitignore file with the special rule */* (ignore subdirs contents) Black incorrectly ignores any file at the same level of the .gitignore.

To Reproduce

A minimal git repository:

.
└── repository/
    ├── .git/
    └── subdir/
        ├── .gitignore  # Contains */*
        └── main.py

Any change in the contents of main.py will be "seen" by Git but not Black, resulting in "No Python files are present to be formatted. Nothing to do 😴" in the following cases:

  • black . (from repository/)
  • black subdir/ (from repository/)
  • black . (from repository/subdir/)

It works as expected when SRC is the path to the file:

  • black subdir/main.py (from repository/)
  • black main.py (from repository/subdir/)

When using black . --verbose, the output confirms that the file is ignored because of the .gitignore file:

> black . --verbose
Identified `/repository` as project root containing a .git directory.
Sources to be formatted: "."
.git ignored: matches the --exclude regular expression
subdir/.gitignore ignored: matches the .gitignore file content
subdir/main.py ignored: matches the .gitignore file content
No Python files are present to be formatted. Nothing to do 😴

Notice that files in other folders of the repository are targeted as expected. Only the folder that contains the .gitignore file has its contents incorrectly ignored.

Also, the following repository structure works as expected for both Git and Black in any case, which confirms that the problem comes when the .gitignore is in a subdirectory with Python files:

.
└── repository/
    ├── .git/
    ├── .gitignore  # Contains */*
    └── main.py

Expected behavior

Black should be able to detect and format the contents of subdir/ that are visible for Git (like subdir/main.py).

Environment

  • Black's version: main
  • OS and Python version: Linux/Python 3.8.10

TLDR

These steps help to visualize the problem:

  1. Setup the repository
mkdir demo && cd demo
git init
echo "var=[1,2,3]" > root.py
mkdir subdir
echo "var=[1,2,3]" > subdir/subdir.py
echo "*/*" > subdir/.gitignore
git add -A
git commit -m "First commit"
  1. Confirm that subdir/subdir.py is visible to Git:
echo "var=[3,4,5]" > subdir/subdir.py
git status
  1. Black reaches root.py but not subdir/subdir.py
black . --verbose

aaossa avatar Oct 07 '22 15:10 aaossa

Seems like the problem is related to how pathspec is currently used. Files in subdir/ are ignored by subdir/.gitignore, which is not the expected behavior. pathspec correctly determines and applies the pattern */*, but the .gitignore is incorrectly applied because it does not consider the reference path (where is the .gitignore located). There's currently no way to tell where does the rule come from:

https://github.com/psf/black/blob/75d5c0e3fbf5de67b995c80e12229b7525ff6bb9/src/black/files.py#L220-L223

I was able to check if the gitignore object returns the correct output for gitignore.match_tree(current_dir) and it does, meaning that it's not a bug in pathspec. Also, match_file does not include another argument to pass something like the current directory. I'll keep digging.

aaossa avatar Oct 11 '22 20:10 aaossa

Thank you for submitting the issue and for digging around 🙏 A contribution would be greatly appreciated if you're up for it!

felix-hilden avatar Oct 15 '22 18:10 felix-hilden

I managed to craft a fix for this. I'll submit a PR but I'm working on another issue at the moment. Here's a description of my current solution.

The gitignore match happens in gen_python_files. Instead of using a single PathSpect object as gitignore argument, use a dictionary: the path to the location of the gitignore file is the key, and the value is the PathSpec object. Thils will allow us to keep track of any gitignore file that we find while recursively yielding results from this function and match correctly each gitignore with the contents of the current directory:

diff --git a/src/black/files.py b/src/black/files.py
index ed503f5..8be8c48 100644
--- a/src/black/files.py
+++ b/src/black/files.py
@@ -198,7 +198,7 @@ def gen_python_files(
     extend_exclude: Optional[Pattern[str]],
     force_exclude: Optional[Pattern[str]],
     report: Report,
-    gitignore: Optional[PathSpec],
+    gitignore: Optional[Dict[Path, PathSpec]],
     *,
     verbose: bool,
     quiet: bool,
@@ -211,6 +211,17 @@ def gen_python_files(

     `report` is where output about exclusions goes.
     """
+
+    def is_ignored(gitignore_dict, child, report):
+        for _dir, _gitignore in gitignore_dict.items():
+            relative_path = normalize_path_maybe_ignore(child, _dir, report)
+            if relative_path is None:  # TODO: Confirm if this check is actually necessary
+                continue  # TODO: Confirm if its possible to return False right away
+            if _gitignore is not None and _gitignore.match_file(relative_path):
+                report.path_ignored(child, "matches the .gitignore file content")
+                return True
+        return False
+
     assert root.is_absolute(), f"INTERNAL ERROR: `root` must be absolute but is {root}"
     for child in paths:
         normalized_path = normalize_path_maybe_ignore(child, root, report)
@@ -218,8 +229,7 @@ def gen_python_files(
             continue

         # First ignore files matching .gitignore, if passed
-        if gitignore is not None and gitignore.match_file(normalized_path):
-            report.path_ignored(child, "matches the .gitignore file content")
+        if gitignore is not None and is_ignored(gitignore, child, report):
             continue

         # Then ignore with `--exclude` `--extend-exclude` and `--force-exclude` options.
@@ -244,6 +254,11 @@ def gen_python_files(
         if child.is_dir():
             # If gitignore is None, gitignore usage is disabled, while a Falsey
             # gitignore is when the directory doesn't have a .gitignore file.
+            if gitignore is None:
+                _gitignore = None
+            else:
+                _gitignore = gitignore.copy()
+                _gitignore[child] = get_gitignore(child)
             yield from gen_python_files(
                 child.iterdir(),
                 root,
@@ -252,7 +267,7 @@ def gen_python_files(
                 extend_exclude,
                 force_exclude,
                 report,
-                gitignore + get_gitignore(child) if gitignore is not None else None,
+                _gitignore,
                 verbose=verbose,
                 quiet=quiet,
             )

I've also written a couple of tests for this, that I'll include in the PR later.

aaossa avatar Oct 16 '22 05:10 aaossa

Sounds good 👍

felix-hilden avatar Oct 16 '22 08:10 felix-hilden