ripgrep icon indicating copy to clipboard operation
ripgrep copied to clipboard

`ignore`: gitignore relative paths are not treated relatively

Open SergioBenitez opened this issue 3 years ago • 6 comments

In short, gitignore rules that should be relative to a specific gitgnore path are instead reinterpreted as **/rule. Here's the failing test case:

use std::path::PathBuf;
use ignore::gitignore::GitignoreBuilder;

fn main() {
    let mut builder = GitignoreBuilder::new("/root");
    builder.add_line(Some(PathBuf::from("/root/scratch/.gitignore")), "!foo.html").unwrap();
    let matcher = dbg!(builder.build().unwrap());

    assert!(matcher.matched("/root/scratch/foo.html", false).is_whitelist());
    assert!(dbg!(matcher.matched("/root/unknown/foo.html", false)).is_none());
}

The relevant output:

[src/main.rs:7] builder.build().unwrap() = Gitignore {
    root: "/root",
    globs: [
        Glob {
            from: Some(
                "/root/scratch/.gitignore",
            ),
            original: "!foo.html",
            actual: "**/foo.html",
            is_whitelist: true,
            is_only_dir: false,
        },
    ],
    num_ignores: 0,
    num_whitelists: 1,
}
[src/main.rs:10] matcher.matched("/root/unknown/foo.html", false) = Whitelist(
    Glob {
        from: Some(
            "/root/scratch/.gitignore",
        ),
        original: "!foo.html",
        actual: "**/foo.html",
        is_whitelist: true,
        is_only_dir: false,
    },
)
thread 'main' panicked at 'assertion failed: dbg!(matcher.matched(\"/root/unknown/foo.html\", false)).is_none()', src/main.rs:10:5

SergioBenitez avatar Jun 23 '21 19:06 SergioBenitez

I believe these are the relevant gitignore docs:

If there is a separator at the beginning or middle (or both) of the pattern, then the pattern is relative to the directory level of the particular .gitignore file itself. Otherwise the pattern may also match at any level below the .gitignore level.

In particular, the "Otherwise ..." language applies here since there is no / in your pattern. The issue here is that it should only apply to any level below that .gitignore file, but here, it is also being applied (incorrectly) to a sibling directory. However, the pattern foo.html absolutely should match at any directory beneath the point in which it was defined. So the **/foo.html construction is correct... to a point.

This is very likely an API-level bug. In particular, when building the crate, I likely assumed that a gitignore matcher would only be used on paths at or below the directory in which it was created. And generally speaking, that's how the ignore file traversal works. Fixing the gitignore matcher to handle this case correctly will likely incur an unavoidable additional cost (I think) to file traversal, and thus, some new API mechanism needs to exist.

I don't see this bug getting fixed any time soon unfortunately. The ignore crate desperately needs a very very careful rewrite. (And has needed it for a long time.)

BurntSushi avatar Jun 23 '21 22:06 BurntSushi

I believe these are the relevant gitignore docs:

If there is a separator at the beginning or middle (or both) of the pattern, then the pattern is relative to the directory level of the particular .gitignore file itself. Otherwise the pattern may also match at any level below the .gitignore level.

In particular, the "Otherwise ..." language applies here since there is no / in your pattern. The issue here is that it should only apply to any level below that .gitignore file, but here, it is also being applied (incorrectly) to a sibling directory. However, the pattern foo.html absolutely should match at any directory beneath the point in which it was defined. So the **/foo.html construction is correct... to a point.

I suppose it could depend on what **/foo.html means to ignore. Unfortunately it doesn't appear to mean anything with correct semantics as the following also doesn't work:

use std::path::PathBuf;
use ignore::gitignore::GitignoreBuilder;

fn main() {
    let mut builder = GitignoreBuilder::new("/root/scratch");
    builder.add_line(Some(PathBuf::from("/root/scratch/.gitignore")), "!/**/*.html").unwrap();
    // builder.add_line(None, "!/**/*.html").unwrap(); // Using `None` doesn't help, unfortunately.

    let matcher = dbg!(builder.build().unwrap());

    assert!(matcher.matched("/root/scratch/foo.html", false).is_whitelist());
    assert!(dbg!(matcher.matched("/root/unknown/foo.html", false)).is_none());
}

Here, we see:

[src/main.rs:7] builder.build().unwrap() = Gitignore {
    set: GlobSet {
    root: "/root/scratch",
    globs: [
        Glob {
            from: Some(
                "/root/scratch/.gitignore",
            ),
            original: "!/**/*.html",
            actual: "**/*.html",
            is_whitelist: true,
            is_only_dir: false,
        },
    ],
}
[src/main.rs:10] matcher.matched("/root/unknown/foo.html", false) = Whitelist(
    Glob {
        from: Some(
            "/root/scratch/.gitignore",
        ),
        original: "!/**/*.html",
        actual: "**/*.html",
        is_whitelist: true,
        is_only_dir: false,
    },
)
thread 'main' panicked at 'assertion failed: dbg!(matcher.matched(\"/root/unknown/foo.html\", false)).is_none()', src/main.rs:10:5

I don't see this bug getting fixed any time soon unfortunately. The ignore crate desperately needs a very very careful rewrite. (And has needed it for a long time.)

I understand. I'm working on a replacement for the core gitignore pattern matching part of ignore using GlobSet directly. Once it's complete, I imagine it wouldn't be too difficult to swap in this new library in ignore.

SergioBenitez avatar Jun 24 '21 04:06 SergioBenitez

@SergioBenitez Yeah, that looks like the same variant of the bug that is "the gitignore matcher has an undocumented and in some cases inconvenient assumption that you only use it with paths at or below the directory in which the gitignore rules were defined."

BurntSushi avatar Jun 24 '21 11:06 BurntSushi

@BurntSushi : is there some way to group the issues related to the ignore crate ? or perhaps an issue, which describes the various points that need to be clarified ?

I am aware of #278 for example, but searching the issue tracker for "ignore" brings along a whole lot of unrelated issues.

LeGEC avatar Oct 22 '21 06:10 LeGEC

@LeGEC No, such a grouping doesn't exist. You're right that it should, but it will take work and effort to do.

BurntSushi avatar Oct 22 '21 12:10 BurntSushi

I hit to this bug when I was developing something with the ignore crate.

mydir/.gitignore has the entry mydir/myfile.txt. matches returns Match::Ignore on mydir/myfile.txt, I believe the correct ignore should be mydir/mydir/myfile.txt, and mydir/myfile.txt should return a None.

I can submit a PR to add directory to the pattern. Currently lines in mydir/.gitignore are interpreted as if they are in ./.gitignore, and if a / is in the pattern, mydir/myfile.txt becomes **/mydir/myfile.txt which matches all parent dirs. Instead, it should add mydir/mydir/myfile.txt when ignore::add_file("mydir/.gitignore") is used in GitignoreBuilder.

iesahin avatar Apr 29 '22 12:04 iesahin