glob icon indicating copy to clipboard operation
glob copied to clipboard

'*' broken in Pattern::matches_path

Open jethrogb opened this issue 9 years ago • 5 comments

Here's a recursive file list:

$find src
src/
src/main.rs
src/a/
src/a/b.rs

Running this program

extern crate glob;
use std::path::Path;
use glob::{Pattern,glob};

fn main() {
	let pattern="src/*.rs";

	println!("matches src/a/b.rs: {}",Pattern::new(pattern).unwrap().matches_path(Path::new("src/a/b.rs")));
	
	for entry in glob(pattern).unwrap() {
		println!("globbed: {}", entry.unwrap().display());
	}
}

outputs

matches src/a/b.rs: true
globbed: src/main.rs

Note it DID NOT output globbed: src/a/b.rs. Meaning glob and Pattern::matches_path are inconsistent here. I think glob is correct as '*' should not match across slashes.

jethrogb avatar Oct 31 '16 07:10 jethrogb

I think glob is correct as '*' should not match across slashes.

This is configurable. Have you seen the MatchOptions struct?

It does look like there's some interesting stuff going on:

  • glob::glob calls glob::glob_with, whose documentation states: "The options given are passed through unchanged to Pattern::matches_with(..) with the exception that require_literal_separator is always set to true regardless of the value passed to this function."
  • The default value of MatchOptions has require_literal_separator disabled, which is what's used when you call Pattern::matches_path.

I think the above two things at least explain the behavior you're seeing, and since it's documented, I guess it's intended behavior?

One other interesting thing to note is that MatchOptions::new() and MatchOptions::default() will seem to produce different values?

Either way, I don't think * is broken.

BurntSushi avatar Oct 31 '16 10:10 BurntSushi

I do see now that it's technically documented. But you have to read in various places. Note that both glob and matches/matches_path just say they use MatchOptions::new(). Regardless of what the docs say, I think this is extremely confusing default behavior for a globbing library.

@alexcrichton did you intend cargo to use require_literal_separator: false for package.include/package.exclude filters?

jethrogb avatar Oct 31 '16 16:10 jethrogb

Regardless of what the docs say, I think this is extremely confusing default behavior for a globbing library.

I'm not so sure about that. For example, if I write *.rs, then I'd probably expect that to match src/foo/bar.rs.

(Interestingly, man gitignore specifies that * in *.rs will match a / but the * in src/*.rs won't. In other words, the behavior of * changes based on whether there's a literal path separator in the glob.)

I do agree that the current docs are confusing, and especially that new() and default() returning different values is confusing.

BurntSushi avatar Oct 31 '16 18:10 BurntSushi

I suppose the current behavior is also the default behavior of glob vs. fnmatch.

jethrogb avatar Oct 31 '16 19:10 jethrogb

Ran into this as well. From https://man7.org/linux/man-pages/man7/glob.7.html :

Globbing is applied on each of the components of a pathname separately. A '/' in a pathname cannot be matched by a '?' or '*' wildcard, or by a range like "[.-0]".

Personally, I find this way more sensible. I think this should be made the default (which would be a breaking change though). Either way, the current situation should be documented more clearly as it deviates from what one might expect.

piegamesde avatar Nov 12 '21 11:11 piegamesde