Wax seems to traverse into `not` excluded directories anyway
Hi! First of all, thanks for the nice library, with an especially nice API!
This might be a bug in wax, or I might just be holding it wrong. We are currently making heavy use of wax in the package manager we've been building: https://github.com/prefix-dev/pixi. I've noticed that wax seems to traverse into hidden directories even when they are excluded via `not`, where I would expect them to be pruned before traversal. For example, if you have something like:
use std::path::{Path, PathBuf};

/// The same glob patterns used in the current main program.
pub const PATTERNS: &[&str] = &[
    "**/*.{c,cc,cxx,cpp,h,hpp,hxx}",
    "**/*.{cmake,cmake.in}",
    "**/CMakeFiles.txt",
];

/// Collect matches using the `wax` crate, excluding all hidden entries
/// (any path component starting with a dot), e.g., `.pixi`, `.git`, `.env`.
pub fn collect_with_wax(
    root: &Path,
    patterns: &[&str],
) -> Result<Vec<PathBuf>, Box<dyn std::error::Error>> {
    let mut results = Vec::new();
    for pat in patterns {
        let glob = wax::Glob::new(pat)?;
        let iter = glob
            .walk(root)
            // Exclude hidden directories and their descendants; exhaustive pattern enables pruning.
            .not(["**/.*/**"])
            .unwrap()
            // Silently drop entries that could not be read.
            .filter_map(|e| e.ok());
        for entry in iter {
            // Skip directories; focus on file-like entries.
            if entry.file_type().is_dir() {
                continue;
            }
            results.push(entry.path().to_path_buf());
        }
    }
    Ok(results)
}

// Later on call
collect_with_wax(Path::new("."), PATTERNS).unwrap()
With big hidden folders in the root, the runtime of wax seems to grow, even though no matches are ever found in the hidden folders.
I created a small reproducer here: https://github.com/tdejager/wax-vs-ignore. It uses pixi mainly to populate the .pixi folder with lots of files, but it can also be run with just cargo bench.
I also ran samply a number of times and saw that most of the time was spent in the iteration within walk, specifically in regex captures. I can only see that being used here: https://github.com/olson-sean-k/wax/blob/46d690b283329e1059ec8149aaa8ab41864cf101/src/walk/glob.rs#L311-L345
Both uses are in branches that, from reading the code, I assume should not be reached when the tree is filtered out.
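To make clear what I mean by pruning, here is a minimal std-only sketch of the behaviour I would expect. This is not how wax implements it, and `is_hidden` and `walk_pruned` are hypothetical names made up for illustration. The point is only that a hidden directory is rejected before its contents are ever read, so the size of `.pixi` should not affect the runtime.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns true if the final path component starts with a dot (e.g. `.pixi`, `.git`).
/// Hypothetical helper, not part of wax.
fn is_hidden(path: &Path) -> bool {
    path.file_name()
        .and_then(|name| name.to_str())
        .map_or(false, |name| name.starts_with('.'))
}

/// Recursively collects non-hidden files under `dir`.
/// Hidden directories are skipped without ever calling `read_dir` on them.
fn walk_pruned(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        if is_hidden(&path) {
            // Pruned: neither this entry nor anything below it is visited.
            continue;
        }
        // `file_type` does not follow symlinks, so a symlink to a directory
        // is recorded as a file-like entry rather than descended into.
        if entry.file_type()?.is_dir() {
            walk_pruned(&path, out)?;
        } else {
            out.push(path);
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let mut files = Vec::new();
    walk_pruned(Path::new("."), &mut files)?;
    println!("{} files found outside hidden directories", files.len());
    Ok(())
}
```

With a walk like this, the cost depends only on the number of non-hidden entries, which is what I assumed the exhaustive `not` pattern would give me.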
Ah, I just tried with the main branch and wax does a lot better there. So maybe a release would be good anyway :). It is still somewhat slower than ignore for this use case, though.