Wax seems to traverse into `not` excluded directories anyway

Open tdejager opened this issue 3 months ago • 1 comments

Hi! First of all, thanks for the nice library, with an especially nice API!

This might be a bug in wax, or it could be that I'm just holding it wrong. We currently make heavy use of wax in the package manager we've been building: https://github.com/prefix-dev/pixi. I've noticed that wax seems to traverse into hidden directories even when they are excluded via `not` (and so should be pruned before the walk descends into them). For example, if you have something like:

```rust
use std::path::{Path, PathBuf};

/// The same glob patterns used in the current main program.
pub const PATTERNS: &[&str] = &[
    "**/*.{c,cc,cxx,cpp,h,hpp,hxx}",
    "**/*.{cmake,cmake.in}",
    "**/CMakeFiles.txt",
];

/// Collect matches using the `wax` crate, excluding all hidden entries
/// (any path component starting with a dot), e.g., `.pixi`, `.git`, `.env`.
pub fn collect_with_wax(
    root: &Path,
    patterns: &[&str],
) -> Result<Vec<PathBuf>, Box<dyn std::error::Error>> {
    let mut results = Vec::new();
    for pat in patterns {
        let glob = wax::Glob::new(pat)?;
        let iter = glob
            .walk(root)
            // Exclude hidden directories and their descendants; exhaustive pattern enables pruning.
            .not(["**/.*/**"])
            .unwrap()
            .filter_map(|e| e.ok());

        for entry in iter {
            // Skip directories; focus on file-like entries.
            if entry.file_type().is_dir() {
                continue;
            }
            results.push(entry.path().to_path_buf());
        }
    }
    Ok(results)
}

// Later on, call:
collect_with_wax(Path::new("."), PATTERNS).unwrap()
```

With big hidden folders in the root, wax's runtime seems to grow, even though no matches are ever found in the hidden folders.
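To make the expected behaviour concrete, here is a minimal std-only sketch of what "pruning" means in this context: a hidden directory is skipped before it is ever read, so its size should not affect the runtime. This is not how wax is implemented; `is_hidden` and `walk_pruned` are hypothetical helpers written purely for illustration, and the sketch does no glob matching at all.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: true if the final path component starts with a dot
/// (e.g. `.pixi`, `.git`, `.env`).
fn is_hidden(path: &Path) -> bool {
    path.file_name()
        .and_then(|name| name.to_str())
        .map_or(false, |name| name.starts_with('.'))
}

/// Hypothetical helper: recursively collects non-hidden entries under `dir`.
/// Hidden entries are skipped *before* descending, so `read_dir` is never
/// called on a hidden directory and its contents are never visited.
fn walk_pruned(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        if is_hidden(&path) {
            // Pruned: nothing below this entry is touched.
            continue;
        }
        // `DirEntry::file_type` does not follow symlinks, so a symlink to a
        // directory is recorded as a plain entry rather than traversed.
        if entry.file_type()?.is_dir() {
            walk_pruned(&path, out)?;
        } else {
            out.push(path);
        }
    }
    Ok(())
}
```

Calling `walk_pruned(Path::new("."), &mut results)` on a project with a large `.pixi` folder should take time proportional only to the non-hidden part of the tree, which is the behaviour I would expect from an exhaustive `not` pattern as well.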

I created a small reproducer here: https://github.com/tdejager/wax-vs-ignore. It needs pixi mainly to populate the `.pixi` folder with lots of files, but it can also be run with just `cargo bench`.


I also ran samply a number of times and saw that most of the time was spent on the iteration within walk, specifically in regex captures. I can only see that being used here: https://github.com/olson-sean-k/wax/blob/46d690b283329e1059ec8149aaa8ab41864cf101/src/walk/glob.rs#L311-L345

These are both used in branches that, from reading the code, I assume should not be reached when the tree is filtered out.

tdejager · Sep 13 '25 12:09

Ah, I just tried with the main branch and wax does a lot better there. So maybe a release would be good anyway :). It is still somewhat slower than ignore for this use case, though.

tdejager · Sep 15 '25 09:09