fast-glob

Feature request: dynamically ignore files or directories

Open dflupu opened this issue 4 years ago • 3 comments

It would be useful if the user could pass a callback to fast-glob that, when called with the path of a file or directory, returns whether fast-glob should include it in the results. If the callback returns false for a directory, that directory is not traversed at all.
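To make the requested semantics concrete, here is a minimal sketch written against Node's built-in `fs` module only. `walkWithFilter` and its `(fullPath, isDirectory)` callback signature are hypothetical names for illustration; nothing here is an existing fast-glob option.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper, not part of fast-glob: walk `root` and ask `filter`
// about every entry. Returning false for a directory prunes it entirely.
function walkWithFilter(root, filter) {
  const results = [];
  const stack = [root];
  while (stack.length > 0) {
    const dir = stack.pop();
    for (const dirent of fs.readdirSync(dir, { withFileTypes: true })) {
      const fullPath = path.join(dir, dirent.name);
      const isDirectory = dirent.isDirectory();
      // A rejected entry is skipped; a rejected directory is never read.
      if (!filter(fullPath, isDirectory)) continue;
      if (isDirectory) stack.push(fullPath);
      else results.push(fullPath);
    }
  }
  return results;
}
```

A caller could then skip dependency folders with something like `walkWithFilter(root, (p, isDir) => !(isDir && path.basename(p) === 'node_modules'))`. The point of the design is that the decision is made before the directory is opened, so the cost of an ignored subtree is a single callback invocation.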

One use case for this would be improving globby performance when the gitignore option is enabled. Currently, this option works by globbing the given path twice: once with a **/.gitignore pattern to find and read the gitignore files, and again with the user's globs. While this could be improved upon, I cannot see a way that both a) avoids the initial glob for gitignore files and b) makes fast-glob not iterate gitignored directories. Adding a callback would make that possible.

If I am not aware of some relevant fast-glob feature, let me know. Globby issue: https://github.com/sindresorhus/globby/issues/50

dflupu avatar Mar 11 '20 11:03 dflupu

Hello, @dflupu,

Thank you for the interesting question :tada:

First, I think that the hook mechanism will help you here. Unfortunately, this mechanism requires a lot of effort and will not appear in the near future (~1 year, I'll create an issue). The problem is that this mechanism will slow down the directory tree crawl. But even in this case, I don't see any way to add patterns to the primary filters on the fly, only in hooks.

Second, I see that right now you do a preliminary crawl of the directory tree to find the .gitignore files. For that crawl you can use @nodelib/fs.walk instead of fast-glob. @nodelib/fs.walk is two or three times faster because it does not apply any filter by default. The first crawl should then be several times faster, and you can add patterns to the filter on the fly (there may be problems with async/stream, because these are truly asynchronous).
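As a rough illustration of "adding patterns to the filter on the fly", the sketch below does a single synchronous crawl with Node's built-in `fs` and extends the ignore set whenever it meets a .gitignore. It does not use @nodelib/fs.walk itself; that package's `deepFilter` and `entryFilter` settings could presumably play the role of the `ignored.has(...)` check. The function names are hypothetical, and the matching is deliberately simplified: each rule is treated as a bare entry name, so real gitignore semantics (globs, negation, anchoring) are not handled.

```javascript
const fs = require('fs');
const path = require('path');

// Simplified on purpose: every non-empty, non-comment line of a .gitignore
// is treated as a bare entry name to skip in that directory and below.
function readIgnoredNames(dir) {
  const file = path.join(dir, '.gitignore');
  if (!fs.existsSync(file)) return [];
  return fs.readFileSync(file, 'utf8')
    .split(/\r?\n/)
    .map((line) => line.trim().replace(/\/$/, ''))
    .filter((line) => line !== '' && !line.startsWith('#'));
}

// One crawl: rules found in a directory extend the filter for its subtree.
function crawlRespectingGitignore(dir, inherited = new Set()) {
  const ignored = new Set([...inherited, ...readIgnoredNames(dir)]);
  const files = [];
  for (const dirent of fs.readdirSync(dir, { withFileTypes: true })) {
    if (ignored.has(dirent.name)) continue; // ignored directories are never read
    const fullPath = path.join(dir, dirent.name);
    if (dirent.isDirectory()) {
      files.push(...crawlRespectingGitignore(fullPath, ignored));
    } else {
      files.push(fullPath);
    }
  }
  return files;
}
```

Because the crawl is synchronous and depth-first, a directory's rules are always known before its children are visited. With an asynchronous or streaming walker that ordering is not guaranteed, which is the problem hinted at above.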

Additional question:

Why do you need to consider all .gitignore files? Why not just consider the root file? Big monorepo? Do you have any real use case?

mrmlnc avatar Mar 13 '20 05:03 mrmlnc

We are affected by the globby performance issue described by @dflupu. We have a monorepo (using rush+pnpm) with a big .gitignore in the repository root plus some smaller .gitignore files in some of the projects. We had a globby-based script that took 60 sec to find files. I just rewrote it to perform the directory crawling manually with fs.readdir and now it runs in 2 sec. I guess it could be faster if I could use fast-glob instead of my clumsy home-grown script :)

Toxaris avatar Dec 18 '20 01:12 Toxaris

I just hit a scenario where my ignore globs are slowing things down significantly. Most of the time is actually spent executing them, and merging them into one glob is still slow because they seem to be expanded anyway.

If a raw callback were allowed I could just write something like /[\\\/](\.git|node_modules)$/.test(targetPath) or similar, which should be more or less free compared to the current regexes, I think.
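For reference, this is how that regex behaves when wrapped in a predicate. `shouldSkip` is a hypothetical name, and how such a predicate would be handed to fast-glob is exactly what this issue leaves open.

```javascript
// The regex from the comment above: true when the last path segment is
// `.git` or `node_modules`, preceded by either a `/` or a `\` separator.
const IGNORED_DIR = /[\\\/](\.git|node_modules)$/;

// Hypothetical predicate a raw ignore callback could use.
function shouldSkip(targetPath) {
  return IGNORED_DIR.test(targetPath);
}
```

Note that the pattern requires a leading separator, so a bare relative path such as `node_modules` would not match, and it only matches the directory itself, not paths inside it. That is enough if the walker prunes a directory as soon as the predicate returns true for it.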

fabiospampinato avatar Nov 21 '23 20:11 fabiospampinato