fast-glob
Feature request: dynamically ignore files or directories
It would be useful if the user would be able to pass a callback to fast-glob that, when called with paths to files or directories, returns whether or not fast-glob should include them in the results. If the callback returns false for a directory, then the directory is not iterated at all.
One use case for this would be improving globby performance when the gitignore option is enabled. Currently, this option works by globbing the given path twice: once with a **/.gitignore pattern to find and read the gitignore files, and again with the user's globs. While this could be improved upon, I cannot see a way that both a) avoids the initial glob for gitignore files and b) makes fast-glob not iterate gitignored directories. Adding a callback would make that possible.
If I am not aware of some relevant fast-glob feature, let me know. Globby issue: https://github.com/sindresorhus/globby/issues/50
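To make the requested semantics concrete, here is a minimal sketch of a crawler that takes such a callback. It is not fast-glob's API: the `IncludeCallback` type and the `walk` function are hypothetical names, and the crawl is done with Node's `fs.readdirSync` only. The point it illustrates is that when the callback returns false for a directory, that directory is never read.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical callback shape (an assumption, not part of fast-glob):
// return false to exclude a file, or to skip a directory entirely.
type IncludeCallback = (entryPath: string, isDirectory: boolean) => boolean;

// Collects file paths under `root`, asking `include` about every entry.
function walk(root: string, include: IncludeCallback): string[] {
  const results: string[] = [];
  const pending: string[] = [root];

  while (pending.length > 0) {
    const dir = pending.pop() as string;

    for (const dirent of fs.readdirSync(dir, { withFileTypes: true })) {
      const entryPath = path.join(dir, dirent.name);
      // Symlinks are not followed: isDirectory() is false for them.
      const isDirectory = dirent.isDirectory();

      // A rejected directory is never queued, so it is never read.
      if (!include(entryPath, isDirectory)) continue;

      if (isDirectory) {
        pending.push(entryPath);
      } else {
        results.push(entryPath);
      }
    }
  }

  return results;
}
```

A caller could then prune by name, for example `walk(root, (p) => path.basename(p) !== "node_modules")`, or consult ignore rules it has loaded so far.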
Hello, @dflupu,
Thank you for the interesting question :tada:
First, I think that the hook mechanism will help you here. Unfortunately, this mechanism requires a lot of effort and will not appear in the near future (~1 year, I'll create an issue). The problem is that this mechanism will slow down the directory tree crawl. But even in this case, I don't see any way to add patterns to the primary filters on the fly; that would be possible only in hooks.
Second, I see that right now you are doing a primary crawl of the directory tree to search for files (.gitignore). In this case, you can use @nodelib/fs.walk instead of fast-glob. @nodelib/fs.walk is two or three times faster because it does not apply any filter by default. In this case, the first crawl should be several times faster, and you can add patterns to the filter on the fly (there may be problems with async/stream, because that mode is truly asynchronous).
Additional question:
Why do you need to consider all .gitignore files? Why not just consider the root file? Big monorepo? Do you have any real use case?
We are affected by the globby performance issue described by @dflupu. We have a monorepo (using rush+pnpm) with a big .gitignore in the repository root plus some smaller .gitignore files in some of the projects. We had a globby-based script that took 60 seconds to find files. I just rewrote it to perform the directory crawling manually with fs.readdir, and now it runs in 2 seconds. I guess it could be faster if I could use fast-glob instead of my clumsy home-grown script :)
I just hit a scenario where my ignore globs are slowing things down significantly: most of the time is actually spent executing them, and merging them into one is still slow because they seem to be expanded anyway.
If a raw callback were allowed, I could just write something like /[\\\/](\.git|node_modules)$/.test(targetPath) or similar, which should be more or less free compared to the current regexes, I think.
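As a rough illustration of what that raw callback could look like, the sketch below wraps the regex from the comment above in a predicate. The name `shouldSkip` and the idea that a crawler would call it once per directory path are assumptions; fast-glob does not expose such a hook in this thread.

```typescript
// Matches a path whose last segment is exactly ".git" or "node_modules",
// with either "/" or "\" as the separator (regex taken from the comment above).
const SKIP_DIR = /[\\\/](\.git|node_modules)$/;

// Hypothetical predicate a crawler could call for each directory path.
function shouldSkip(targetPath: string): boolean {
  return SKIP_DIR.test(targetPath);
}
```

With this pattern, `/repo/node_modules` and `C:\repo\.git` are skipped, while `/repo/.gitignore` and `/repo/src/index.ts` are not. A bare `node_modules` with no leading separator is also not matched, so the crawler would need to pass paths that include the parent directory.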