globby icon indicating copy to clipboard operation
globby copied to clipboard

Improve performance

Open sindresorhus opened this issue 8 years ago • 16 comments

This issue is dedicated to gathering ideas on how we can improve performance.

Some things that might help is to do filtering in the correct order to ensure minimal work is done, and cache and memoize as much as possible. The most important thing is that we skip ignored directories like node_modules as early as possible. It's the source of a lot of performance problems. The hard drive is the bottleneck, so we should ensure we interact with it as little as possible.

Some more ideas in #44 and mrmlnc/fast-glob#15

This Node.js feature request would also help a lot: https://github.com/nodejs/node/issues/15699

sindresorhus avatar Oct 22 '17 09:10 sindresorhus

FWIW, the biggest performance bottleneck in my testing on eleventy (for whatever reason) was the gitignore: true option. Removing this and just looping over .gitignore myself and adding the entries as ! lines was much faster, for whatever reason.

(Love globby—thanks!)

zachleat avatar Jan 12 '18 05:01 zachleat

About filtering out .gitignore patterns before globbing, is translating gitignore rules and adding them to fast-glob ignore the right direction to go?

yaodingyd avatar Jan 15 '19 15:01 yaodingyd

I'm adding this here as it's somewhat related: right now just requiring globby on my system often takes anywhere between 100ms and 250ms, and just installing it on a new directory downloads more than 4MB of stuff.

I've switched to tiny-glob in a project of mine, which takes less than 10ms to require on my system and install less than 100kb of stuff.

Perhaps some heavy dependencies could be removed/replaced.

fabiospampinato avatar Jan 27 '19 15:01 fabiospampinato

About filtering out .gitignore patterns before globbing, is translating gitignore rules and adding them to fast-glob ignore the right direction to go?

I think so, yes.

sindresorhus avatar Feb 15 '19 11:02 sindresorhus

I'm adding this here as it's somewhat related: right now just requiring globby on my system often takes anywhere between 100ms and 250ms, and just installing it on a new directory downloads more than 4MB of stuff.

Yeah, that is quite unfortunate. It's because of micromatch, a dependency of fast-glob. See: https://github.com/mrmlnc/fast-glob/issues/154

screen shot 2019-02-15 at 18 41 10

// @mrmlnc

sindresorhus avatar Feb 15 '19 11:02 sindresorhus

I concur with @zachleat that the gitignore: true has a huge perf cost for whatever reason.

On our codebase:

  1. using gitignore: true:

    ~8sec

    await globby([ `**` ], { gitignore: true });
    
  2. custom:

    ~100ms

    const ignored = fs.readFileSync(".gitignore", { encoding: "utf8" })
        .split("\n")
        .filter( line => !/^\s*$/.test(line) && !/^\s*#/.test(line))
        .map( line => "!" + line.trim().replace(/^\/+|\/+$/g, ""));
    
    const files = await globby([
        `**`,
        `!.git`,
        ...ignored
    ]);
    

~8000% difference is a lot...

dwelle avatar Feb 27 '19 15:02 dwelle

rollup is said to be good at removing dead code, I don't know if it goes as far as to repackage packages.

janat08 avatar May 01 '19 21:05 janat08

Just ran into this as well in our codebase. Commenting out gitignore leads to an improvement from 500 sec to 2 sec.

kangax avatar Mar 03 '20 18:03 kangax

For folks complaining about the performance hit of the Options.gitignore, my understanding of the current implementation is enabling that option adds a blocking preprocessing step that enumerates and parses potentially every .gitignore file in the directory tree regardless of the specified patterns value. It looks like Options.ignore is respected, but for many (perhaps most?) inputs to globby, the scope of this additional work may be vastly larger than what's done when gitignore is not specified.

Some of this additional work is unnecessary in some conditions, but I don't know offhand how difficult it is to determine that at runtime given the recursive nature of .gitignore and patterns.

Reference:

  • https://github.com/sindresorhus/globby/blob/b0d7330942a3d36a12056f3a1257f676a534b6ea/index.js#L104
  • https://github.com/sindresorhus/globby/blob/b0d7330942a3d36a12056f3a1257f676a534b6ea/ignore.js#L22
  • https://github.com/sindresorhus/globby/blob/b0d7330942a3d36a12056f3a1257f676a534b6ea/ignore.js#L65

DavidAnson avatar Apr 01 '24 04:04 DavidAnson

On our codebase:

  1. using gitignore: true: ~8sec
  2. custom: ~100ms ~8000% difference is a lot...

FYI, this is not an apples-to-apples comparison as the gitignore option for globby honors every .gitignore file in the tree while the custom code honors only the one at the root.

DavidAnson avatar Apr 01 '24 04:04 DavidAnson

For scenarios where only the root .gitignore file matters, setting Options.ignoreFiles to ".gitignore" should be very fast as it should avoid the (correct by specification!) ** part of the default behavior.

DavidAnson avatar Apr 01 '24 04:04 DavidAnson

By coincidence I have recently written a little benchmark comparing globby against tiny-readdir-glob-gitignore, here's the result:

bench

Basically as mentioned in the original post indeed skipping ignored directories as early as possible is significant. Additionally a much faster implementation than ignore is possible, and handling all .gitignore-like files found shouldn't be a problem. All in all when optimized enabling handling of .gitignore-like files should speed things up rather than slow things down in many real world scenarios, I think.

fabiospampinato avatar Apr 01 '24 11:04 fabiospampinato

By coincidence I have recently written …

@fabiospampinato Nice! I was just in the process of writing my own ideal glob function—based on fs.readdir and picomatch—before coming across tiny-readdir-glob[-gitignore]… So maybe now I don’t have to :)

But just have a couple of questions:

  1. How does performance compare to fdir?
  2. Why the need for tiny-readdir, rather than import {readdir} from 'node:fs/promises'?

danielbayley avatar Apr 02 '24 03:04 danielbayley

@danielbayley could you open an issue about that in the other repo? That seems unrelated to globby.

fabiospampinato avatar Apr 02 '24 10:04 fabiospampinato

@danielbayley could you open an issue about that in the other repo? That seems unrelated to globby.

@fabiospampinato Fair point. Sure, see https://github.com/fabiospampinato/tiny-readdir/issues/2 and https://github.com/fabiospampinato/tiny-readdir-glob/issues/2.

danielbayley avatar Apr 02 '24 15:04 danielbayley