Potentially the biggest performance improvement that can be done

Open SuperchupuDev opened this issue 1 year ago • 0 comments

I'm not even sure how to approach the problem, but implementing this would mean that all of the weird patterns that currently bypass every optimization would get optimized, along with literally everything else.

By default, fdir crawls all subdirectories and files of a root, which can result in extra processing work that's not necessary, harming performance. tinyglobby tries to apply some optimizations by inferring a common root.

fdir exposes an exclude function that can be used to exclude directories from crawling. It's currently being used on the ignore patterns to... not crawl those ignored patterns?

What if we took the matching patterns (basically the patterns that aren't meant to be ignored), applied some weird transformations to them, and used them in the exclude matcher?

For example, let's say we have the following usage:

import { glob } from 'tinyglobby';

await glob(['src/files/index.ts', 'scripts/*.ts']);

with the following file structure:

- node_modules
  ^big
- plugins
  - myPlugins
    | plugin.ts
- scripts
  - utils
    | index.ts
  | deploy.ts
  | run.ts
- src
  - files
    | index.ts
  - other
    | index.ts

Basically, we need a picomatch matcher that returns true for every directory that we don't want to crawl, in this case node_modules, plugins, scripts/utils, and src/other. This could be implemented with the following picomatch usage:

import picomatch from 'picomatch';

const exclude = picomatch('**/*', {
  ignore: ['src/files', 'scripts/*/**']
});

Great! It now only crawls the directories needed (hopefully, I haven't checked). Now the question is how to implement something that converts ['src/files/index.ts', 'scripts/*.ts'] into ['src/files', 'scripts/*/**'], which is the whole point of this issue. If we figure it out, tinyglobby should be nearly as fast as possible.
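To make the goal a bit more concrete, here is a rough sketch of the predicate we're after. Note that it does not produce picomatch patterns like the ones above. It swaps in a simpler technique that compares path segments directly, and only understands `*` and `**` (no braces, negation, extglobs, and so on). `segmentToRegExp`, `couldContainMatch` and `createExclude` are made-up names for illustration, not part of tinyglobby, fdir or picomatch, and the sketch assumes directory paths relative to the root.

```typescript
// Hypothetical helper: turns one path segment of a glob (no '/') into a RegExp.
// Only `*` is treated as a wildcard; every other character is matched literally.
function segmentToRegExp(segment: string): RegExp {
  const escaped = segment
    .replace(/[.+^${}()|[\]\\?]/g, '\\$&')
    .replace(/\*/g, '[^/]*');
  return new RegExp(`^${escaped}$`);
}

// Returns true when the directory `dir` could still contain a file matched by `pattern`.
function couldContainMatch(dir: string[], pattern: string[]): boolean {
  for (let i = 0; i < dir.length; i++) {
    // A globstar can match at any depth, so stay conservative and keep crawling.
    if (pattern[i] === '**') return true;
    // The last pattern segment names the entry itself, not a directory to descend into.
    if (i >= pattern.length - 1) return false;
    if (!segmentToRegExp(pattern[i]).test(dir[i])) return false;
  }
  return true;
}

// Hypothetical factory: builds a predicate that returns true for directories to skip.
function createExclude(patterns: string[]): (dirPath: string) => boolean {
  const split = patterns.map(p => p.split('/'));
  return dirPath => {
    // Assumes a path relative to the root; empty segments (trailing slash) are dropped.
    const dir = dirPath.split('/').filter(Boolean);
    return !split.some(pattern => couldContainMatch(dir, pattern));
  };
}

const exclude = createExclude(['src/files/index.ts', 'scripts/*.ts']);
```

With the file structure above, `exclude` returns true for node_modules, plugins, scripts/utils and src/other, and false for src, src/files and scripts. fdir's exclude callback receives the directory name and its path, so the path would presumably have to be made relative to the crawl root before being passed to something like this; that part is left out here.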

Some notes:

- Whatever gets implemented needs to take into account how <pattern>/** patterns work. Such a pattern not only matches subdirectories of <pattern>, it also matches <pattern> itself. This is why the second processed pattern in the example isn't scripts/**, as that would match scripts, making fdir not crawl it.
- Ideally this processing should happen in the normalizePattern function, so that we don't need to loop through the patterns extra times.

Sep 28 '24 14:09 SuperchupuDev