
Leverage `micromatch` to speed up glob matching

Open mastrzyz opened this issue 1 year ago • 3 comments

micromatch is faster than minimatch in performing matching.

This can be seen in the auditable benchmarks here.

I wanted to be 100% sure this would help this project, so I created two benchmark files of my own here that try to replicate some of this project's use cases.

The results are:

marcinstrzyz@Marcins-MacBook-Pro-2 lib % node benchmarkLarge.js
Fastest is micromatch
┌─────────┬──────────────┬────────────────┬───────────────────────┬────────────────────────────┬──────────────────────────┐
│ (index) │   Function   │ Mean Time (ms) │ Operations per Second │ Standard Error of the Mean │ Relative Margin of Error │
├─────────┼──────────────┼────────────────┼───────────────────────┼────────────────────────────┼──────────────────────────┤
│    0    │ 'micromatch' │    '23.75'     │        '42.10'        │           '0.00'           │          '4.11'          │
│    1    │ 'minimatch'  │    '71.48'     │        '13.99'        │           '0.00'           │          '9.11'          │
└─────────┴──────────────┴────────────────┴───────────────────────┴────────────────────────────┴──────────────────────────┘
marcinstrzyz@Marcins-MacBook-Pro-2 lib % node benchmark.js     
Fastest is micromatch
┌─────────┬──────────────┬────────────────┬───────────────────────┬────────────────────────────┬──────────────────────────┐
│ (index) │   Function   │ Mean Time (ms) │ Operations per Second │ Standard Error of the Mean │ Relative Margin of Error │
├─────────┼──────────────┼────────────────┼───────────────────────┼────────────────────────────┼──────────────────────────┤
│    0    │ 'micromatch' │     '0.03'     │      '37700.76'       │           '0.00'           │          '3.83'          │
│    1    │ 'minimatch'  │     '0.06'     │      '15809.91'       │           '0.00'           │          '6.07'          │
└─────────┴──────────────┴────────────────┴───────────────────────┴────────────────────────────┴──────────────────────────┘

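Although the speedup may only be noticeable in very large cases, both benchmarks show micromatch taking at most about half the mean time of minimatch: roughly 2.4x the operations per second in the small benchmark and roughly 3x in the large one.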
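As a rough check on the figures in the two tables above, the ratios can be recomputed from the "Operations per Second" column. This is only a sketch: the object keys `benchmarkLarge` and `benchmark` and the helper names `speedup` and `timeReduction` are made up for illustration, and the numbers are copied from the tables rather than re-measured.

```javascript
// Operations per second, copied from the two result tables above.
const results = {
  benchmarkLarge: { micromatch: 42.10, minimatch: 13.99 },
  benchmark: { micromatch: 37700.76, minimatch: 15809.91 },
};

// Throughput ratio: how many times more operations per second micromatch completes.
function speedup({ micromatch, minimatch }) {
  return micromatch / minimatch;
}

// Fraction of the minimatch time saved per operation.
function timeReduction(entry) {
  return 1 - 1 / speedup(entry);
}

for (const [name, entry] of Object.entries(results)) {
  console.log(
    `${name}: ${speedup(entry).toFixed(2)}x throughput, ` +
      `${(timeReduction(entry) * 100).toFixed(0)}% less time per operation`
  );
}
```

Given the margins of error reported in the tables, these ratios should be read as approximate.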

The current unit tests pass. I'm happy to extend them if there are worries about known edge cases or if we just want more confidence.

mastrzyz avatar Aug 21 '23 16:08 mastrzyz

@kenotron thoughts on this? Wondering if Lage could benefit from it further downstream as well.

mastrzyz avatar Aug 21 '23 16:08 mastrzyz

@ecraig12345 ?

mastrzyz avatar Aug 29 '23 22:08 mastrzyz

My big concern with making this change in a non-major version is the difference in backslash handling. A lot of our internal teams use Windows, so it wouldn't be surprising if somebody is using backslashes as path separators in patterns, and it's hard to accurately detect and fix that.
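To illustrate why detection is hard, here is a minimal sketch of a heuristic check. It assumes the difference in question is that a backslash in a pattern is read as an escape character rather than as a path separator. The helper `looksLikeWindowsSeparator` and the `GLOB_SPECIAL` set are hypothetical and not part of either library.

```javascript
// Characters with special meaning in glob syntax. A backslash in front of one
// of these is plausibly an intentional escape. (Illustrative list, an assumption.)
const GLOB_SPECIAL = new Set(["*", "?", "[", "]", "{", "}", "(", ")", "!", "+", "@", "\\"]);

// Hypothetical helper: returns true when a pattern contains a backslash that is
// followed by an ordinary character (or by nothing), which suggests it was
// meant as a Windows path separator rather than an escape.
function looksLikeWindowsSeparator(pattern) {
  for (let i = 0; i < pattern.length; i++) {
    if (pattern[i] !== "\\") continue;
    const next = pattern[i + 1];
    if (next === undefined) return true; // trailing backslash
    if (GLOB_SPECIAL.has(next)) {
      i++; // treat this pair as an escape and skip the escaped character
      continue;
    }
    return true;
  }
  return false;
}

console.log(looksLikeWindowsSeparator("packages\\foo\\src")); // true
console.log(looksLikeWindowsSeparator("packages/foo/\\*.ts")); // false: reads as an escaped "*"
console.log(looksLikeWindowsSeparator("packages\\*.ts")); // false, even if a separator was intended
```

The last case is the ambiguous one: `packages\*.ts` could mean "any `.ts` file under `packages`" on Windows or a literal `*` after `packages`, and a heuristic like this cannot tell which the author meant.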

ecraig12345 avatar Sep 02 '23 01:09 ecraig12345