tiny-glob
tiny-glob copied to clipboard
Some files are missing when doing glob
I have this code:
const searchPattern = '{post,page}/**/*.{md}'
glob(searchPattern, {
cwd: content,
absolute: true
})
.then(files => {
console.log(files.length)
callback(null, files)
})
.catch(err => callback(err))
And I would expect the files for {post,page}/**/*.{md} = {post}/**/*.{md} + {page}/**/*.{md}, but it's not in my case.
I have 2 files for {page}/**/*.{md}, and 53 files for {post}/**/*.{md}, but only 33 for {post,page}/**/*.{md}. Am I doing something wrong here? The searchPattern just works fine under node-glob, fast-glob.
Here's the one causing problem https://github.com/terkelg/tiny-glob/blob/master/index.js#L34, I can remove it, and everything works as expected in my case. Also, it won't fail any tests.
I just put up a failing test here https://github.com/chenxsan/tiny-glob/commit/18f54bc003973335915f333ec0ab841ef2c82c71
Hi Sam! Thanks for having a look at the issue. Does it effect the benchmarks when you remove that line? It can also be a problem with the regex coming from globrex. Can you print the regex and file and test them?
The idea is that every path segment (aka dir name) is checked before tiny-glob starts crawling that directory. When globrex convert a glob it also break it into smaller regex segments for each folder/path segment. This is done so tiny-glob can check each directory and avoid spending time crawling unnecessary folders that never will result in any matches anyway. I suspect the regex for the glob {post,page} could be wrong.
Here's benchmark result after I removed that line:
glob x 13,438 ops/sec ±3.05% (83 runs sampled)
fast-glob x 25,485 ops/sec ±5.20% (76 runs sampled)
tiny-glob x 55,162 ops/sec ±6.97% (55 runs sampled)
Fastest is tiny-glob
┌───────────┬─────────────────────────┬────────────┬────────────────┐
│ Name │ Mean time │ Ops/sec │ Diff │
├───────────┼─────────────────────────┼────────────┼────────────────┤
│ glob │ 0.00007441320916659413 │ 13,438.474 │ N/A │
├───────────┼─────────────────────────┼────────────┼────────────────┤
│ fast-glob │ 0.00003923935167426461 │ 25,484.621 │ 89.64% faster │
├───────────┼─────────────────────────┼────────────┼────────────────┤
│ tiny-glob │ 0.000018128419620750812 │ 55,162.006 │ 116.45% faster │
└───────────┴─────────────────────────┴────────────┴────────────────┘
Here's the lexer variable:
{ regex: /^(post|page)\/((?:[^\/]*(?:\/|$))*)([^\/]*)\.md$/,
segments: [ /^(post|page)$/, /^((?:[^\/]*(?:\/|$))*)$/, /^([^\/]*)\.md$/ ],
globstar: '/^((?:[^\\/]*(?:\\/|$))*)$/' }
And part of my directory structure:
$ tree ./post
All those .md files right under post are included while others inside subdirectory of post are filtered out.
So I just added a console.log(rgx, file) right before if (rgx && !rgx.test(file)) continue;, here's the printed result:
/^(post|page)$/ 'draft'
/^(post|page)$/ 'page'
/^((?:[^\/]*(?:\/|$))*)$/ 'about'
/^(post|page)$/ 'post'
/^([^\/]*)\.md$/ 'firefox-os'
/^([^\/]*)\.md$/ 'github-pages-custom-domain'
/^([^\/]*)\.md$/ 'markdown-and-table'
/^([^\/]*)\.md$/ 'srcset and sizes'
/^([^\/]*)\.md$/ 'telegram-scam-bitcoin'
...
...
Those are folders right under post, they should be walked into too. So the problem here might origin from the level value?
What's going on with the directory names? Can you post the non-escaped strings for some of them?
Those're chinese characters.
But I don't think it matters. Folders with english names like firefox-os are ignored by glob too.
@terkelg Thought it's really difficult for you to understand my situation, I just created a new repo here https://github.com/chenxsan/tiny-glob-demo.
Please run npm install to install deps then run node index.js to check the results.
Thanks a lot @chenxsan. I'll have a look at this when I get some spare time. I appreciate the help and information you provided
Its possible that i have the same problem, but in different form.
I prepared repo with test case: git clone https://github.com/pavelloz/tg-testcase && npm i && node test.js
https://github.com/pavelloz/tg-testcase/
Shortcut:
Structure:
tinyglob-testcase|master ⇒ tree modules
modules
└── test
├── private
│ └── views
│ └── pages
│ └── mypage.liquid
└── public
└── views
├── pages
│ └── page.liquid
└── partials
├── data
│ ├── one.liquid
│ └── two.json
└── hello.liquid
9 directories, 5 files
Code:
const tg = require('tiny-glob');
tg('**', {
cwd: 'modules/test',
filesOnly: true
}).then(files => {
console.log('Non-filtered.', files.length);
console.log(files);
});
tg('{private,public}/**', {
cwd: 'modules/test',
filesOnly: true
}).then(files => {
console.log('Filtered by private/public (broken)', files.length);
console.log(files);
});
tg('**/{private,public}/**', {
cwd: 'modules/test',
filesOnly: true
}).then(files => {
console.log('Filtered, with workaround/hack applied.', files.length);
console.log(files);
});
Results
tinyglob-testcase|master ⇒ node test.js
Filtered by private/public (broken) 1
[ 'private/views/pages/mypage.liquid' ]
Non-filtered. 5
[
'private/views/pages/mypage.liquid',
'public/views/pages/page.liquid',
'public/views/partials/data/one.liquid',
'public/views/partials/data/two.json',
'public/views/partials/hello.liquid'
]
Filtered, with workaround/hack applied. 5
[
'private/views/pages/mypage.liquid',
'public/views/pages/page.liquid',
'public/views/partials/data/one.liquid',
'public/views/partials/data/two.json',
'public/views/partials/hello.liquid'
]