🐛 File Scanner taking upwards of 5 seconds
Environment information
CLI:
Version: 2.0.0-beta.1
Color support: true
Platform:
CPU Architecture: x86_64
OS: linux
Environment:
BIOME_LOG_PATH: unset
BIOME_LOG_PREFIX_NAME: unset
BIOME_CONFIG_PATH: unset
NO_COLOR: unset
TERM: foot
JS_RUNTIME_VERSION: v22.11.0
JS_RUNTIME_NAME: node
NODE_PACKAGE_MANAGER: yarn/4.6.0
Biome Configuration:
Status: Loaded successfully
Path: /home/james/Dev/Create/escher-wt/biome/biome.jsonc
Formatter enabled: true
Linter enabled: true
Assist enabled: true
VCS enabled: true
Workspace:
Open Documents: 0
What happened?
We've experienced pretty poor performance with the file scanner added in the v2 beta. Just running a format now spends upwards of 5 secs on scanning before anything else happens.
I'll admit, we have a lot of barrel files, and probably some circular imports, so not sure if that might negatively impact performance.
Our monorepo has ~3500 files reported as being checked.
Expected result
I was a bit surprised that the file scanner was needed at all for formatting, but if it is, ideally it would not take 5 secs to scan our project. Not sure if there are other options here, like whether running the daemon would mean this scanning cost only has to be paid once, on startup.
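For what it's worth, the daemon idea could be tried roughly like this. It's a sketch only: `biome start` and the global `--use-server` flag are existing Biome CLI features, but whether the daemon actually keeps the scan result around between invocations isn't confirmed anywhere in this thread. `format_via_daemon` and the `BIOME` variable are made-up names for illustration.

```shell
# Hypothetical wrapper. BIOME is however Biome gets invoked in your project
# (an assumption; adjust to e.g. "yarn biome").
BIOME="${BIOME:-npx biome}"

# Start the daemon, then format through it instead of
# spawning a fresh standalone process for the run.
format_via_daemon() {
  $BIOME start && $BIOME format --use-server "$@"
}
```

Usage would be something like `format_via_daemon src/`, with `biome stop` to shut the daemon down again afterwards.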
Code of Conduct
- [x] I agree to follow Biome's Code of Conduct
Hello @jpulec, please provide a minimal reproduction. You can use one of the following options:
- Provide a link to our playground, if it's applicable.
- Provide a link to a GitHub repository. To easily create a reproduction, you can use our interactive CLI via
npm create @biomejs/biome-reproduction
Issues marked with S-Needs repro will be closed if they have no activity within 3 days.
Thanks! That seems definitely on the high side, I agree...
Unfortunately it does seem like we will need the file scanner even for formatting, although maybe we can optimise it to use a "light scan" in such a case? The reason we'll need it is for discovery of settings in monorepos, but that may not necessarily require a full scan.
That said, is it possible your repository contains a lot of files beyond those being checked? I'm thinking maybe files in a build or .output directory that might be getting picked up.
Do you think there's any part of the repository that you could share with us so we might understand where the time is being spent?
I also thought about disabling the scanner for formatting, but we can't disable all of it.
We also need it for nested ignore files
Do we have any idea why it takes so long to scan a project? I find it strange that both formatting and linting take less time than scanning.
The scanner scans all files, node_modules too, I think
When I run format with --verbose, it's all files that I expect to be processed, ~3500. But is the scanner looking at more files than that?
Yes, the scanner currently goes through everything, minus a few hardcoded exceptions, but notably including node_modules/. The reason it scans node_modules/ is because it needs to know which libraries you have installed and what symbols they export.
Of course, those things aren’t as useful during formatting, which is why a “light scan” might make sense in that case.
Got it. A couple more stats for our repo... we have a pretty large node_modules, about 4GB of files, and another 1.3GB of zip files in .yarn/cache. There are about 178k files in the repo between our application and node_modules.
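To put numbers like these side by side, one can count files with and without node_modules. This is a rough sketch using plain `find`: it only approximates what the scanner walks, since the scanner's hardcoded exceptions aren't listed in this thread, and `count_files` is a hypothetical helper name, not something Biome provides.

```shell
# count_files DIR [--skip-node-modules]
# Prints the number of regular files under DIR. With --skip-node-modules,
# any directory named node_modules is pruned before counting.
count_files() {
  dir="$1"
  if [ "$2" = "--skip-node-modules" ]; then
    find "$dir" -type d -name node_modules -prune -o -type f -print | wc -l | tr -d ' '
  else
    find "$dir" -type f | wc -l | tr -d ' '
  fi
}
```

Running `count_files .` and `count_files . --skip-node-modules` from the repo root gives the two totals. A large gap between them suggests most of what a full walk would visit sits in node_modules rather than in the files being checked.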
While a real reproduction would still be helpful, I’m reopening this to keep the conversation going. We’re going to have a new preview out soon, and I think it might be useful to do some more definitive tests before we release 2.0.
@jpulec Can you check what the performance is like with the latest prerelease? We made some tweaks to ignore certain folders during scanning, as well as a small tweak that might improve the locking performance when scanning many small files.
You can install it using the following command:
npm i https://pkg.pr.new/biomejs/biome/@biomejs/biome@2d699e3
Note I opened a task for improving scanner performance: #5636
I tried v2.0.0-beta.6 and I'm still experiencing huge performance issues: runs take multiple seconds whether it's 1 file or 4000 files being checked.
I'm using it via pre-commit.com (which passes the filename to biome) but running it directly has the same effect.
Biome 2.0.0-beta.6
$ git commit -a
biome check..............................................................Failed
- hook id: biome-check
- files were modified by this hook
Checked 1 file in 6s. Fixed 1 file.
$ pre-commit run --all-files biome-check
biome check..............................................................Failed
- hook id: biome-check
- files were modified by this hook
Checked 1931 files in 8s. Fixed 1 file.
Checked 1484 files in 6s. No fixes applied.
Biome v1.9.4
$ git commit -a
biome check..............................................................Failed
- hook id: biome-check
- files were modified by this hook
Checked 1 file in 14ms. Fixed 1 file.
$ pre-commit run --all-files biome-check
biome check..............................................................Failed
- hook id: biome-check
- files were modified by this hook
Checked 1930 files in 309ms. Fixed 1 file.
Checked 1484 files in 239ms. No fixes applied.
Context
Number of package.json files in my monorepo (outside node_modules):
$ fd package.json | wc -l
57
Files in node_modules:
$ find node_modules -type f | wc -l
185849
Config for v2
Unfortunately the monorepo itself is private
{
"$schema": "./node_modules/@biomejs/biome/configuration_schema.json",
"vcs": {
"enabled": true,
"clientKind": "git",
"useIgnoreFile": true
},
"files": {
"includes": [
"**",
"!**/package.json",
"!**/.vscode",
"!**/libs/config-eslint/rules/no-restricted-syntax.test.tsx"
]
},
"javascript": {
"jsxRuntime": "reactClassic"
},
"formatter": {
"indentStyle": "space"
},
"assist": {
"actions": {
"source": {
"organizeImports": "off"
}
}
},
"linter": {
"domains": {
"react": "none",
"test": "none"
},
"rules": {
"recommended": false,
"correctness": {
"useImportExtensions": "error"
},
"style": {
"useNodejsImportProtocol": "error",
"useExportType": "error",
"useImportType": "error"
}
}
},
"overrides": [
{
"includes": ["**/tsconfig.*.json", "**/project.json"],
"json": {
"parser": {
"allowComments": true,
"allowTrailingCommas": true
}
}
}
]
}
@fregante Can you try disabling useImportExtensions? That rule belongs to the project domain, which triggers indexing of files.
You may also want to subscribe to https://github.com/biomejs/biome/issues/6234, which can hopefully help specifically for the use case of commit hooks.
> Can you try disabling useImportExtensions?
Yes, without it it's only 141 times slower than Biome v1 rather than 430 times.
biome check..............................................................Failed
- hook id: biome-check
- files were modified by this hook
Checked 2 files in 1977ms. Fixed 1 file.
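For reference, those two factors follow directly from the timings posted in this thread, measured against the 14ms single-file run on v1.9.4. A quick sketch of the arithmetic (`ratio` is just a helper name made up here, and the 6s figure is already rounded in Biome's output, so the second factor is approximate):

```shell
# ratio SLOW_MS FAST_MS
# Prints how many times slower SLOW_MS is than FAST_MS, rounded to an integer.
ratio() {
  awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.0f\n", slow / fast }'
}

ratio 1977 14   # without useImportExtensions: prints 141
ratio 6000 14   # with useImportExtensions (6s): prints 429, i.e. roughly 430
```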
Issues keep getting closed, but I see no progress here. If anything, it's gotten slower.
Biome 2.0.6
$ git commit -a
biome check..............................................................Failed
- hook id: biome-check
- files were modified by this hook
Checked 5 files in 9s. Fixed 1 file.
No need, we already have a task for it
https://github.com/biomejs/biome/issues/6234