🐛 File Scanner taking upwards of 5 seconds

Open jpulec opened this issue 9 months ago • 11 comments

Environment information

CLI:
  Version:                      2.0.0-beta.1
  Color support:                true

Platform:
  CPU Architecture:             x86_64
  OS:                           linux

Environment:
  BIOME_LOG_PATH:               unset
  BIOME_LOG_PREFIX_NAME:        unset
  BIOME_CONFIG_PATH:            unset
  NO_COLOR:                     unset
  TERM:                         foot
  JS_RUNTIME_VERSION:           v22.11.0
  JS_RUNTIME_NAME:              node
  NODE_PACKAGE_MANAGER:         yarn/4.6.0

Biome Configuration:
  Status:                       Loaded successfully
  Path:                         /home/james/Dev/Create/escher-wt/biome/biome.jsonc
  Formatter enabled:            true
  Linter enabled:               true
  Assist enabled:               true
  VCS enabled:                  true

Workspace:
  Open Documents:               0

What happened?

We've experienced pretty poor performance with the added file scanner on the v2 beta. Just running a format now takes upwards of 5 secs to first perform scanning.

I'll admit, we have a lot of barrel files, and probably some circular imports, so not sure if that might negatively impact performance.

Our monorepo has ~3500 files reported as being checked.

Expected result

I was a bit surprised that the file scanner was needed at all for formatting, but if it is, ideally it would not take 5 secs to scan our project. Not sure if there's other options here, like if running the daemon would make it so this scanning cost only had to be paid on startup once.

Code of Conduct

  • [x] I agree to follow Biome's Code of Conduct

jpulec avatar Apr 01 '25 17:04 jpulec

Hello @jpulec, please provide a minimal reproduction. You can use one of the following options:

  • Provide a link to our playground, if it's applicable.
  • Provide a link to GitHub repository. To easily create a reproduction, you can use our interactive CLI via npm create @biomejs/biome-reproduction

Issues marked with S-Needs repro will be closed if they have no activity within 3 days.

github-actions[bot] avatar Apr 01 '25 18:04 github-actions[bot]

Thanks! That seems definitely on the high side, I agree...

Unfortunately it does seem like we will need the file scanner even for formatting, although maybe we can optimise it to use a "light scan" in such a case? The reason we'll need it is for discovery of settings in monorepos, but that may not necessarily require a full scan.

That said, is it possible your repository contains a lot of files beyond those being checked? I'm thinking maybe files in a build or .output directory that might be getting picked up.

Do you think there's any part of the repository that you could share with us so we might understand where the time is being spent?

arendjr avatar Apr 01 '25 18:04 arendjr

I also thought about disabling the scanner for formatting, but we can't disable all of it.

We also need it for nested ignore files

ematipico avatar Apr 01 '25 18:04 ematipico

Do we have any idea why it takes so long to scan a project? I find it strange that both formatting and linting take less time than scanning.

Conaclos avatar Apr 01 '25 19:04 Conaclos

The scanner scans all files, node_modules too, I think

ematipico avatar Apr 01 '25 19:04 ematipico

When I run format with --verbose, it's all files that I expect to be processed, ~3500. But is the scanner looking at more files than that?

jpulec avatar Apr 01 '25 19:04 jpulec

Yes, the scanner currently goes through everything, minus a few hardcoded exceptions, but notably including node_modules/. The reason it scans node_modules/ is because it needs to know which libraries you have installed and what symbols they export.

Of course, those things aren’t as useful during formatting, which is why a “light scan” might make sense in that case.

arendjr avatar Apr 01 '25 20:04 arendjr

Got it. Couple more stats for our repo... we have a pretty large node_modules, about 4GB of files, and another 1.3GB of zip files in .yarn/cache. There's about ~178k files in the repo between our application and node_modules.
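For anyone who wants to gather the same kind of numbers for their own repository, here is a minimal sketch that counts project files and dependency files separately. The function names `count_project_files` and `count_dependency_files` are made up for illustration, and the counts only approximate what the scanner visits, since its hardcoded exceptions are not modelled here.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Biome.

# Count regular files under a directory, skipping every node_modules/ tree.
count_project_files() {
  find "$1" -type d -name node_modules -prune -o -type f -print | wc -l | tr -d ' '
}

# Count only the regular files that live inside node_modules/ trees.
count_dependency_files() {
  find "$1" -type f -path '*/node_modules/*' | wc -l | tr -d ' '
}
```

Running `count_project_files .` and `count_dependency_files .` from the repository root gives the two totals to compare.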

jpulec avatar Apr 02 '25 04:04 jpulec

While a real reproduction would still be helpful, I’m reopening this to keep the conversation going. We’re going to have a new preview out soon, and I think it might be useful to do some more definitive tests before we release 2.0.

arendjr avatar Apr 06 '25 12:04 arendjr

@jpulec Can you check what the performance is like with the latest prerelease? We made some tweaks to ignore certain folders during scanning, as well as a small tweak that might improve locking performance when scanning many small files.

You can install it using the following command:

npm i https://pkg.pr.new/biomejs/biome/@biomejs/biome@2d699e3

arendjr avatar Apr 08 '25 07:04 arendjr

Note I opened a task for improving scanner performance: #5636

arendjr avatar Apr 15 '25 11:04 arendjr

I tried v2.0.0-beta.6 and I'm still experiencing huge performance issues: every run takes multiple seconds, whether it checks 1 file or 4000 files.

I'm using it via pre-commit.com (which passes the filename to biome) but running it directly has the same effect.

Biome 2.0.0-beta.6

$ git commit -a
biome check..............................................................Failed
- hook id: biome-check
- files were modified by this hook

Checked 1 file in 6s. Fixed 1 file.
$ pre-commit run --all-files biome-check
biome check..............................................................Failed
- hook id: biome-check
- files were modified by this hook

Checked 1931 files in 8s. Fixed 1 file.
Checked 1484 files in 6s. No fixes applied.

Biome v1.9.4

$ git commit -a
biome check..............................................................Failed
- hook id: biome-check
- files were modified by this hook

Checked 1 file in 14ms. Fixed 1 file.
$ pre-commit run --all-files biome-check
biome check..............................................................Failed
- hook id: biome-check
- files were modified by this hook

Checked 1930 files in 309ms. Fixed 1 file.
Checked 1484 files in 239ms. No fixes applied.

Context

Number of package.json files in my monorepo (outside node_modules):

$ fd package.json | wc -l
      57

Files in node_modules:

$ find node_modules -type f | wc -l
  185849

Config for v2

Unfortunately the monorepo itself is private

{
  "$schema": "./node_modules/@biomejs/biome/configuration_schema.json",

  "vcs": {
    "enabled": true,
    "clientKind": "git",
    "useIgnoreFile": true
  },
  "files": {
    "includes": [
      "**",
      "!**/package.json",
      "!**/.vscode",
      "!**/libs/config-eslint/rules/no-restricted-syntax.test.tsx"
    ]
  },
  "javascript": {
    "jsxRuntime": "reactClassic"
  },
  "formatter": {
    "indentStyle": "space"
  },
  "assist": {
    "actions": {
      "source": {
        "organizeImports": "off"
      }
    }
  },
  "linter": {
    "domains": {
      "react": "none",
      "test": "none"
    },
    "rules": {
      "recommended": false,
      "correctness": {
        "useImportExtensions": "error"
      },
      "style": {
        "useNodejsImportProtocol": "error",
        "useExportType": "error",
        "useImportType": "error"
      }
    }
  },
  "overrides": [
    {
      "includes": ["**/tsconfig.*.json", "**/project.json"],
      "json": {
        "parser": {
          "allowComments": true,
          "allowTrailingCommas": true
        }
      }
    }
  ]
}

fregante avatar Jun 09 '25 12:06 fregante

@fregante Can you try disabling useImportExtensions? That rule belongs to the project domain, which triggers indexing of files.

You may also want to subscribe to https://github.com/biomejs/biome/issues/6234, which can hopefully help specifically for the use case of commit hooks.

arendjr avatar Jun 09 '25 20:06 arendjr

> Can you try disabling useImportExtensions?

Yes, without it it's only 141 times slower than Biome v1 rather than 430 times.

biome check..............................................................Failed
- hook id: biome-check
- files were modified by this hook

Checked 2 files in 1977ms. Fixed 1 file.
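As a rough check of those ratios, the timings quoted in this thread can be divided directly. This is only a sketch: `slowdown` is a hypothetical helper, and the comparison is approximate because the runs did not all check the same number of files.

```shell
#!/bin/sh
# slowdown BASELINE_MS CURRENT_MS
# Prints how many times slower CURRENT_MS is than BASELINE_MS, rounded.
slowdown() {
  awk -v base="$1" -v cur="$2" 'BEGIN { printf "%.0f\n", cur / base }'
}

# v1.9.4 single-file commit: 14 ms; beta run without the rule: 1977 ms.
slowdown 14 1977
# v1.9.4 single-file commit: 14 ms; beta run with the rule: 6 s.
slowdown 14 6000
```

The two calls print 141 and 429, which matches the "141 times" figure and the rounded "430 times" figure.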

fregante avatar Jun 12 '25 02:06 fregante

Issues keep getting closed, but I see no progress here. If anything, it's gotten slower.

Biome 2.0.6

$ git commit -a
biome check..............................................................Failed
- hook id: biome-check
- files were modified by this hook

Checked 5 files in 9s. Fixed 1 file.

fregante avatar Jun 28 '25 07:06 fregante

No need, we already have a task for it

https://github.com/biomejs/biome/issues/6234

ematipico avatar Jun 28 '25 08:06 ematipico