neotest icon indicating copy to clipboard operation
neotest copied to clipboard

Performance problems on large repos

Open stevearc opened this issue 2 years ago • 5 comments

When I try to open the summary on a test file in a repo at work, nvim freezes for a very long time. The issue is this line: https://github.com/rcarriga/neotest/blob/aaf2107a0d639935032d60c49a5bdda2de26b4d6/lua/neotest/client/init.lua#L408 The find_files actually completes within ~2 seconds, but then running adapter.is_test_file 250k+ times takes several minutes (vim-test adapter). After that freeze, there is a second freeze while it tries to write all the events to the logfile (hasn't finished yet).

My personal opinion, test discovery should be definitely optional, and possibly configurable. My previous job had a monorepo that was so big it was only served as a virtual filesystem, so running any type of crawling operation would have terrible side effects. Making it configurable (e.g. only these dirs, only this depth) might be nice, but would probably make more sense per-adapter instead of globally. If adapters need to control the search path, their API would have to change from testing individual files that neotest gives them to calling back into neotest with whatever their search config is. IDK if that refactor is worth it, which is why I could go either way on making the search configurable.

Another direction could be to customize the search on a per-directory basis instead of per-adapter. That avoids the need for an adapter API refactor, and in general this kind of functionality would be incredibly useful. I often work with many different types of repos on the same machine, sometimes in the same vim instance (using :tcd and one tab per project), and these projects will sometimes use the same language but require different configurations to run tests. I'd love to be able to configure my test adapters on a per-project basis. A rough proposal, it could look like:

require('neotest').setup({
  adapters = ...,
  summary = ...,
  discovery = {
    enabled = true,
  },
  projects = {
    ["~/work/big"] = {
      adapters = ...,
      discovery = {
        enabled = true,
        dirs = {'tests/unit', 'frontend/tests'},
        depth = 2
      },
    },
    ["~/personal/project"] = {
      adapters = ...,
      discovery = {
        enabled = false,
      },
    },
  },
})

I am happy to submit a PR for any parts of this once we align on a solution

Unrelated question: I'll probably be making more proposals, requests, and questions. Is filing an issue the best way to start a discussion, or would you prefer some other channel?

stevearc avatar Jun 09 '22 00:06 stevearc

The find_files actually completes within ~2 seconds, but then running adapter.is_test_file 250k+ times takes several minutes (vim-test adapter). After that freeze, there is a second freeze while it tries to write all the events to the logfile (hasn't finished yet).

Yep the vim-test test detection is horrifically slow. It should however only trigger if an open buffer registers as a possible test file so if you're not actually using it for that repo I'd suggest disabling it for whatever filetype is causing it to run (this is just a workaround for this particular case)

On to the broader issue

If adapters need to control the search path, their API would have to change from testing individual files that neotest gives them to calling back into neotest with whatever their search config is. IDK if that refactor is worth it, which is why I could go either way on making the search configurable.

I actually originally wrote the adapter interface to work this way but switched to the current way to avoid running multiple crawls but now the initial detection of tests (project root detection, open buffers etc) is smarter.

Another direction could be to customize the search on a per-directory basis instead of per-adapter.

I can see this being a pretty popular request not for performance but as you say for the per-project runner/settings. I think it's inevitable something will need handle this. The proposed interface looks pretty good AFAICT, though the adapters I've written (and I imagine others) would need to be refactored to handle configuring multiple instances. Wouldn't be too much work though.

I think the project/discovery configuration would solve this issue better than having adapters controlling the search as it solves other issues too so for now I'll say let's go with that. This will be a rather large change so it'll take some time to work on. I think I'd like to take a stab at it myself just to get a grasp of the implementation details. I can provide more thoughts on interface as I put some more thought into it.

I'll probably be making more proposals, requests, and questions. Is filing an issue the best way to start a discussion, or would you prefer some other channel?

Love to hear that, thanks! To be honest I haven't thought about it much. I don't mind using issues right now but do feel it could get a bit noisy quickly depending on traffic. I've not used GitHub discussions much so I can take a look at that. Open to other suggestions as well! I'm planning on setting up a neotest org to keep all the repos under so I can add it to the admin TODOs :sweat_smile:

rcarriga avatar Jun 09 '22 21:06 rcarriga

the adapters I've written (and I imagine others) would need to be refactored to handle configuring multiple instances

Oh this would be excellent! That might solve another issue I've been having with multiple directories open in the same vim. If both dirs use the same test adapter, I see "Common root not found" errors. But if each dir were using a different adapter instance that might be one way around it.

I think the project/discovery configuration would solve this issue better than having adapters controlling the search as it solves other issues too so for now I'll say let's go with that. This will be a rather large change so it'll take some time to work on.

Sounds good! LMK if I can help at any point.

stevearc avatar Jun 10 '22 00:06 stevearc

I see "Common root not found" errors

As far as I remember this is actually a bit of laziness on my part because I don't open multiple projects in one NeoVim instance. The trees just need to be merged inside of a new parent here https://github.com/rcarriga/neotest/blob/master/lua/neotest/lib/positions/init.lua#L84. The way the new file detection works, it shouldn't (could be wrong) scan the new parent directories.

However I do prefer the idea of having separately maintained trees for each project as it'll present a lot cleaner IMO

rcarriga avatar Jun 10 '22 08:06 rcarriga

Finally got around to implementing per-project configs :sweat_smile: I've got it in https://github.com/nvim-neotest/neotest/pull/100 just for testing but it seems to work nicely AFAICT. If anyone runs in to issues please let me know!

rcarriga avatar Aug 28 '22 15:08 rcarriga

Awesome!! I'll test it out when I get a chance and leave feedback

stevearc avatar Aug 28 '22 18:08 stevearc

I'm also having lags on bigger repos. This is especially annoying during BufAdd and BufLeave events, ~disabling the test dicovery (https://github.com/nvim-neotest/neotest/pull/30) in config doesn't solve the issue~ disabling test discovery does solve the issue 👍🏻

EDIT: I had wrong config setup 🤦🏻 all good.

stevanmilic avatar Sep 05 '22 18:09 stevanmilic

Separate to the project config work, I've done some benchmarking and found an easy optimisation within the treesitter internal code. I've implemented it in the the latest master and I'm seeing a 30% reduction in time to parse the cpython repo :grin: It should hopefully improve performance for most languages. This won't solve all performance issues but it will hopefully improve things a bit.

I'm also planning on creating a consumer that can run benchmarks easily so others can see what is affecting performance

rcarriga avatar Sep 07 '22 08:09 rcarriga

OK I've added two new config options to help with discovery performance.

  1. Concurrent discovery control: You can now specify the number of concurrent workers parsing files. The default is CPU cores + 4 (this is a bit random, just went with what Python's ThreadPoolExecutor defaults to). Set this to 1 and it should reduce CPU usage considerably but it will also slow down test discovery.
  2. Directory filtering: The proposed dirs option would probably suit most people but many projects put tests alongside corresponding code files so I thought this was a bit inflexible. Instead there's now a filter_dir option that can filter what directories to search for test files.

In terms of user experience, directory filtering is preferred over limiting concurrency which is preferred over disabling discovery altogether. I believe that these features are sufficient to tackle nearly all performance related issues. Would love to hear feedback on these! :smile:

If you are still experiencing performance issues and you've disabled concurrency and set up filtering to at least rule out scanning directories like node_modules or venv please let me know but also be sure to rule out other plugins. I rarely experience significant slowdown due to neotest, it's almost always LSP (wish there was a better Python LSP :sweat_smile:)

rcarriga avatar Sep 11 '22 10:09 rcarriga

I tried adding the filter_dir option, but it looks like it isn't being called. My project is set up like this:

require("neotest").setup_project(vim.loop.cwd(), {
  adapters = {
    require("neotest-elixir")({ extra_formatters = { "ExUnit.CLIFormatter", "ExUnitNotifier" } }),
  },
  discovery = {
    enabled = true,
    filter_dir = function(name, rel_path, root)
      print(name, rel_path, root)
      return false
    end,
  },
})

I see no messages and no directory is filtered.

Also, in Elixir all tests belong in a test/ folder, it isn't possible to put the test file alongside the source file, and because of the recent change in the find function, basically every project has to add the filter_dir option as Neotest is now detecting test files in the deps directory.

Would be possible to for an adapter to set a default filter_dir function?

jfpedroza avatar Sep 12 '22 02:09 jfpedroza

Would be possible to for an adapter to set a default filter_dir function?

Yep definitely think this is needed :+1: Adapters can now define a filter_dir function with the same signature

I tried adding the filter_dir option, but it looks like it isn't being called.

Everything looks correct. Can you enable debug logging and submit a separate issue?

rcarriga avatar Sep 12 '22 05:09 rcarriga

No need, I have the setup_project call in .nvimrc and the Neotest config in after/plugin/neotest, so the configuration was being overwritten.

Works great, thank you.

jfpedroza avatar Sep 13 '22 04:09 jfpedroza

I've not heard further reports on performance issues so I'm not sure how prevalent they are anymore but I wanted to have a little fun with using Neovim's RPC API. I've implemented the treesitter parsing to be run in a separate Neovim instance so that the only cost on the main editor instance is deserialising results. It's sitting in a PR right now https://github.com/nvim-neotest/neotest/pull/119, it'd be great to get some feedback on it.

It should never expose any errors directly to the user, instead logging errors and falling back to the current in-process parsing, so when testing please check the logs (I'd recommend the INFO level). The main issue I'm anticipating is lazy loading, I've added a call for require("neotest") to the subprocess so if you do lazy load, just add module = "neotest" to the config for packer (or equivalent for other managers).

Also some adapters (neotest-elixir, neotest-go, neotest-dotnet, neotest-rspec, neotest-phpunit) use callback functions for the parsing which can't be done over RPC. The solution here is to provide a string to evaluate to a function in the child process (e.g. require("my-adapter")._build_position so if adapter authors or people willing to make the change wish to test, it should be relatively simple.

If testing goes well, it might be possible to expose more helpers to run CPU intensive work in the subprocess

rcarriga avatar Sep 28 '22 08:09 rcarriga

I made the change, and it seems to be working well, although I didn't notice any change in the time it takes to discover.

What I did notice is that discovery is triggered twice.

When I open the summary, it outputs:

Searching <project folder> for test files
Discovering files with 20 workers

And then prints a bunch of Parsing <file> lines.

Then, I can see the Searching and Discovering lines again, and the Parsing line for every file.

Any file I searched in the log appeared twice, and I removed the log between runs.

Is there a reason for that?

jfpedroza avatar Oct 03 '22 02:10 jfpedroza

Thanks for testing it out!

I don't think that'd be related to these changes as there's no change in the discovery logic, this is purely in the treesitter library functions. Are the second log lines from the parent or child instance (has CHILD in the line)? Also are you doing anything or just letting discovery run and then exiting?

Edit: Also to clarify, this won't speed up discovery (it's doing the same work), it just shouldn't ever block the editor because it's running in a separate neovim process

rcarriga avatar Oct 03 '22 08:10 rcarriga

Yeah, I didn't think it was related to these changes, but it is related to performance.

Are the second log lines from the parent or child instance (has CHILD in the line)?

They are all from the parent instance.

Also are you doing anything or just letting discovery run and then exiting?

I would either open the summary and check the logs, or run a test file and check the logs.

I forgot to mention that the lines Initialising client and Initialisation finished in X seconds would only print once, before the first batch of logs and after the second batch of logs, respectively.

jfpedroza avatar Oct 05 '22 06:10 jfpedroza

Thanks for the info! Could you provide a clean log for some repo that I can access?

rcarriga avatar Oct 05 '22 08:10 rcarriga

Sure. One more thing I noticed is that the duplicate discovering only happens when I open the summary while having a test file loaded in a buffer. If I don't have a test file open, it only prints each file once.

Here is a repo I use for testing. It has very few test files, so the log is not long.

parquex_info.log parquex_debug.log

In the log at INFO level it is easier to notice the duplicates, but the DEBUG one provides more info, so I included both.

jfpedroza avatar Oct 06 '22 05:10 jfpedroza

Ah thanks!

duplicate discovering only happens when I open the summary while having a test file loaded in a buffer

This was the key :sweat_smile: It was an existing issue that has been fixed in master

rcarriga avatar Oct 06 '22 07:10 rcarriga

Great! The difference is very noticeable in my work repo.

jfpedroza avatar Oct 06 '22 19:10 jfpedroza

Great to hear! I'm going to merge the PR so feel free to make the change to neotest-elixir, I'll get around to submitting PRs to other adapters soon as well.

rcarriga avatar Oct 08 '22 14:10 rcarriga

I'm going to close this issue as all of the originally requested features/changes have been made or have alternatives in place. If there are still performance issues, then we can address them specifically in separate issues.

rcarriga avatar Oct 09 '22 10:10 rcarriga

Thanks for the pull requests on the adapters. It definitely feels faster. ❤️

olimorris avatar Oct 09 '22 10:10 olimorris

Fantastic, love to get the feedback :grin:

rcarriga avatar Oct 09 '22 10:10 rcarriga