1.19.3 (Insiders build) Performance Enhancement Experiment: Go To Symbol

Open fearthecowboy opened this issue 1 year ago • 9 comments

1.19.3-Insiders Performance Enhancement Experiment: Go To Symbol

With the 1.19.3 release of the C/C++ extension, significant changes have been made to 'Go To symbol in the workspace'. This addresses several issues, including #4934, #7908, and #7914.

This implementation is the result of an extensive deep dive investigation that I did into the performance of VSCode, and crafting a brand new design for implementing features such as this. As such, it is an experimental feature, and we are looking for some serious feedback on its performance and accuracy.

Call to Action

We're looking for feedback on the new experimental implementation of 'Go To symbol in the workspace' (ctrl-T) for VSCode, both positive and otherwise - If you're able to test the new implementation and provide feedback, it would be greatly appreciated.

Note: When you upgrade to 1.19.3, the IntelliSense browse database will be rebuilt, which may take a few minutes on large projects.

When upgrading to the 1.19.3 insiders release, you may randomly be assigned to be either in the experiment group (using the enhancement) or in the control group (no enhancement).

If you are not in the experiment group, you can explicitly opt in to it by adding the following setting to your settings.json file (either globally or in the workspace):

    "C_Cpp.experimentalFeatures": "enabled",

Note: Setting C_Cpp.experimentalFeatures to disabled will opt you out of the experiment group.

Once you have the setting in place, you can test the new implementation by using the 'GoTo symbol' in VSCode (Ctrl-T)

Any feedback that you can provide would be greatly appreciated. Feel free to post comments in this thread with any experience you wish to share.

Feedback

If you have feedback (positive or otherwise) on the new implementation, please post it in this thread. We are looking for feedback on the following:

  • Performance - does the search feel sufficiently fast?

    • if it is not as fast as you would expect,
  • Quality - Are you getting to the symbol you're looking for easily?

  • When giving feedback - the more details you can give, the better we can hone the results.

    • some details about the hardware you are using (OS/CPU/RAM/DISK)
    • the size of the workspace you are searching in (size on disk, total number of source files)
    • if you can provide a reproducible example, that would be very helpful. (ie, github repo, and the symbol you're looking for, and the search criteria you're using)

Details of the new implementation

The new implementation of 'GoTo symbol in the workspace' (ctrl-T) for VSCode uses an entirely new algorithm for searching for symbols in the workspace. It is using a full-text-search index of symbols that is maintained on the fly, which allows us to quickly find symbols using a variety of search methods.

The search is handled through several different queries, starting with finding very literal matches, and progressively moving to very fuzzy matches. The result is that it should return very relevant results in a fraction of the time that it was previously.
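As a rough illustration of the "literal first, fuzzy last" idea, the sketch below runs a list of matchers in order and stops once a result cap is reached. It is not the extension's code: the function names, the three stand-in tiers, and the in-memory symbol list are all assumptions made for this example.

```python
from typing import Callable, Iterable, List

# A matcher takes (query, symbol) and says whether the symbol qualifies.
Matcher = Callable[[str, str], bool]


def tiered_search(query: str, symbols: Iterable[str],
                  tiers: List[Matcher], limit: int = 10000) -> List[str]:
    """Run each tier in order, so the most literal tiers contribute first."""
    candidates = list(symbols)
    results: List[str] = []
    seen = set()
    for matches in tiers:
        for symbol in candidates:
            if symbol in seen:
                continue  # already found by a more literal tier
            if matches(query, symbol):
                seen.add(symbol)
                results.append(symbol)
                if len(results) >= limit:
                    return results
    return results


# Hypothetical stand-in tiers, ordered from most to least literal.
def exact(query: str, symbol: str) -> bool:
    return symbol.lower() == query.lower()


def prefix(query: str, symbol: str) -> bool:
    return symbol.lower().startswith(query.lower())


def substring(query: str, symbol: str) -> bool:
    return query.lower() in symbol.lower()


found = tiered_search("foo", ["tofoo", "fooBar", "foo", "bar"],
                      [exact, prefix, substring])
```

Because later tiers skip symbols that earlier tiers already returned, a looser tier only adds new candidates and never duplicates a literal hit.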

General search behavior

  • VSCode orders the results by its own relevance algorithm, so the most relevant results should be at the top.

  • VSCode filters the search results to only show results where all the characters in the input are found in the fully qualified symbol name, and in the order they are specified, so a search with teh will not return the, but th will. Generally, the more characters that are specified, the more narrow the results should be.

  • The maximum number of symbols returned from a search is 10000 - this is a reasonable limitation, both in order to keep the number of results to a useful maximum, and to not have it take excessively long to return results.

  • Symbol matching is generally case-insensitive, so should work regardless of the case of the input, but if there are too many results because of fuzzy matching, it will tend to be more accurate when casing matches the expected symbol.

  • Symbols can now be searched for in a specific scope (class or namespace) using :: in the input. For example, foo::bar will search for symbols named bar in the scope of foo. This is useful for finding symbols that have common names, but are in different scopes. For example, foo::bar will find bar in foo, but not bar in baz.

  • The scope itself can be searched, so foo:: will find symbols in any scope containing foo -- foo::bar, bar::foo::baz, foo_bar::baz, etc. This is useful for finding all symbols in a specific namespace. In a large workspace, this may return a large number of results.
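The scoped-query behavior in the last two bullets can be sketched roughly as follows. This is only one reading of the description: the helper names are hypothetical, and treating both the scope terms and the final name as case-insensitive substring tests is an assumption, not something the extension is confirmed to do.

```python
def split_scoped_query(query):
    """Split 'foo::bar' into (['foo'], 'bar'); 'foo::' gives (['foo'], '')."""
    *scopes, name = query.split("::")
    return scopes, name


def scoped_match(query, qualified_name):
    """Check a scoped query against a fully qualified symbol name."""
    q_scopes, q_name = split_scoped_query(query.lower())
    *s_scopes, s_name = qualified_name.lower().split("::")

    # Assumption: the final name is matched as a plain substring.
    if q_name and q_name not in s_name:
        return False

    # Each scope term must appear in some enclosing scope, in order.
    pos = 0
    for term in q_scopes:
        while pos < len(s_scopes) and term not in s_scopes[pos]:
            pos += 1
        if pos == len(s_scopes):
            return False
        pos += 1
    return True


# Examples taken from the bullets above.
in_foo = scoped_match("foo::bar", "foo::bar")
in_baz = scoped_match("foo::bar", "baz::bar")
scope_only = [n for n in ("foo::bar", "bar::foo::baz", "foo_bar::baz", "baz::bar")
              if scoped_match("foo::", n)]
```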

The following kinds of searches are performed in order:

Direct matches

Searches for symbols that match the input, or where the symbol has words that start with the input.

  • foo - will match foo and fooBar, bar_foo, bar_foo_baz
  • fooBar - will match fooBar and fooBarBaz, bar_fooBarz
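One way to picture the "words that start with the input" rule is sketched below. The word-boundary rule (after an underscore, or at an uppercase letter that follows a non-uppercase one) and both function names are assumptions for illustration; the actual index may split words differently.

```python
def word_starts(symbol):
    """Return indices where a word begins (assumed camelCase/snake_case rule)."""
    starts = []
    for i, ch in enumerate(symbol):
        if ch == "_":
            continue
        prev = symbol[i - 1] if i > 0 else "_"
        if prev == "_" or (ch.isupper() and not prev.isupper()):
            starts.append(i)
    return starts


def direct_match(query, symbol):
    """True if the query is a case-insensitive prefix at any word start."""
    q, s = query.lower(), symbol.lower()
    return any(s.startswith(q, i) for i in word_starts(symbol))


# The examples listed above all pass under this reading:
foo_hits = [s for s in ("foo", "fooBar", "bar_foo", "bar_foo_baz", "tofoo")
            if direct_match("foo", s)]
```

Under this sketch, tofoo is not a direct match for foo because foo does not begin a word there.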

Substring search

Searches for symbols that contain the input as a substring anywhere in the symbol name.

  • foo will match tofoo

Abbreviations or word searches

Searches for symbols that match the input as an abbreviation of a given symbol, or that contain the words in the input.

  • fooBar will match fooBar, foo_bar, bizFooBar, and biz_foo_bar
  • fb will match fooBar and foo_bar
  • dsmc will match doSomethingMoreComplicated as well as do_something_more_complicated
  • Scoped searches like fb::dsmc will match fooBar::doSomethingMoreComplicated
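A minimal sketch of how the abbreviation and word rules above could work, assuming symbols are split into words on case changes and underscores. The regular expression, the function names, and the choice to match scoped queries component by component against the tail of the qualified name are all assumptions, not the extension's actual logic.

```python
import re

# Assumed word splitter: acronym runs, capitalized or lowercase words, digit runs.
WORD_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_words(text):
    return [w.lower() for w in WORD_RE.findall(text)]


def is_abbreviation(query, symbol):
    """True if the query letters are the initials of consecutive words."""
    initials = "".join(w[0] for w in split_words(symbol))
    return bool(query) and query.lower() in initials


def contains_words(query, symbol):
    """True if the query's words all appear in the symbol, in order."""
    q_words = split_words(query)
    remaining = iter(split_words(symbol))
    # "w in remaining" advances the iterator, which enforces the ordering.
    return bool(q_words) and all(w in remaining for w in q_words)


def abbrev_or_words_match(query, symbol):
    return is_abbreviation(query, symbol) or contains_words(query, symbol)


def scoped_abbrev_match(query, qualified_name):
    """Match each '::'-separated query part against the tail of the name."""
    q_parts = query.split("::")
    s_parts = qualified_name.split("::")
    if len(q_parts) > len(s_parts):
        return False
    tail = s_parts[len(s_parts) - len(q_parts):]
    return all(abbrev_or_words_match(q, s) for q, s in zip(q_parts, tail))
```

With these definitions, fooBar matches foo_bar and biz_foo_bar through the word rule, while fb and dsmc match through the initials rule.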

Fuzzy searching

Searching with letters that are in the symbol name in the order they appear, but not necessarily adjacent. This starts with closer matches and progressively gets fuzzier, but will stop searching when it reaches a threshold of time.

  • vbs will match averybigsymbol
  • bip::vbs will match biginformationscope::averybigsymbol
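The "progressively fuzzier, with a time threshold" behavior might look something like the sketch below. It assumes that fuzziness is measured as the number of characters skipped between consecutive matched letters and that the search widens this gap one step at a time; the function names, the gap definition, and the default values are illustrative assumptions only.

```python
import time


def fuzzy_match(query, symbol, max_gap):
    """True if the query letters appear in order in the symbol, with at most
    max_gap skipped characters between consecutive matched letters."""
    q, s = query.lower(), symbol.lower()
    if not q:
        return False
    # Positions where the query matched so far could currently end.
    reachable = [i for i, ch in enumerate(s) if ch == q[0]]
    for ch in q[1:]:
        reachable = [
            i for i, c in enumerate(s)
            if c == ch and any(0 < i - r <= max_gap + 1 for r in reachable)
        ]
        if not reachable:
            return False
    return True


def progressive_fuzzy_search(query, symbols, gap_cap=16,
                             budget_seconds=0.5, clock=time.monotonic):
    """Try tight matches first, widen the gap step by step, stop on timeout."""
    deadline = clock() + budget_seconds
    found = []
    remaining = list(symbols)
    for max_gap in range(gap_cap + 1):
        if clock() >= deadline:
            break  # out of time: return whatever was found so far
        matched = [s for s in remaining if fuzzy_match(query, s, max_gap)]
        found.extend(matched)
        remaining = [s for s in remaining if s not in matched]
    return found


hits = progressive_fuzzy_search("vbs", ["averybigsymbol", "vbs", "other"])
```

Tracking every reachable position, rather than greedily taking the first occurrence of each letter, avoids rejecting a symbol when a later occurrence would have kept the gaps within the limit.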

fearthecowboy avatar Oct 19 '23 21:10 fearthecowboy

  1. Cpptools has been updated to 1.19.3 [image]

  2. "C_Cpp.experimentalFeatures": "enabled",
    

Test Result: There does not seem to be any improvement over before. I don't know what is wrong. repo: https://github.com/Shaka0723/cpptools_fuzzysearchTest

screen recording: fuzzy

Shaka0723 avatar Feb 19 '24 06:02 Shaka0723

@Shaka0723 - So, what's happening here is that the direct matches are picking up matches (so, usb is matching), but usbgain doesn't because the fuzzy searching is somewhat limited in how close the characters need to be. For usbgain to match UsbAudioSendSpeakerToVolumeGain, the range (which currently is capped to ~16 characters) would have to be increased significantly, which would take longer on very large workspaces.

If you search for usbGain (where the case change gives the algorithm something to split words on), you should be able to find that symbol.

Given that usb and usbvol initially matched your symbol early on, I'm curious if you see this as a significant productivity gap, or is this more of an extreme example?

fearthecowboy avatar Feb 20 '24 16:02 fearthecowboy

Performance-wise, this seems much better to me 🎆. Actually usable now. It's about a 1s delay before seeing matches in a million-line C++ codebase, whereas before it was maybe 30s to a minute.

Quality of results - making sense to me.

eclazi avatar Feb 20 '24 20:02 eclazi

@Shaka0723 - So, what's happening here is that the direct matches are picking up matches (so, usb is matching), but usbgain doesn't because the fuzzy searching is somewhat limited in how close the characters need to be. For usbgain to match UsbAudioSendSpeakerToVolumeGain, the range (which currently is capped to ~16 characters) would have to be increased significantly, which would take longer on very large workspaces.

If you search for usbGain (where the case change gives the algorithm something to split words on), you should be able to find that symbol.

Given that usb and usbvol initially matched your symbol early on, I'm curious if you see this as a significant productivity gap, or is this more of an extreme example?

I don't think this is an extreme example. It works fine if I search in file symbols with usbvol, usbVol, usbgain, or usbGain; it's case-insensitive.

Also, when I input dsmc, doSomethingMoreComplicated is filtered but do_something_more_complicated is not, which is quite strange.

Simply put, I expect the performance and experience of the search algorithm to be the same as for symbol searches in files: case-insensitive and fuzzy.

Shaka0723 avatar Feb 21 '24 09:02 Shaka0723

One more thing: could the user decide the character range, rather than cpptools enforcing a fixed maximum of 16? We are not always working in a large workspace.

the range (which currently is capped to ~16 characters)

Shaka0723 avatar Feb 22 '24 05:02 Shaka0723

@fearthecowboy Should we (or the user) file a feature request to add a setting for the max fuzzy character distance? But if that setting were added, it seems like we would also need to add a setting for the fuzzy search timeout, otherwise setting the distance too high could just result in the timeout getting hit, resulting in no fuzzy symbols still.

sean-mcmanus avatar Feb 23 '24 00:02 sean-mcmanus

I would like to ask whether fuzzy searches exceeding 16 characters will be supported in the future, or whether this plugin will only support 0~16 characters.

chall1123 avatar Mar 16 '24 01:03 chall1123

@likui1123 With https://github.com/microsoft/vscode-cpptools/releases/tag/v1.20.0, the fuzzy character limit has increased to 28.

sean-mcmanus avatar Mar 29 '24 22:03 sean-mcmanus

@likui1123 With https://github.com/microsoft/vscode-cpptools/releases/tag/v1.20.0, the fuzzy character limit has increased to 28.

I tried 1.20.0, and it is better than before.

But why not let the user decide the fuzzy character limit? For example, make 28 the default value rather than a fixed one, with the limit required to be <100.

Shaka0723 avatar Apr 08 '24 03:04 Shaka0723