YARG icon indicating copy to clipboard operation
YARG copied to clipboard

Fuzzy Searching Support

Open Xilophor opened this issue 9 months ago • 3 comments

This PR adds fuzzy matching to the music library search bar, no matter what filter is used, with exception of the year filter for obvious reasons.

To do so, I used the RapidFuzz.Net NuGet package, licensed under MIT. This package is a C# wrapper of the original C++ RapidFuzz library, also licensed under MIT.

In testing, the fuzzy matching does not seem to significantly impact performance, and generally produces better results when searching for songs. Additionally, I have not found any bugs caused by the fuzzy matching; only a bug relating to the filter parsing, which is out of the scope of this PR.

Please let me know of any changes I need to make, particularly with code stylization.

This PR should resolve YARG-253.

Xilophor avatar May 05 '24 00:05 Xilophor

  1. As described in the in-file comment I made, executing searches and refreshes using coroutines (or async) compromises the visual aspect of the menu itself. Instead of giving users the full result of a query, they're given a partial or empty view that changes right in front of them. Additionally, it puts to question how performance is effected and how you'd actually go about measuring it.

I have removed these coroutines, though they were strictly being used to not refresh the list when holding down the backspace button, and had (practically) no impact on search times otherwise. But, as you have stated, they were not doing much, and were very much an unnecessary addition.


  1. I wouldn't personally add charter to the list of nodes to search by. Name & artist are together because those are the two aspects most players remember about any songs they want to play. Very few times will users search by charter - which is why it's reserved as a filter. It helps to simplify the search result sorting algorithm. Add to the fact that the charter info is in the sidebar, making it hard to notice when trying to gauge how an entry qualifies under certain search conditions.

Good point, I've removed it from the list to match against for Unspecified filtering.


  1. On the performance side, in my use, I'm seeing a very noticeable degradation with using search functionalities. Placing a single character in the search bar, for example, on my library of about 7000 songs causes around a .7s of delay before seeing the results. I worry that people with bigger libraries and/or PC specs worse than mine would assess the change as a reversion to the bad search speed it used to have.

This is a very large concern, and not something I thought would occur. To mitigate this some, I've:

  1. Removed the charter comparison for unspecified filtering, which should reduce the (default) search/filtering time by up to 1/3.
  2. I've changed the logic to use string.Contains - the previous logic - when the search query contains 3 or less characters.

In personal testing, this does appear to have increased the performance; particularly when searching with 1-3 characters.


  1. I will have to disagree about one factor with this PR on a fundamental level: you need to have filter functionality at some equivalent level to what it currently is. As nightly gets an auto-generated build, completely breaking filters will just lead to complaints during the waiting period for the fix.

Sorry I wasn't more specific with the bug I mentioned with filtering, or how it overall works.

First off, the bug - existing before this PR - would cause the searching to break when you added a ( for a filtered query, due to a RegEx error. I resolved this commit 0ee2ec2, by simply changing from RegEx.Contains to string.Contains.

The only differences in filter functionality now is fuzzy matching (with exception of year filters), as well as removing some of the dialect/article removal since the Fuzzy Matching handles this.


Thanks for the great feedback! I'll be going through and fixing some other issues that have appeared in further testing before re-requesting a review.

Xilophor avatar May 07 '24 18:05 Xilophor

Changing to draft for now as I haven't had much time lately to work on this PR - will implement some fixes before marking it ready for review again.

Xilophor avatar May 11 '24 02:05 Xilophor

Okay, I'm marking this as ready to review. The main bug mysteriously seems to have disappeared, and the other unintended behavior is just a byproduct of the caching currently in the game - not the previously implemented caching of this PR.

However, since the bug causing low frames when modifying the middle of the current search query text has mysteriously disappeared, I believe there should be additional QC/QA on this PR by others before it is merged.

Xilophor avatar May 13 '24 00:05 Xilophor

Also, I wanna specify: do not MERGE changes, REBASE. Case and point: because you merged instead of rebase, when I open the PR on github desktop, the YARG.Core submodule isn't being properly set to the most recent main repo update version. This is what it's supposed to show up as image image

But this is what YARG.Core gets set to: image image

sonicfind avatar May 25 '24 15:05 sonicfind

Also, I wanna specify: do not MERGE changes, REBASE. Case and point: because you merged instead of rebase, when I open the PR on github desktop, the YARG.Core submodule isn't being properly set to the most recent main repo update version. This is what it's supposed to show up as

Ah, right. Forgot some of the issues that occur when merging instead of rebasing.

Xilophor avatar May 25 '24 18:05 Xilophor

I would like to note that for optimization, searching uses string.Contains for the first three characters, after which is uses fuzzy searching. This has the unintended behavior of requiring the target text (e.g., title, artist) to contain the first three characters exactly.

There's not a good way around this to my knowledge; however, it is such few characters that it is not that large of an issue. The only option available is to lower the limit before fuzzy searching, though this is at the detriment of performance. As stated before, I don't believe the first three characters being exact is that much of an issue, especially since it doesn't have to be the start of the target text (e.g., mins for Numb The Mind).

Xilophor avatar May 25 '24 19:05 Xilophor

Performance wise: yea, that leniency for the first three characters definitely seems to help immensely. It's a shame that the fuzzy search by itself takes so much performance away that the limitation is required. image That being said, we'll still need to see how it fares against larger libs and/or worse pc specs.

sonicfind avatar May 26 '24 03:05 sonicfind

Now, something quick that may need to be addressed. With this search, unspecified search seems to have flipped the order around in terms of whether matching names come before artists when the rank is the same (which used to be the case).

sonicfind avatar May 26 '24 04:05 sonicfind

Something I unfortunately just realized: we can't use the caching functionality because the list of results can be drastically different depending of the string. You can get different sets of songs if you just copy entire search queries vs. typing them out one character at a time.

That could be pretty bad for performance...

sonicfind avatar May 28 '24 00:05 sonicfind

Something I unfortunately just realized: we can't use the caching functionality because the list of results can be drastically different depending of the string. You can get different sets of songs if you just copy entire search queries vs. typing them out one character at a time.

Yeah, it's something I've noticed a little when using it. Though, it's forgiving enough to typically have the target song in the list, and personally I don't completely mistype song names, nor would I assume others would.

It's a bit of a compromise, but it's better than strict string.Contains, as it still allows (some) typos.

Somethings that we could potentially do to mitigate it some are:

  • Invalidate the cache at 4 characters to allow typos in the first three characters
  • Invalidate the cache if you press enter in the search bar, though doing so may cause the list to change wben going to select a song
  • This may be too resource intensive, but Invalidate the cache between words of 4 characters or more

Again, they aren't the greatest solutions, but as it is right now, I think it's good enough; it's more forgiving than current searching - even if it doesn't quite maximize the usefulness of fuzzy matching/searching.

Also, considering people typically will be typing out the text one character at a time, I don't believe the difference of results between typing & pasting will be an annoyance. Especially since the main time people will paste a string is if they've copied a song name.

Xilophor avatar May 28 '24 04:05 Xilophor

Doing some additional testing, it seems like the cache is being invalidated anytime you modify the beginning of the query, causing it to lag out massively. The behavior seems to be the same in stable, although is not as apparent since string.Contains is more performative than fuzzy matching.

I currently can't quite pinpoint where this cache invalidation is happening, as it doesn't appear to happen in SongSearching::Search - the condition to invalidate the cache isn't fulfilled (debug log doesn't run). While this behavior is optimal when removing characters from the beginning of the query, it's not required when adding characters to the beginning of the search.

Would you happen to know where this could happen so I can change this behavior? It especially is not required with a cache invalidation condition I added for longer strings, resulting in better fuzzy matching.

I will also be looking into some pre-filtering and other fuzzy implementations to help reduce the load on fuzzy matching, giving better performance.

Xilophor avatar May 28 '24 16:05 Xilophor

Ok so... update: FuzzySharp's code is so incredibly inefficient.

sonicfind avatar May 28 '24 16:05 sonicfind

Ok so... update: FuzzySharp's code is so incredibly inefficient.

Lmao okay, I'll be looking into implementing a more efficient fuzzy matching/searching method myself.

Xilophor avatar May 28 '24 16:05 Xilophor

no no, I mean I'm optimizing it LOL Well to be exact, optimizing our specific usecase for it.

sonicfind avatar May 28 '24 17:05 sonicfind

no no, I mean I'm optimizing it LOL Well to be exact, optimizing our specific usecase for it.

Ah okay, I'll mess around with some pre-filtering to try to reduce some of the load on the fuzzy matching - especially when the cache is invalidated.

Xilophor avatar May 28 '24 17:05 Xilophor

... you won't need to do that.

sonicfind avatar May 28 '24 17:05 sonicfind

op, okay lmao

guess it's back to bug fixing

Xilophor avatar May 28 '24 17:05 Xilophor

Doing some additional testing, it seems like the cache is being invalidated anytime you modify the beginning of the query, causing it to lag out massively. The behavior seems to be the same in stable, although is not as apparent since string.Contains is more performative than fuzzy matching.

Okay, so it appears to be caused by finding the previous filter/query:

while (prevFilterIndex < searches.Count && currentFilters[currFilterIndex].StartsWith(searches[prevFilterIndex].Filter))
{
  ++prevFilterIndex;
}

To be exact, when editing at the start of the query, the "cache" becomes the entire search list (i.e. searches[0]). This should happen since the query changed at some point earlier than the end.

The issue is that the current search no longer starts with the previous search: drama hatsune miku does not start with dram hatsune miku (note the one character difference at the start of the string). This causes the the previous filter - say at index 1 - to be overridden, causing the filtering to be done against the entire song database, which is very inefficient.

This may not become as big of a deal depending on how much you've been able to optimize FuzzySharp; but currently it is very noticeable - especially towards the start of the query - and should be fixed anyways. I've taken a stab at fixing it by only doing string::StartsWith with the length of the new characters. This would prevent the hatsune miku section in the earlier example from being matched against.

However, so far it seems to be riddled with more bugs, and so I've scrapped that fix, or at least implementation of that fix. I will be trying other methods, but I wanted to post this here in-case you happen to come up with a better solution.

Xilophor avatar May 28 '24 22:05 Xilophor

To be exact, when editing at the start of the query, the "cache" becomes the entire search list (i.e. searches[0]). This should happen since the query changed at some point earlier than the end.

The issue is that the current search no longer starts with the previous search: drama hatsune miku does not start with dram hatsune miku (note the one character difference at the start of the string). This causes the the previous filter - say at index 1 - to be overridden, causing the filtering to be done against the entire song database, which is very inefficient.

That's intentional, but also inaccurate. The point is that it invalidates later cache nodes, but maintains the state of earlier ones. The only reason this doesn't work is due to fuzzy search being an entirely different concept from just basic "Contains" searching. Fuzzy searching can generate entirely different sets of results depending on the input - even if the input is just a longer string - while "Contains" search results are guaranteed to be a direct subset of the prior result.

For correct fuzzying, you HAVE TO search the entire database, otherwise the results would be inconsistent depending on how the query was submitted.

sonicfind avatar May 30 '24 00:05 sonicfind

What we can do is at least cache the prior result, but we must use the entire base sort list as the basis for the fuzzy.

sonicfind avatar May 30 '24 01:05 sonicfind

Or rather, we have to go backwards until we find a node of a different attribute and use that as the base.

sonicfind avatar May 30 '24 01:05 sonicfind

To simply put, those last two commits feel extra convoluted and unnecessary when looking at the bigger picture.

sonicfind avatar May 30 '24 03:05 sonicfind

So update: I've done... a lot. To the point that it has drastically gotten out of the scope of this PR, so I'm gonna probably merge what we got here first before pushing through some of my stuff.

Kinda substantially redid a lot of the fundamentals of how we hold songentries and categories for sorting in general.

sonicfind avatar May 30 '24 16:05 sonicfind