
[Search] Searching for "The The" does not yield the expected results

Open GioF71 opened this issue 9 months ago • 4 comments

Hello, when searching for an artist named "The The", it seems I cannot get the expected results. I get all the artists with some "The" in the name. I admit the name of the band is quite peculiar :-)

Is there any method that allows the current version to successfully execute this search?

Thank you!

GioF71 avatar Mar 26 '25 07:03 GioF71

I wrote something like this for my own website in a completely different context. You basically have to find a non-overlapping fit for each word entered.

You'd need for loops nested as deep as there are words in the search, so recursion is the answer. For each word, you loop through all of its matches, and inside each iteration you try to find non-overlapping matches for the rest of the words. Pass along the text to search through, a list of remaining words to find, and a list of ranges that can't be overlapped.

ajpanton avatar May 28 '25 12:05 ajpanton

Hello @ajpanton, on my side, as an API consumer, I only have the Subsonic API search method available, so I don't think I can effectively do what you suggest (which, by the way, looks correct to me).

GioF71 avatar May 28 '25 16:05 GioF71

My code in Python looks like this:

import re

def findWords(haystack, needles, illegalRanges):
    # Returns the list of matched (start, end) ranges, or False if there is no non-overlapping fit
    if not needles:
        return illegalRanges  # Success: every word has been placed

    # All occurrences of the first remaining word, as (start, end) spans
    matchRanges = [m.span() for m in re.finditer(re.escape(needles[0]), haystack)]

    for matchRange in matchRanges:
        if overlaps(matchRange, illegalRanges):
            continue  # This match overlaps, continue to the next one

        new_needles = needles[1:]
        new_illegalRanges = illegalRanges + [matchRange]

        result = findWords(haystack, new_needles, new_illegalRanges)
        if result is not False:
            return result  # Downstream success, pass along the result towards the source

    return False  # Loop finished without success

def overlaps(candidate, ranges):  # Checks if candidate overlaps with any of ranges
    for r in ranges:
        if not (candidate[1] <= r[0] or r[1] <= candidate[0]):
            return True
    return False

You'd call it by using findWords(haystack, search_keywords.split(), [])

I'm not sure how the Subsonic API works, but I guess the client passes the search keywords to the server, which then returns the results. If so, this could maybe be implemented on the server side? @epoupon My code returns the actual ranges of the matches, which I then use to highlight the matched words. That might not be possible here, so maybe it's easier to just return True when a match is made. Of course, you might also want to convert this to something other than Python.
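
For what it's worth, here is a rough sketch of that True/False variant. The names matchesAllWords, usedRanges and ignoreCase are made up for this example, and the case-insensitive option is my own assumption about what a search would want. Nothing here is taken from the server code.

```python
import re

def matchesAllWords(haystack, needles, usedRanges=(), ignoreCase=False):
    """Return True if every needle can be placed in haystack without overlaps."""
    if not needles:
        return True  # Every word has been placed

    flags = re.IGNORECASE if ignoreCase else 0  # Assumption: case folding is optional
    for m in re.finditer(re.escape(needles[0]), haystack, flags):
        start, end = m.span()
        # Skip this occurrence if it overlaps a range taken by an earlier word
        if any(start < usedEnd and usedStart < end for usedStart, usedEnd in usedRanges):
            continue
        # Reserve this range and try to place the remaining words
        if matchesAllWords(haystack, needles[1:], (*usedRanges, (start, end)), ignoreCase):
            return True

    return False  # No non-overlapping placement exists

for name in ["The The", "The Beatles", "Matt Johnson"]:
    print(name, matchesAllWords(name, "The The".split()))
```

With the search "The The", only the name "The The" passes, because "The Beatles" has a single "The" that cannot be used twice. With ignoreCase=True, a name such as "The Other" would also pass, since "the" occurs inside "Other".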

ajpanton avatar May 28 '25 22:05 ajpanton

Hello, I believe the changes should happen server-side. In fact, given the limited number of results returned by a search call, the artist "The The" might not even be included among the returned artists. So even if I applied the algorithm on the client side, there would be no guarantee that I would get "The The" as the first result. Anyway, many thanks for your help!

GioF71 avatar May 29 '25 12:05 GioF71