[Search] Searching for "The The" does not yield the expected results
Hello, when searching for an artist named "The The", it seems I cannot get the expected results. I get all the artists with some "The" in the name. I admit the name of the band is quite peculiar :-)
Is there any method that allows the current version to successfully execute this search?
Thank you!
I wrote something like this for my own website in a completely different context. You basically have to find a non-overlapping fit for each word entered.
You'd need for loops nested as deep as there are words in the search, so recursion is the answer. For each word, loop through all of its matches and, for each match, try to find non-overlapping matches for the rest of the words. Pass along the text to search through, the list of remaining words to find, and the list of ranges that must not be overlapped.
Hello @ajpanton, on my side, as an API consumer, I only have the Subsonic API methods available, so I don't think I can effectively do what you suggest (which, btw, looks correct to me).
My code in Python looks like this:
import re

def findWords(haystack, needles, illegalRanges):
    if not needles:
        return illegalRanges  # Success: every word has been placed
    # All (start, end) spans where the first remaining word occurs
    matchRanges = [m.span() for m in re.finditer(re.escape(needles[0]), haystack)]
    for matchRange in matchRanges:
        if overlaps(matchRange, illegalRanges):
            continue  # This match overlaps, continue to the next one
        new_needles = needles[1:]
        new_illegalRanges = illegalRanges + [matchRange]
        result = findWords(haystack, new_needles, new_illegalRanges)
        if result is not False:
            return result  # Downstream success, pass along the result towards the source
    return False  # Loop finished without success

def overlaps(candidate, ranges):  # Checks if candidate overlaps with any of ranges
    for r in ranges:
        if not (candidate[1] <= r[0] or r[1] <= candidate[0]):
            return True
    return False
You'd call it by using
findWords(haystack, search_keywords.split(), [])
I'm not sure how the Subsonic API works, but I guess the client passes the search keywords to the server, which then returns the results. If so, maybe this could be implemented on the server side, @epoupon? My code returns the actual ranges of the matches, which I then use to highlight the matched words. That might not be possible here, so it may be easier to just return True when a match is made. Of course, you might also want to convert this to something other than Python.
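For what it's worth, the "just return True" variant could look roughly like the sketch below. This is only an illustration, not what the server does: the name `matches_all_words`, the case-insensitive matching, and the sample artist list are all assumptions made up for the example.

```python
import re

def matches_all_words(haystack, needles, used=()):
    """Return True if every needle fits in haystack without overlapping another."""
    if not needles:
        return True  # every word has been placed
    first, rest = needles[0], needles[1:]
    # Assumption: case-insensitive matching, which the code above does not do.
    for m in re.finditer(re.escape(first), haystack, flags=re.IGNORECASE):
        start, end = m.span()
        # Skip a candidate that overlaps a range claimed by an earlier word.
        if any(start < u_end and u_start < end for u_start, u_end in used):
            continue
        if matches_all_words(haystack, rest, used + ((start, end),)):
            return True
    return False  # no placement of this word leads to a full match

# Hypothetical artist names, for illustration only.
artists = ["The Beatles", "The The", "Theatre of Tragedy"]
query = "The The".split()
hits = [name for name in artists if matches_all_words(name, query)]
print(hits)
```

With this filter, a name only qualifies if it contains one separate occurrence of "The" per search word, so names with a single "The" are dropped.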
Hello, I believe the changes should happen server-side. In fact, given the limited number of results returned by a search call, the artist "The The" might not even be among the returned artists, so even if I applied the algorithm on the client side, there would be no guarantee of getting "The The" as the first result. Anyway, many thanks for your help!