Server list searchbox is just absurd

Open hlqkj opened this issue 2 years ago • 18 comments

Minetest version

Minetest 5.8.0 (Windows)
Using Irrlicht 1.9.0mt13
Using LuaJIT 2.1.0-beta3
BUILD_TYPE=Release
RUN_IN_PLACE=1
USE_CURL=1
USE_GETTEXT=1
USE_SOUND=1
STATIC_SHAREDIR="."
STATIC_LOCALEDIR="locale"

Active renderer

(Doesn't matter.)

Irrlicht device

(Doesn't matter.)

Operating system and version

Windows 11 x64 23H2 (22631.2715) -- Does this really matter?

CPU model

Seriously.

GPU model

...

OpenGL version

???

Summary

I just downloaded Minetest 5.8 and noticed the following while trying to fav some servers.

The issue is, search box as it is now is just wrong. Examples below.

  1. Trying to find 'Land of Bugs (an Exile Server) v0.4.0-beta' with an almost exact search string like 'Land of Bugs' gives this:

Screenshot 2023-12-05 090901

  2. Trying to find the same with just 'Exile' finally did it:

Screenshot 2023-12-05 091016

  3. I then tried with Dark Lands Survival: I'm not posting screenshots again, but I encourage you to try this:
  • first search for 'Dark' - this almost hits, with just three results;
  • then search for 'Dark Lands' - 8 results, not bad;
  • then finally query for 'Dark Lands Survival' and enjoy the 9879247021 results shown - noting that your wanted best match isn't even sorted such that it would at least rank up a little in the search results.

That is, the more exact predicate you give, the worse the search works... Thumbs up!

Steps to reproduce

See above.

hlqkj avatar Dec 05 '23 08:12 hlqkj

I can confirm. This is the same issue as #13981, just for the serverlist search instead of the settings search.

The fix would be to AND the search terms instead of ORing them, so that each additional search term narrows the results instead of adding more.

I do agree if the results contained the OR matches also but were sorted with results matching AND at the top, it would solve that part.

grorp avatar Dec 05 '23 14:12 grorp

Does this improve your situation?

diff --git a/builtin/mainmenu/tab_online.lua b/builtin/mainmenu/tab_online.lua
index d93f45dcf..d4634ddaa 100644
--- a/builtin/mainmenu/tab_online.lua
+++ b/builtin/mainmenu/tab_online.lua
@@ -196,4 +196,5 @@ local function search_server_list(input)
 	local keywords = {}
 	for word in input:gmatch("%S+") do
+		-- Escape special characters for string.gsub below
 		word = word:gsub("(%W)", "%%%1")
 		table.insert(keywords, word)
@@ -216,5 +217,5 @@ local function search_server_list(input)
 				local sername = server.name:lower()
 				local _, count = sername:gsub(keyword, keyword)
-				found = found + count * 4
+				found = found + count * 5
 			end
 
@@ -222,9 +223,9 @@ local function search_server_list(input)
 				local desc = server.description:lower()
 				local _, count = desc:gsub(keyword, keyword)
-				found = found + count * 2
+				found = found + count * 1
 			end
 		end
 		if found > 0 then
-			local points = (#serverlistmgr.servers - i) / 5 + found
+			local points = (#serverlistmgr.servers - i) * 0.1 + found
 			server.points = points
 			table.insert(search_result, server)
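
The escaping step in the patch is needed because `gsub` interprets the keyword as a Lua pattern, so a character such as `.` or `-` in the query would otherwise act as a wildcard or quantifier. As a minimal sketch of an alternative, assuming occurrences only need to be counted, `string.find` can be called with its fourth argument (`plain`) set to `true`, which avoids escaping altogether. The helper name `count_plain` is hypothetical and does not exist in `tab_online.lua`.

```lua
-- Count non-overlapping occurrences of `needle` in `haystack`, treating
-- `needle` as plain text (no Lua pattern semantics).
-- NOTE: `count_plain` is an illustrative helper, not part of Minetest.
local function count_plain(haystack, needle)
	if needle == "" then
		return 0
	end
	local count = 0
	local init = 1
	while true do
		-- The fourth argument (plain = true) turns off pattern matching.
		local from, to = haystack:find(needle, init, true)
		if not from then
			break
		end
		count = count + 1
		init = to + 1
	end
	return count
end

print(count_plain("land of bugs (an exile server) v0.4.0-beta", "v0.4.0")) --> 1
print(count_plain("dark lands, dark nights", "dark")) --> 2
```

With a helper like this, the keyword can be passed through unescaped, at the cost of one explicit loop per keyword.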

SmallJoker avatar Dec 05 '23 15:12 SmallJoker

I'm sorry, I've never been able to set up a working compile environment on Windows, although I long tried to, so I can't test that myself I'm afraid.

Also, I don't think this is "my" situation, and the "quality" of the search feature should rather be evaluated by objective metrics - not by me...

I don't think that simply tweaking the keyword-based search we have now can solve this TBH. It's almost 2024, and the official server list is starting to be well populated: not saying that we need a semantic search engine in the main menu, but I think that at least word/string similarity should be taken into account, and the search results should be sorted by relevance, not just be a filtered copy of the server list.
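
To illustrate what "word/string similarity" could mean in practice, here is a minimal sketch of the Levenshtein edit distance in plain Lua. It is only one possible similarity measure and is not something the current search code uses; the function name `levenshtein` is chosen for illustration, and the comparison is byte-based, so it assumes ASCII input.

```lua
-- Levenshtein edit distance between two strings (byte-wise comparison).
-- NOTE: illustrative only; not part of the Minetest main menu code.
local function levenshtein(a, b)
	local len_a, len_b = #a, #b
	if len_a == 0 then return len_b end
	if len_b == 0 then return len_a end

	-- prev[j + 1] is the distance between the first i - 1 bytes of `a`
	-- and the first j bytes of `b`.
	local prev = {}
	for j = 0, len_b do
		prev[j + 1] = j
	end

	for i = 1, len_a do
		local cur = {i}
		for j = 1, len_b do
			local cost = (a:byte(i) == b:byte(j)) and 0 or 1
			cur[j + 1] = math.min(
				prev[j + 1] + 1, -- deletion
				cur[j] + 1,      -- insertion
				prev[j] + cost   -- substitution
			)
		end
		prev = cur
	end
	return prev[len_b + 1]
end

print(levenshtein("lands", "land"))     --> 1
print(levenshtein("kitten", "sitting")) --> 3
```

A search could, for example, treat a query word as matching when its distance to some word of the server name is at most 1 or 2; that threshold is an assumption that would need tuning.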

hlqkj avatar Dec 05 '23 16:12 hlqkj

You can edit Lua builtin code without needing to recompile, see builtin/ next to bin/ in your Minetest folder.

rollerozxa avatar Dec 05 '23 16:12 rollerozxa

Oh boi I am stupid! - didn't even realize it was LUA! I'll try in a minute, still at work right now... Thanks!

hlqkj avatar Dec 05 '23 16:12 hlqkj

Sorry again for my silliness. I tested those changes, but I can't say I notice a significant improvement from what I was observing earlier today.

Searching for 'Land of Bugs' now works better, as the wanted server shows 3rd in the (still lengthy) list. However, in the other case, 'Dark Lands Survival', the server still isn't shown unless you scroll down quite a bit.

I understand that when more servers match a search query, the results sorting must take into account their respective rankings (Edit: I actually don't know how exactly the search code works, here - I assumed it just filters the raw server list as it is received, thus keeping it sorted by rank?). However, the case where a user wants to pick a random server based on a keyword they like (e.g. "survival") is different from the case where they are looking for a specific one: perhaps a balance could be found where, for example, almost-exact matches are shown first, no matter their rank.

The main thing remains though: it is nonsense that a single-word predicate gives far fewer results than a more specific, almost exact one.

hlqkj avatar Dec 05 '23 17:12 hlqkj

Fewer words mean that there are fewer server titles and descriptions that contain one of them. Ranking is then based on A) where and B) how often the keyword is contained. Hence more keywords will give you more, although more distinctly weighted, servers. The code could be refined to factor in the keyword length: it's more likely that "of" or "an" is found than "land". Feel free to tweak and extend the code that I sent you. Perhaps you can come up with a sorting mechanism that makes sense.
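
The keyword-length idea can be sketched as follows. This is a hedged illustration, not the code in `tab_online.lua`: the function `score_text` and the choice of using the keyword's length directly as its weight are assumptions made for the example.

```lua
-- Score a text by summing, for every occurrence of every keyword, a weight
-- equal to the keyword's length, so that "land" counts more than "of".
-- NOTE: `score_text` and the length-based weight are illustrative choices.
local function score_text(text, keywords)
	local score = 0
	for _, keyword in ipairs(keywords) do
		if keyword ~= "" then
			local init = 1
			while true do
				-- plain = true: match the keyword literally
				local from, to = text:find(keyword, init, true)
				if not from then
					break
				end
				score = score + #keyword
				init = to + 1
			end
		end
	end
	return score
end

local keywords = {"land", "of", "bugs"}
print(score_text("land of bugs (an exile server)", keywords)) --> 10
print(score_text("the home of fun and of friends", keywords)) --> 4
```

A plain occurrence count would score these two texts 3 and 2; weighting by length widens the gap in favour of the text that contains the rarer, longer words.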

SmallJoker avatar Dec 05 '23 19:12 SmallJoker

The only sorting mechanism that makes sense is for individual keywords to be ANDed. "Land of Bugs" should only match a server that has "[Ll]and" AND "of" AND "[Bb]ugs". Anything else is silly.

~And handling ranking/points/popularity inside a search function? So silly only a PhD could have come up with it.~

Edit: I can't be sure that's what the code's doing, actually. But it looks needlessly complicated.
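
For reference, a pure AND filter as described above could look like the following minimal sketch. It assumes each server is a table with `name` and `description` string fields (the fields the patch above reads); the function names `matches_all` and `filter_and` are hypothetical.

```lua
-- Return true only if every keyword occurs in the server's name or description.
-- NOTE: `matches_all` and `filter_and` are illustrative names.
local function matches_all(server, keywords)
	local text = ((server.name or "") .. " " .. (server.description or "")):lower()
	for _, keyword in ipairs(keywords) do
		-- plain = true: treat the keyword literally, not as a Lua pattern
		if not text:find(keyword:lower(), 1, true) then
			return false
		end
	end
	return true
end

-- Keep only the servers that match all whitespace-separated words of `query`,
-- preserving the original list order.
local function filter_and(servers, query)
	local keywords = {}
	for word in query:gmatch("%S+") do
		table.insert(keywords, word)
	end
	local result = {}
	if #keywords == 0 then
		return result
	end
	for _, server in ipairs(servers) do
		if matches_all(server, keywords) then
			table.insert(result, server)
		end
	end
	return result
end

local servers = {
	{name = "Land of Bugs (an Exile Server)", description = "Survive the wild"},
	{name = "Dark Lands Survival", description = "A land of danger"},
	{name = "Creative Plots", description = "Build freely"},
}
for _, server in ipairs(filter_and(servers, "Land of Bugs")) do
	print(server.name) --> Land of Bugs (an Exile Server)
end
```

Because the filter keeps the incoming order, any existing popularity ordering of the list is preserved among the matches.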

BluebirdGreycoat avatar Dec 06 '23 00:12 BluebirdGreycoat

yep, same happens to me

nininik0 avatar Dec 08 '23 03:12 nininik0

> The only sorting mechanism that makes sense is for individual keywords to be ANDed. "Land of Bugs" should only match a server that has "[Ll]and" AND "of" AND "[Bb]ugs". Anything else is silly.
>
> ~And handling ranking/points/popularity inside a search function? So silly only a PhD could have come up with it.~
>
> Edit: I can't be sure that's what the code's doing, actually. But it looks needlessly complicated.

Agree with that. I'll see if I can implement this for the better.

Mahoyomu avatar Jul 30 '24 01:07 Mahoyomu

First: Achieving text search which matches user expectations well is, in general, a tricky topic. But, there are certainly ways our search can be improved, with not too much effort.

> The only sorting mechanism that makes sense is for individual keywords to be ANDed

While I can see the motivation, I disagree with this. Back to the example, if a user misremembered a keyword (say "land of mishaps"), they would get no results.

The logic behind ANDing is to get more precise search results given precise search terms. The logic behind ORing is to get more search results, increasing the tolerance for imprecise search terms, and aiding users who perhaps aren't looking for a particular server but rather servers fitting a broad number of keywords.

Both of these motivations are legitimate, and I think a half-decent tradeoff is possible: Simply rank servers which match more search terms always higher than servers which match fewer search terms. That way, "land of bugs" would be at the top when you search for it, followed by all the other "land" servers (and maybe a couple servers mentioning "bugs"), which are likely to have "of" in their description, and then all the rest which have "of" in their description.

(More elaborate scoring systems are also conceivable, e.g. ones that take the order of query words into account, but let's not overcomplicate this for now. Though a special case for a "full match" of the query string as in the linked PR might make sense.)
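
The tradeoff described above can be sketched as a sort with a multi-level key. This is a minimal illustration under stated assumptions, not the implementation from the linked PR: servers are assumed to be tables with `name` and `description` fields, the incoming list order is used as the final tie-breaker, and `rank_servers` is a hypothetical name.

```lua
-- Rank servers so that (1) a full match of the query in the name comes first,
-- (2) servers matching more query words always beat servers matching fewer,
-- and (3) ties keep the original list order.
-- NOTE: `rank_servers` is an illustrative function, not Minetest API.
local function rank_servers(servers, query)
	local keywords = {}
	for word in query:lower():gmatch("%S+") do
		table.insert(keywords, word)
	end
	-- Normalised query used for the "full match" special case
	local phrase = table.concat(keywords, " ")

	local ranked = {}
	for index, server in ipairs(servers) do
		local name = (server.name or ""):lower()
		local desc = (server.description or ""):lower()
		local matched = 0
		for _, keyword in ipairs(keywords) do
			if name:find(keyword, 1, true) or desc:find(keyword, 1, true) then
				matched = matched + 1
			end
		end
		if matched > 0 then
			table.insert(ranked, {
				server = server,
				full = name:find(phrase, 1, true) ~= nil,
				matched = matched,
				index = index,
			})
		end
	end

	table.sort(ranked, function(a, b)
		if a.full ~= b.full then
			return a.full
		end
		if a.matched ~= b.matched then
			return a.matched > b.matched
		end
		return a.index < b.index
	end)

	local result = {}
	for i, entry in ipairs(ranked) do
		result[i] = entry.server
	end
	return result
end
```

Servers that match none of the words are dropped, so the result is still an OR-style list, only ordered so that the closest matches appear at the top.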

appgurueu avatar Jul 30 '24 16:07 appgurueu

> if a user misremembered a keyword (say "land of mishaps"), they would get no results.

In that case (the example assumes they're searching something specifically) they would back up and remove a keyword, until they see a reasonably filtered (read, reasonably shortened) list of results into which they may find what they're looking for: I think that would still be better than having to go through a filtered set which could very possibly be as long as 3/4 of the whole server list, in the case of the current implementation.

The OR-based option would help when a user wants to find any server by its description - not the name. By that I mean, for example, a new user wanting to play "PvP" or "anarchy" or "survival" (those would be the search terms). But even in that case, I don't see how moving to a purely AND-based search would make things worse, since that user would very possibly try a single-word predicate.

Also, I still am not sure how (if?) the filtered results are sorted: it would make sense to sort them somehow having the likelihood taken into consideration, and perhaps also offering the possibility to sort them by name, or other things (actually, this last topic is unrelated to the search but it's still worth mentioning it here).

hlqkj avatar Jul 30 '24 18:07 hlqkj

> > if a user misremembered a keyword (say "land of mishaps"), they would get no results.
>
> In that case (the example assumes they're searching something specifically) they would back up and remove a keyword, until they see a reasonably filtered (read, reasonably shortened) list of results into which they may find what they're looking for: I think that would still be better than having to go through a filtered set which could very possibly be as long as 3/4 of the whole server list, in the case of the current implementation.
>
> The OR-based option would help when a user wants to find any server by its description - not the name. By that I mean, for example, a new user wanting to play "PvP" or "anarchy" or "survival" (those would be the search terms). But even in that case, I don't see how moving to a purely AND-based search would make things worse, since that user would very possibly try a single-word predicate.
>
> Also, I still am not sure how (if?) the filtered results are sorted: it would make sense to sort them somehow having the likelihood taken into consideration, and perhaps also offering the possibility to sort them by name, or other things (actually, this last topic is unrelated to the search but it's still worth mentioning it here).

I can't agree more. That's just what I'm thinking.

Mahoyomu avatar Aug 01 '24 00:08 Mahoyomu

I think that the results should be ORed but sorted in a way that puts the most relevant results first (like results matching AND). Relevance would also take into account whether the result was found in the title or elsewhere

rubenwardy avatar Aug 01 '24 00:08 rubenwardy

Sorry, I disagree with that. A pure OR in this context seems just wrong: the results certainly need to be sorted, but showing the whole server list after an exact match has been found doesn't really make sense, IMHO. I've been invited a minute ago to try "Survival server No. 521" and I let you guess how many results I got searching for the exact match.

hlqkj avatar Aug 01 '24 10:08 hlqkj

Search tokens should be ANDed unless the user explicitly requests OR, through special syntax. Like |

> The logic behind ORing is to get more search results, increasing the tolerance for imprecise search terms, and aiding users who perhaps aren't looking for a particular server but rather servers fitting a broad number of keywords.

I cannot agree with this argument put forward.

If someone inputs multiple terms, the only reasonable assumption to make is that the user is trying to narrow the search, not to expand it.

If the user inputs a wrong search term and gets no results, that's user error. It will also strongly signal to the user that they should try another term, or remove a term. Showing them servers that matched their OTHER terms, especially where their other terms were ORed, is IMO worse than useless, because it does not immediately tell the user that their search could not be matched.

The last point in the above quote is an extremely rare use-case. Search should not be borked for 99% of users because of the 1 guy who wants to do OR.
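
A minimal sketch of the "AND by default, explicit `|` for OR" idea follows. The exact syntax and precedence are assumptions made for illustration: here `|` splits the query into alternatives, and the words inside each alternative are ANDed. The names `parse_query` and `matches_query` are hypothetical.

```lua
-- Split a query into OR-groups separated by "|"; each group is a list of
-- lowercase keywords that must all match (AND).
-- NOTE: the "|" syntax and both function names are illustrative assumptions.
local function parse_query(query)
	local groups = {}
	for part in query:gmatch("[^|]+") do
		local keywords = {}
		for word in part:lower():gmatch("%S+") do
			table.insert(keywords, word)
		end
		if #keywords > 0 then
			table.insert(groups, keywords)
		end
	end
	return groups
end

-- A text matches if at least one group has all of its keywords in the text.
local function matches_query(text, groups)
	text = text:lower()
	for _, keywords in ipairs(groups) do
		local all = true
		for _, keyword in ipairs(keywords) do
			-- plain = true: match the keyword literally
			if not text:find(keyword, 1, true) then
				all = false
				break
			end
		end
		if all then
			return true
		end
	end
	return false
end

local groups = parse_query("pvp | anarchy survival")
print(matches_query("Hardcore PvP arena", groups)) --> true
print(matches_query("Anarchy only", groups))       --> false
```

A query without `|` yields a single group, so it behaves as a plain AND search.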

BluebirdGreycoat avatar Aug 01 '24 17:08 BluebirdGreycoat

> Search tokens should be ANDed unless the user explicitly requests OR, through special syntax. Like |
>
> > The logic behind ORing is to get more search results, increasing the tolerance for imprecise search terms, and aiding users who perhaps aren't looking for a particular server but rather servers fitting a broad number of keywords.
>
> I cannot agree with this argument put forward.
>
> If someone inputs multiple terms, the only reasonable assumption to make is that the user is trying to narrow the search, not to expand it.
>
> If the user inputs a wrong search term and gets no results, that's user error. It will also strongly signal to the user that they should try another term, or remove a term. Showing them servers that matched their OTHER terms, especially where their other terms were ORed, is IMO worse than useless, because it does not immediately tell the user that their search could not be matched.
>
> The last point in the above quote is an extremely rare use-case. Search should not be borked for 99% of users because of the 1 guy who wants to do OR.

You're my internet voice. Exactly what I needed to put my doubts to rest, just the kind of perspective I was looking for.

Mahoyomu avatar Aug 02 '24 00:08 Mahoyomu

For what it's worth, I'm not questioning that ANDing would be better than the current behavior. I would accept a PR that implements ANDing. Something like ANDing (with a little bit of tolerance strewn in, but this is usually communicated clearly) seems to be the default that search engines implement.

> I've been invited a minute ago to try "Survival server No. 521" and I let you guess how many results I got searching for the exact match.

If Survival server No. 521 was the first result, there wouldn't be an issue.

> Showing them servers that matched their OTHER terms, especially where their other terms were ORed, is IMO worse than useless, because it does not immediately tell the user that their search could not be matched.

This is a good point.

appgurueu avatar Aug 02 '24 16:08 appgurueu