Disable fuzzy matcher for helpdesk searches
Checklist before requesting a review
- [X] I have performed a self-review of my code.
- [X] I have added tests that prove my fix is effective or that my feature works.
Description
The helpdesk uses some kind of fuzzy matching to find items:
Here it matched "Server" to "Service" and considered it to be a valid search result.
However, this system is IMO not good enough, as it returns results that are too different from the search term (even a simple case like the screenshot above can be confusing; see the internal support tickets for more examples).
This is because it relies on PHP's levenshtein function, which is not the most powerful tool for this. Maybe we should use a dedicated PHP package that provides a stronger algorithm.
For now I propose to disable it: it is far from an important feature, and most users probably don't even know it exists. We'll redo it later when we have more time.
References
Internal support ticket: !40767 !40844
Tests are failing (not sure it's related to the current PR)
I am not sure this proposal would improve the user experience. Indeed, of all the colleagues I have had since I started developing, at least a good third had problems with spelling and/or conjugation.
I agree. The fuzzy search is a net-positive. If it isn't done already, sorting by score so the most relevant results are higher in the list would reduce the issue of "too different" results being returned.
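A minimal sketch of the "sort by score" idea, written in Python for brevity rather than in the project's PHP. The `Match` type, the `sort_by_relevance` helper and the distance values are all hypothetical and only illustrate the ordering; they are not taken from the GLPI code.

```python
from typing import NamedTuple


class Match(NamedTuple):
    name: str
    distance: int  # edit distance to the search term; lower means closer


def sort_by_relevance(matches: list[Match]) -> list[Match]:
    """Order matches from most to least relevant (smallest distance first)."""
    # Ties are broken alphabetically so the order stays predictable.
    return sorted(matches, key=lambda m: (m.distance, m.name.lower()))


# Made-up distances for a search on "server".
results = [Match("Service", 3), Match("Server", 0), Match("Servers", 1)]
for match in sort_by_relevance(results):
    print(match.name, match.distance)
```

With this ordering a weak match such as "Service" would still be listed, but below the closer ones.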
The idea was to disable it for now, as I don't think the current implementation is good enough, and to rewrite it later (I am overloaded on my end, so I don't have much time for this :/).
Another internal ticket complained about it; I don't think we'll miss much by disabling it for now.
@orthagh Your opinion on this?
Until someone makes a better version of the fuzzy matching, maybe we can at least increase the minimum trigger for enabling the Levenshtein part? I think it's set to 5 right now; we could move it to 10, or something like that.
I agree that detecting misspellings is a strong argument for this search bar. But in the first example ticket, “Service” matching “Server” looks like a very bad result indeed. In the second ticket mentioned, I also don't find the initial case good enough: the customer said “prot” doesn't match “protection de l'information”, and that is a case where GLPI does a pure str_contains (case-insensitive).
Perhaps I can ask someone from PS to check the issue if everyone is overloaded on our side.
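To make the "minimal trigger" suggestion above concrete, here is a rough Python sketch. It assumes the trigger is the minimum length of the search term, which the thread does not confirm. The `is_match` helper, the `min_fuzzy_length` parameter and the `max_distance` value of 3 are hypothetical, not the actual GLPI settings.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs for insertion, replacement and deletion."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j - 1] + (char_a != char_b),  # replace (free if equal)
                current[j - 1] + 1,                    # insert
                previous[j] + 1,                       # delete
            ))
        previous = current
    return previous[-1]


def is_match(term: str, name: str, min_fuzzy_length: int, max_distance: int = 3) -> bool:
    """Substring match first; fuzzy match only for long enough search terms."""
    term, name = term.lower(), name.lower()
    if term in name:
        return True
    if len(term) < min_fuzzy_length:
        return False  # term too short: skip the fuzzy step entirely
    return levenshtein(term, name) <= max_distance


print(is_match("server", "Service", min_fuzzy_length=5))   # True: fuzzy step runs
print(is_match("server", "Service", min_fuzzy_length=10))  # False: fuzzy step skipped
print(is_match("prot", "protection de l'information", min_fuzzy_length=10))  # True: substring
```

Under these assumptions, raising the trigger from 5 to 10 would drop the "Server"/"Service" result, at the price of no typo tolerance for any search term shorter than 10 characters.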
The current deletion cost is 0, maybe the Server/Service matching issue is due to this.
> The current deletion cost is 0, maybe the Server/Service matching issue is due to this.
If you raise the deletion cost, then partial matches no longer work.
Levenshtein is good for comparing single words, but form names usually contain multiple words, so it gets tricky.
There are probably known mathematical solutions and algorithms for this problem.
I will look into this to see if it is possible to improve the search algorithm.
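To illustrate the deletion-cost trade-off discussed above, here is a small Python re-implementation of a Levenshtein distance with separate insertion, replacement and deletion costs, mirroring the three cost parameters of PHP's levenshtein function. It is only a sketch: the `weighted_levenshtein` name is hypothetical, and it assumes the item name is passed as the first string and the search term as the second, which the thread does not state.

```python
def weighted_levenshtein(source: str, target: str,
                         insert_cost: int = 1,
                         replace_cost: int = 1,
                         delete_cost: int = 1) -> int:
    """Cheapest way to turn `source` into `target` with per-operation costs."""
    # previous[j] = cost of turning the processed prefix of source into target[:j]
    previous = [j * insert_cost for j in range(len(target) + 1)]
    for i, s_char in enumerate(source, start=1):
        current = [i * delete_cost]
        for j, t_char in enumerate(target, start=1):
            keep_or_replace = previous[j - 1] + (0 if s_char == t_char else replace_cost)
            insert = current[j - 1] + insert_cost
            delete = previous[j] + delete_cost
            current.append(min(keep_or_replace, insert, delete))
        previous = current
    return previous[-1]


# Assumption: item name first, search term second.
print(weighted_levenshtein("service", "server"))                 # 3 with unit costs
print(weighted_levenshtein("service", "server", delete_cost=0))  # 1: "i" and "c" are dropped for free

name = "protection de l'information"
print(weighted_levenshtein(name, "prot", delete_cost=0))  # 0: partial match works
print(weighted_levenshtein(name, "prot"))                 # 23: partial match is lost
```

Under these assumptions, a deletion cost of 0 is what lets a short term match a long multi-word name, and it is also what brings "Service" within one edit of "Server".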