elm-text-search
Issue with the word "Loyalty"
I went to https://elm-lang.org/try and added rluiten/elm-text-search as a dependency.
This snippet demonstrates the issue by performing three searches, for "lo", "loy", and "loya", and printing the number of results for each. You can see that only for "loy" am I getting no results back.
import ElmTextSearch
import Html

main =
    let
        index =
            ElmTextSearch.new
                { ref = .id
                , fields = [ ( .title, 1 ) ]
                , listFields = []
                }

        indexAddResult =
            ElmTextSearch.add { id = "1234", title = "Loyalty" } index

        searchResultLo =
            indexAddResult
                |> Result.andThen (\i ->
                    ElmTextSearch.search "lo" i |> Result.map (Tuple.second >> List.map Tuple.first)
                )
                |> Result.map List.length
                |> Result.withDefault 0

        searchResultLoy =
            indexAddResult
                |> Result.andThen (\i ->
                    ElmTextSearch.search "loy" i |> Result.map (Tuple.second >> List.map Tuple.first)
                )
                |> Result.map List.length
                |> Result.withDefault 0

        searchResultLoya =
            indexAddResult
                |> Result.andThen (\i ->
                    ElmTextSearch.search "loya" i |> Result.map (Tuple.second >> List.map Tuple.first)
                )
                |> Result.map List.length
                |> Result.withDefault 0
    in
    Html.ul []
        [ Html.li [] [ Html.text (String.fromInt searchResultLo) ]
        , Html.li [] [ Html.text (String.fromInt searchResultLoy) ]
        , Html.li [] [ Html.text (String.fromInt searchResultLoya) ]
        ]
I have seen this and hope to have a look at it in the next week or two. At the moment I don't have a working Elm development environment, for assorted reasons.
I had a brain wave last night about the search and realised it may not be a bug.
I took your example and added examples of what the Porter stemmer does to the words (code below; the stemmer output is at the bottom of the list).
I get these results.
Given this, searching for the word "loy" won't find anything, because the stemmer is involved and converts it to "loi", which doesn't match what the Porter stemmer does to "Loyalty".
So at the moment I would say this is working as it should, due to the nature of the Porter stemmer being used. The Porter stemmer is very useful, but it can cause surprises like this.
If you modify the configuration of the index and do not use the stemmer you will get the behavior you expect.
import ElmTextSearch
import Html
import Stemmer

main =
    let
        index =
            ElmTextSearch.new
                { ref = .id
                , fields = [ ( .title, 1 ) ]
                , listFields = []
                }

        indexAddResult =
            ElmTextSearch.add { id = "1234", title = "Loyalty" } index

        searchResultLo =
            indexAddResult
                |> Result.andThen (\i ->
                    ElmTextSearch.search "lo" i |> Result.map (Tuple.second >> List.map Tuple.first)
                )
                |> Result.map List.length
                |> Result.withDefault 0

        searchResultLoy =
            indexAddResult
                |> Result.andThen (\i ->
                    ElmTextSearch.search "loy" i |> Result.map (Tuple.second >> List.map Tuple.first)
                )
                |> Result.map List.length
                |> Result.withDefault 0

        searchResultLoya =
            indexAddResult
                |> Result.andThen (\i ->
                    ElmTextSearch.search "loya" i |> Result.map (Tuple.second >> List.map Tuple.first)
                )
                |> Result.map List.length
                |> Result.withDefault 0

        stemLoyalty = Stemmer.stem "Loyalty"
        stemLo = Stemmer.stem "lo"
        stemLoy = Stemmer.stem "loy"
        stemLoya = Stemmer.stem "loya"
    in
    Html.ul []
        [ Html.li [] [ Html.text (String.fromInt searchResultLo) ]
        , Html.li [] [ Html.text (String.fromInt searchResultLoy) ]
        , Html.li [] [ Html.text (String.fromInt searchResultLoya) ]
        , Html.li [] [ Html.text "Stemming Results below here" ]
        , Html.li [] [ Html.text stemLoyalty ]
        , Html.li [] [ Html.text stemLo ]
        , Html.li [] [ Html.text stemLoy ]
        , Html.li [] [ Html.text stemLoya ]
        ]
Hi! I'm having this exact problem but can't figure out what to change in my config to not use the stemmer. What would be the correct config for this?
@lescuer97 Customize the config passed to ElmTextSearch.newWith to remove the default stemmer:
import Index.Defaults as Defaults

index =
    ElmTextSearch.newWith
        { indexType = Defaults.elmTextSearchIndexType
        , ref = .id
        , fields = [ ( .title, 1 ) ]
        , listFields = []
        , initialTransformFactories = Defaults.defaultInitialTransformFactories
        , transformFactories = [] -- This is the key, remove the default `Stemmer.stem` factory
        , filterFactories = Defaults.defaultFilterFactories
        }
Working full example:
import ElmTextSearch
import Html
import Index.Defaults as Defaults

main =
    let
        index =
            ElmTextSearch.newWith
                { indexType = Defaults.elmTextSearchIndexType
                , ref = .id
                , fields = [ ( .title, 1 ) ]
                , listFields = []
                , initialTransformFactories = Defaults.defaultInitialTransformFactories
                , transformFactories = [] -- This is the key, remove the default `Stemmer.stem` factory
                , filterFactories = Defaults.defaultFilterFactories
                }

        indexAddResult =
            ElmTextSearch.add { id = "1234", title = "Loyalty" } index

        searchResultLo =
            indexAddResult
                |> Result.andThen (\i ->
                    ElmTextSearch.search "lo" i |> Result.map (Tuple.second >> List.map Tuple.first)
                )
                |> Result.map List.length
                |> Result.withDefault 0

        searchResultLoy =
            indexAddResult
                |> Result.andThen (\i ->
                    ElmTextSearch.search "loy" i |> Result.map (Tuple.second >> List.map Tuple.first)
                )
                |> Result.map List.length
                |> Result.withDefault 0

        searchResultLoya =
            indexAddResult
                |> Result.andThen (\i ->
                    ElmTextSearch.search "loya" i |> Result.map (Tuple.second >> List.map Tuple.first)
                )
                |> Result.map List.length
                |> Result.withDefault 0
    in
    Html.ul []
        [ Html.p [] [ Html.text "Lo = ", Html.text (String.fromInt searchResultLo) ]
        , Html.p [] [ Html.text "Loy = ", Html.text (String.fromInt searchResultLoy) ]
        , Html.p [] [ Html.text "Loya = ", Html.text (String.fromInt searchResultLoya) ]
        , Html.p [] [ Html.text "^ a \"1\" indicates a successful match" ]
        ]
HUGE CAVEAT: without stemming, words like "search", "searched", "searching", "searchable", etc. will all be considered different words, and as such, if a document had the text "This is searchable" and someone types in the search query "searching", it will not find that document. Depending on your use case, this may or may not be an applicable or acceptable trade-off.
Stemming can be explained simplistically as "reducing a word to its root"; there are different algorithms to automate it, but none are perfect without a human-curated white/black list of words to stem vs. not to stem (e.g. should "meese" get stemmed as "moose"?). The Porter Stemmer algorithm that this library implements is accurate enough for most use cases, but clearly, as you've all noted, there are cases where it doesn't perform as expected.
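To see the trade-off for yourself, here is a minimal sketch that runs the query "searching" against a document titled "This is searchable", once with the default index and once with the stemmer removed. It only uses the calls already shown in this thread; the helper name countHits and the sample record are made up for illustration, and I have not listed the expected counts, so run it on https://elm-lang.org/try to check.

```elm
module Main exposing (main)

import ElmTextSearch
import Html
import Index.Defaults as Defaults


-- Hypothetical sample document, only for this illustration.
sampleDoc =
    { id = "1", title = "This is searchable" }


-- Hypothetical helper: add the sample document to the given index,
-- run the query, and count the matches. Any Err from add or search
-- is reported as 0, the same way the examples above do it.
countHits query index =
    ElmTextSearch.add sampleDoc index
        |> Result.andThen (\i -> ElmTextSearch.search query i)
        |> Result.map (Tuple.second >> List.length)
        |> Result.withDefault 0


main =
    let
        -- Default configuration, which includes the Porter stemmer.
        withStemmer =
            ElmTextSearch.new
                { ref = .id
                , fields = [ ( .title, 1 ) ]
                , listFields = []
                }

        -- Same index, but with the stemmer transform removed.
        withoutStemmer =
            ElmTextSearch.newWith
                { indexType = Defaults.elmTextSearchIndexType
                , ref = .id
                , fields = [ ( .title, 1 ) ]
                , listFields = []
                , initialTransformFactories = Defaults.defaultInitialTransformFactories
                , transformFactories = []
                , filterFactories = Defaults.defaultFilterFactories
                }
    in
    Html.ul []
        [ Html.li []
            [ Html.text "With stemmer = "
            , Html.text (String.fromInt (countHits "searching" withStemmer))
            ]
        , Html.li []
            [ Html.text "Without stemmer = "
            , Html.text (String.fromInt (countHits "searching" withoutStemmer))
            ]
        ]
```

If the two counts differ for the kind of queries your users type, that is a sign the stemmer is doing useful work for your data and removing it may cost more than it fixes.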
Thanks @peteygao for the response, I just hadn't got around to it yet.
Thank you both for the responses, @peteygao @rluiten. I will check this out later.