
Issue with the word "Loyalty"

Open natebrunette opened this issue 1 year ago • 6 comments

I went to https://elm-lang.org/try and added rluiten/elm-text-search as a dependency.

This snippet demonstrates the issue by performing three searches, for lo, loy, and loya, and printing the number of results for each. You can see that only loy returns no results.

import ElmTextSearch
import Html

main =
  let
    index =
      ElmTextSearch.new
        { ref = .id
        , fields = [ (.title, 1 ) ]
        , listFields = []
        }

    indexAddResult =
      ElmTextSearch.add { id = "1234", title = "Loyalty" } index
      
    searchResultLo =
        indexAddResult
          |> Result.andThen (\i ->
            ElmTextSearch.search "lo" i |> Result.map (Tuple.second >> List.map Tuple.first)
          )
          |> Result.map List.length
          |> Result.withDefault 0  
      
    searchResultLoy =
        indexAddResult
          |> Result.andThen (\i ->
            ElmTextSearch.search "loy" i |> Result.map (Tuple.second >> List.map Tuple.first)
          )
          |> Result.map List.length
          |> Result.withDefault 0
          
    searchResultLoya =
        indexAddResult
          |> Result.andThen (\i ->
            ElmTextSearch.search "loya" i |> Result.map (Tuple.second >> List.map Tuple.first)
          )
          |> Result.map List.length
          |> Result.withDefault 0          
  in
  Html.ul []
    [ Html.li [] [Html.text (String.fromInt searchResultLo)]
    , Html.li [] [Html.text (String.fromInt searchResultLoy)]
    , Html.li [] [Html.text (String.fromInt searchResultLoya)]
    ]

natebrunette avatar Aug 17 '23 20:08 natebrunette

I have seen this and hope to have a look at it in the next week or two. At the moment I don't have a working Elm development environment, for assorted reasons.

rluiten avatar Aug 26 '23 05:08 rluiten

I had a brain wave last night about the search and realised it may not be a bug.

I took your example and added calls that show what the Porter stemmer does to each word; the code is below, and the stems are printed after the search counts at the bottom of the output.

Given this, searching for "loy" won't find anything: the stemmer converts the query to "loi", and that doesn't match what the Porter stemmer produces for "Loyalty".

So at the moment I would say this is working as it should, due to the nature of the Porter stemmer being used. The Porter stemmer is very useful, but it can cause surprises like this.
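To make the mismatch concrete, here is a minimal sketch that compares stems directly. It assumes the index matches a stemmed query as a prefix of the stemmed indexed word, which is consistent with "lo" and "loya" finding "Loyalty" but is a simplification of what the library does. The helper `prefixMatchesAfterStemming` is hypothetical and not part of elm-text-search, and the explicit lowercasing is also an assumption. It needs rluiten/stemmer available as a dependency.

```elm
module Main exposing (main)

import Html exposing (Html)
import Stemmer


-- Hypothetical helper (not part of elm-text-search): stems both the query
-- and the indexed word, then checks whether the stemmed query is a prefix
-- of the stemmed word. Lowercasing both inputs first is an assumption.
prefixMatchesAfterStemming : String -> String -> Bool
prefixMatchesAfterStemming query indexedWord =
    String.startsWith
        (Stemmer.stem (String.toLower query))
        (Stemmer.stem (String.toLower indexedWord))


boolToString : Bool -> String
boolToString flag =
    if flag then
        "True"

    else
        "False"


-- One list item per query: the query, its stem, and the prefix check.
row : String -> Html msg
row query =
    Html.li []
        [ Html.text
            (query
                ++ " stems to "
                ++ Stemmer.stem (String.toLower query)
                ++ "; prefix of stemmed Loyalty: "
                ++ boolToString (prefixMatchesAfterStemming query "Loyalty")
            )
        ]


main : Html msg
main =
    Html.ul [] (List.map row [ "lo", "loy", "loya" ])
```

If the stemmer turns "loy" into "loi" as described above, the "loy" row is the one where the prefix check should fail.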

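If you modify the configuration of the index so that it does not use the stemmer, you will get the behavior you expect. The code below is your example with the stemming output added; it still uses the default configuration.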

import ElmTextSearch
import Html
import Stemmer

main =
  let
    index =
      ElmTextSearch.new
        { ref = .id
        , fields = [ (.title, 1 ) ]
        , listFields = []
        }

    indexAddResult =
      ElmTextSearch.add { id = "1234", title = "Loyalty" } index
      
    searchResultLo =
        indexAddResult
          |> Result.andThen (\i ->
            ElmTextSearch.search "lo" i |> Result.map (Tuple.second >> List.map Tuple.first)
          )
          |> Result.map List.length
          |> Result.withDefault 0  
      
    searchResultLoy =
        indexAddResult
          |> Result.andThen (\i ->
            ElmTextSearch.search "loy" i |> Result.map (Tuple.second >> List.map Tuple.first)
          )
          |> Result.map List.length
          |> Result.withDefault 0
          
    searchResultLoya =
        indexAddResult
          |> Result.andThen (\i ->
            ElmTextSearch.search "loya" i |> Result.map (Tuple.second >> List.map Tuple.first)
          )
          |> Result.map List.length
          |> Result.withDefault 0          
          
    stemLoyalty = Stemmer.stem "Loyalty"
    stemLo = Stemmer.stem "lo"
    stemLoy = Stemmer.stem "loy"
    stemLoya = Stemmer.stem "loya"

  in
  Html.ul []
    [ Html.li [] [Html.text (String.fromInt searchResultLo)]
    , Html.li [] [Html.text (String.fromInt searchResultLoy)]
    , Html.li [] [Html.text (String.fromInt searchResultLoya)]
    , Html.li [] [Html.text "Stemming Results below here"]
    , Html.li [] [Html.text stemLoyalty]
    , Html.li [] [Html.text stemLo]
    , Html.li [] [Html.text stemLoy]
    , Html.li [] [Html.text stemLoya]
    ]

rluiten avatar Sep 11 '23 11:09 rluiten

Hi! I'm having this exact problem but can't figure out what to change in my config to not use the stemmer. What would be the correct config for this?

lescuer97 avatar Jan 16 '24 14:01 lescuer97

@lescuer97 By customizing the ElmTextSearch.newWith configs to remove the default Stemmer:

import Index.Defaults as Defaults

index = ElmTextSearch.newWith
    { indexType = Defaults.elmTextSearchIndexType
    , ref = .id
    , fields = [ ( .title, 1 ) ]
    , listFields = []
    , initialTransformFactories = Defaults.defaultInitialTransformFactories
    , transformFactories = [] -- This is the key, remove the default `Stemmer.stem` factory
    , filterFactories = Defaults.defaultFilterFactories
    }

Working full example:

import ElmTextSearch
import Html
import Index.Defaults as Defaults

main =
  let
    index = ElmTextSearch.newWith
      { indexType = Defaults.elmTextSearchIndexType
      , ref = .id
      , fields = [ ( .title, 1 ) ]
      , listFields = []
      , initialTransformFactories = Defaults.defaultInitialTransformFactories
      , transformFactories = [] -- This is the key, remove the default `Stemmer.stem` factory
      , filterFactories = Defaults.defaultFilterFactories
      }

    indexAddResult =
      ElmTextSearch.add { id = "1234", title = "Loyalty" } index
      
    searchResultLo =
        indexAddResult
          |> Result.andThen (\i ->
            ElmTextSearch.search "lo" i |> Result.map (Tuple.second >> List.map Tuple.first)
          )
          |> Result.map List.length
          |> Result.withDefault 0  
      
    searchResultLoy =
        indexAddResult
          |> Result.andThen (\i ->
            ElmTextSearch.search "loy" i |> Result.map (Tuple.second >> List.map Tuple.first)
          )
          |> Result.map List.length
          |> Result.withDefault 0
          
    searchResultLoya =
        indexAddResult
          |> Result.andThen (\i ->
            ElmTextSearch.search "loya" i |> Result.map (Tuple.second >> List.map Tuple.first)
          )
          |> Result.map List.length
          |> Result.withDefault 0
  in
  Html.div []
    [ Html.p [] [Html.text "Lo = ", Html.text (String.fromInt searchResultLo)]
    , Html.p [] [Html.text "Loy = ", Html.text (String.fromInt searchResultLoy)]
    , Html.p [] [Html.text "Loya = ", Html.text (String.fromInt searchResultLoya)]
    , Html.p [] [Html.text "^ a \"1\" indicates a successful match"]
    ]

HUGE CAVEAT: without stemming, words like search, searched, searching, searchable, etc. will all be considered different words, and as such if a document had the text "This is searchable" and someone types in the search query "searching", it will not find that document. Depending on your use case, this may or may not be an applicable or acceptable trade off.

Stemming can be explained simplistically as "reducing a word to its root"; there are different algorithms to automate it, but none are perfect without a human-curated white/black list of words to stem vs. not to stem (e.g. should meese get stemmed as moose?). The Porter Stemmer algorithm that this library implements is accurate enough for most use cases, but clearly as you've all noted, there are cases where it doesn't perform as expected.
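To see which of those word forms actually share a stem, a small sketch like the one below prints each word next to its stem. It uses `Stemmer.stem` from rluiten/stemmer, assumed to be available as a dependency; the word list is only an illustration, and no particular output is claimed here.

```elm
module Main exposing (main)

import Html exposing (Html)
import Stemmer


-- Illustrative word list taken from the caveat above.
words : List String
words =
    [ "search", "searched", "searching", "searchable" ]


-- Render each word alongside whatever the Porter stemmer returns for it.
main : Html msg
main =
    Html.ul []
        (List.map
            (\word -> Html.li [] [ Html.text (word ++ " -> " ++ Stemmer.stem word) ])
            words
        )
```

Words that print the same stem are treated as the same term by a stemmed index; with `transformFactories = []` each form is indexed as written.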

peteygao avatar Jan 23 '24 07:01 peteygao

Thanks @peteygao for the response, I just hadn't got around to it yet.

rluiten avatar Jan 23 '24 10:01 rluiten

Thank you both for the responses, @peteygao @rluiten. I will check this out later.

lescuer97 avatar Jan 23 '24 10:01 lescuer97