acora

Iteration never stops when using longest match

Open: fseasy opened this issue 7 years ago • 1 comment

Hi, I came across a bug when using longest_match, as introduced in README.rst, to do a greedy search for the longest matching keywords.

My longest_match follows the README:

    from itertools import groupby
    from operator import itemgetter

    def _longest_match(matches):
        spos_groupby_iter = groupby(matches, itemgetter(1))  # group by start position
        for _, kw_iter_with_same_spos in spos_groupby_iter:
            l = list(kw_iter_with_same_spos)
            print(l)
            yield max(l)  # max gets the longest keyword
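
For reference, a self-contained sketch of the same filter run on a hand-written list of matches, assuming the matches arrive as (keyword, start_position) tuples as in the README's finditer examples. No acora call is involved, and the match list is made up for illustration (the short keyword 弱 is hypothetical, added only so that two keywords share a start position).

```python
from itertools import groupby
from operator import itemgetter


def longest_match(matches):
    """Yield one match per run of consecutive matches sharing a start position."""
    # groupby only merges *consecutive* items whose key (here: the position) is equal.
    for _pos, same_pos in groupby(matches, itemgetter(1)):
        # Keywords found at the same start position are prefixes of one another,
        # so the lexicographic max of the (keyword, pos) tuples is the longest one.
        yield max(same_pos)


# Made-up match stream for the text 因为弱覆盖; the keyword 弱 is hypothetical,
# added only so that two keywords share start position 2.
matches = [(u"弱", 2), (u"弱覆盖", 2), (u"覆盖", 3)]
print(list(longest_match(iter(matches))))
```

The max() trick works because every keyword matched at one start position is a prefix of the text at that position, so the longer keyword always compares greater. Note that groupby does not sort: matches with the same start position are only merged if they arrive next to each other.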

The iteration never stops when I match a sentence like

因为弱覆盖

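(sorry for using Chinese; I didn't have time to construct an English case. The sentence means roughly "because of weak coverage"), and the keyword dictionary contains words like the following (覆盖 means "coverage", 弱覆盖 "weak coverage"):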

覆盖
弱覆盖

Moreover, the bug does not occur in either of the following two cases:

  1. 弱覆盖 is not at the end of the text,

    e.g. sentence = 因为弱覆盖的原因 (roughly "due to weak coverage")

  2. the matches are converted to a list first, instead of being consumed as an iterator:

    from itertools import groupby
    from operator import itemgetter

    def _longest_match(matches):
        pre_get_list = list(matches)  # materialize a list instead of using the iterator
        spos_groupby_iter = groupby(pre_get_list, itemgetter(1))  # group by start position
        for _, kw_iter_with_same_spos in spos_groupby_iter:
            l = list(kw_iter_with_same_spos)
            print(l)
            yield max(l)  # max gets the longest keyword

So maybe a proper StopIteration is not being raised? Sorry I can't help more at the moment; all I can do is report the bug.

I use unicode strings, on Python 2, CentOS 7.2.

fseasy, Jun 14 '17 09:06
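
One way to probe the StopIteration guess, offered as an unconfirmed hypothesis rather than a diagnosis: Python's iterator protocol requires that once an iterator raises StopIteration, it keeps raising it on every later next() call. list() stops at the first StopIteration, whereas groupby (in CPython's implementation, as far as I can tell) may call next() on its source once more after the last group has been fully consumed. If the match iterator did not stay exhausted after reporting a match at the very end of the text, that would fit both workarounds listed above. The CountingIterator wrapper below is hypothetical (not part of acora); it counts next() calls made after exhaustion on a plain list of matches.

```python
from itertools import groupby
from operator import itemgetter


class CountingIterator(object):
    """Hypothetical wrapper: counts next() calls made after the source is exhausted."""

    def __init__(self, items):
        self._it = iter(items)
        self.exhausted = False
        self.calls_after_exhaustion = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.exhausted:
            self.calls_after_exhaustion += 1
        try:
            return next(self._it)
        except StopIteration:
            self.exhausted = True
            raise  # a well-behaved iterator keeps raising StopIteration

    next = __next__  # Python 2 spelling of the iterator protocol


def longest_match(matches):
    # Same filter as in the issue: fully consume each group, keep the max.
    for _pos, same_pos in groupby(matches, itemgetter(1)):
        group = list(same_pos)
        yield max(group)


matches = [(u"弱覆盖", 2), (u"覆盖", 3)]

via_groupby = CountingIterator(matches)    # groupby pulls from the iterator directly
result_groupby = list(longest_match(via_groupby))

via_list = CountingIterator(matches)       # list() drains it first, as in workaround 2
result_list = list(longest_match(list(via_list)))

print(via_groupby.calls_after_exhaustion, via_list.calls_after_exhaustion)
```

The list() path never calls next() after exhaustion, so a source that misbehaves only on such extra calls would hang in the first path but not in the second.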

Thanks for your report. It suggests that acora might be mishandling the case where the last match ends exactly at the end of the input string.

scoder, Aug 22 '17 12:08
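
To test the end-of-input hypothesis, a slow but obviously correct reference matcher can enumerate every (keyword, start) pair, including matches that end exactly at len(text), and its output can be compared with what the library returns for the same keywords. The function name brute_force_matches is hypothetical; it is a quadratic-time test oracle, not an Aho-Corasick implementation.

```python
def brute_force_matches(keywords, text):
    """Return every (keyword, start) pair where keyword occurs in text.

    Hypothetical test oracle: checks every keyword at every start position,
    so matches ending exactly at the end of the text are included.
    """
    found = []
    for start in range(len(text)):
        for kw in keywords:
            # str.startswith with an offset tests whether text[start:] begins with kw.
            if kw and text.startswith(kw, start):
                found.append((kw, start))
    # Sort by start position, then keyword length, so results compare stably.
    found.sort(key=lambda m: (m[1], len(m[0])))
    return found


keywords = [u"覆盖", u"弱覆盖"]
print(brute_force_matches(keywords, u"因为弱覆盖"))        # both matches end at the text end
print(brute_force_matches(keywords, u"因为弱覆盖的原因"))  # same matches, followed by more text
```

Sorting the library's matches with the same key before comparing keeps the check independent of the order in which matches are reported.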