acora
Iteration never stops when using longest match
Hi, I came across a bug when using longest_match, as introduced in README.rst, to do a greedy search for the longest matching keywords. My _longest_match does the same as the one in the README:
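from itertools import groupby
from operator import itemgetter

def _longest_match(matches):
    # group consecutive matches by start position (element 1 of each tuple)
    spos_groupby_iter = groupby(matches, itemgetter(1))
    for _, kw_iter_with_same_spos in spos_groupby_iter:
        l = list(kw_iter_with_same_spos)
        print(l)
        yield max(l)  # max gets the longest keyword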
It never stops when I match a sentence like 因为弱覆盖 (sorry for using Chinese, I did not have time to construct an English case) and the word dict contains words like 覆盖 and 弱覆盖. What's more, either of the following 2 conditions avoids the bug:
- The matched text 弱覆盖 is not at the end of the input, e.g. sentence = 因为弱覆盖的原因
- A list is used instead of the iterator:
def _longest_match(matches):
    pre_get_list = list(matches)  # get the list instead of the iterator
    spos_groupby_iter = groupby(pre_get_list, itemgetter(1))  # by spos
    for _, kw_iter_with_same_spos in spos_groupby_iter:
        l = list(kw_iter_with_same_spos)
        print(l)
        yield max(l)  # max gets the longest keyword
So maybe the cause is that StopIteration is not raised properly? Sorry that I can't help further at the moment. All I can do right now is report the bug.
I use unicode as the string representation, on Python 2, CentOS 7.2.
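To make the expected behaviour concrete, here is a minimal sketch that runs the same grouping logic over a hand-written iterator of matches instead of a real acora search. The (keyword, start position) tuples below are assumptions standing in for what a search over 因为弱覆盖 would report; the sketch is written for Python 3 and does not import acora, so it only shows what the generator should yield before terminating, not the hang itself.

```python
from itertools import groupby
from operator import itemgetter


def longest_match(matches):
    """Yield the longest keyword for each run of matches sharing a start position."""
    for _, same_start in groupby(matches, itemgetter(1)):
        yield max(same_start)


# Hypothetical matches for the text 因为弱覆盖 with keywords 覆盖 and 弱覆盖.
# 弱覆盖 starts at index 2 and 覆盖 at index 3; both end at the end of the text.
fake_matches = iter([("弱覆盖", 2), ("覆盖", 3)])

result = list(longest_match(fake_matches))
print(result)  # [('弱覆盖', 2), ('覆盖', 3)]
```

With a well-behaved iterator the generator finishes after the last group, which is the behaviour I would expect from the real search as well.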
Thanks for your report. It suggests that the search might be mishandling the case where the last match occurs at the end of the input string.