minbpe
Count only non-overlapping occurrences of a pair
For example, you want to count 2 (not 4) occurrences of the pair 'aa' in the text 'aaaaa', because merge() can replace it only 2 times. In other words, the counted occurrences should not overlap.
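Below is a minimal sketch of the counting rule. It assumes token ids are plain Python ints, as in minbpe. `get_stats_nonoverlapping` is a hypothetical name and is not necessarily how this PR implements the change. `merge` mirrors the greedy left-to-right replacement that merge() performs. Overlap is only possible for pairs of the form (x, x), so the sketch skips an occurrence exactly when the same pair was counted at the previous position.

```python
def get_stats_nonoverlapping(ids, counts=None):
    """Hypothetical helper: count adjacent pairs, skipping overlapping repeats."""
    counts = {} if counts is None else counts
    last_start = {}  # pair -> start index of the last occurrence we counted
    for i, pair in enumerate(zip(ids, ids[1:])):
        # An occurrence at i overlaps the last counted one only if that one
        # started at i - 1, which can only happen when pair == (x, x).
        if last_start.get(pair) == i - 1:
            continue
        counts[pair] = counts.get(pair, 0) + 1
        last_start[pair] = i
    return counts


def merge(ids, pair, idx):
    """Greedy left-to-right replacement of `pair` with the new token `idx`."""
    newids = []
    i = 0
    while i < len(ids):
        if i < len(ids) - 1 and ids[i] == pair[0] and ids[i + 1] == pair[1]:
            newids.append(idx)
            i += 2  # skip both tokens, so matches can never overlap
        else:
            newids.append(ids[i])
            i += 1
    return newids


ids = list("aaaaa".encode("utf-8"))
print(get_stats_nonoverlapping(ids)[(97, 97)])  # 2, not 4
print(merge(ids, (97, 97), 256))                # [256, 256, 97]
```

With this rule, each count equals the number of times merge() actually replaces that pair, so the pair chosen as "most frequent" is the one that shrinks the sequence the most.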
See the discussion in #51
SentencePiece also implements it this way, per this paper (p. 3).