Chris Little

Results 49 issues of Chris Little

A new PipelineTokenizer() should be added, which takes a list or *args of initialized _Tokenizers. The resulting tokenizers will be called in serial, adding their totals to a common Counter...

feature request
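
The request above could be sketched roughly as follows. This is a minimal illustration, not the library's implementation: `WhitespaceTokenizer` and `CharBigramTokenizer` are hypothetical stand-ins for initialized tokenizers, and it assumes "called in serial" means each tokenizer is applied in order to the same input string, with the counts summed into one `Counter`.

```python
from collections import Counter


class WhitespaceTokenizer:
    """Hypothetical stand-in tokenizer: splits on whitespace."""

    def tokenize(self, string):
        return Counter(string.split())


class CharBigramTokenizer:
    """Hypothetical stand-in tokenizer: overlapping character bigrams."""

    def tokenize(self, string):
        return Counter(string[i:i + 2] for i in range(len(string) - 1))


class PipelineTokenizer:
    """Run several already-initialized tokenizers and merge their counts."""

    def __init__(self, *tokenizers):
        # Accept either PipelineTokenizer(a, b) or PipelineTokenizer([a, b]).
        if len(tokenizers) == 1 and isinstance(tokenizers[0], (list, tuple)):
            tokenizers = tuple(tokenizers[0])
        self.tokenizers = tuple(tokenizers)

    def tokenize(self, string):
        totals = Counter()
        # Call each tokenizer in order on the same input and add its
        # token counts to the shared Counter.
        for tokenizer in self.tokenizers:
            totals.update(tokenizer.tokenize(string))
        return totals


pipeline = PipelineTokenizer(WhitespaceTokenizer(), CharBigramTokenizer())
print(pipeline.tokenize("ab ab"))
```

Summing with `Counter.update` means a token produced by more than one tokenizer (here `"ab"`) accumulates a combined count, which matches the "common Counter" wording but may or may not be the intended behavior.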

Is there a variant of Levenshtein that (1) operates on tokens and (2) scores the distance between tokens by another distance measure (possibly Levenshtein itself)?

feature request
question
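
One possible reading of the question above, as a hedged sketch: run the usual Levenshtein recurrence over token sequences, but replace the 0/1 substitution cost with a pluggable distance between the two tokens. The function names, the unit insertion/deletion costs, and the choice of normalized character-level Levenshtein as the default inner measure are all assumptions made for illustration.

```python
def levenshtein(a, b):
    """Classic unit-cost Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]


def normalized_levenshtein(a, b):
    """Levenshtein distance scaled to [0, 1] by the longer length."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


def token_levenshtein(src, tar, sub_cost=normalized_levenshtein):
    """Edit distance over token lists with a soft substitution cost.

    Inserting or deleting a whole token costs 1.0 (an assumption);
    substituting token s with token t costs sub_cost(s, t).
    """
    prev = [float(j) for j in range(len(tar) + 1)]
    for i, s in enumerate(src, 1):
        curr = [float(i)]
        for j, t in enumerate(tar, 1):
            curr.append(min(prev[j] + 1.0,
                            curr[j - 1] + 1.0,
                            prev[j - 1] + sub_cost(s, t)))
        prev = curr
    return prev[-1]


print(token_levenshtein("the quick fox".split(), "the quack fox".split()))
```

Keeping the inner cost in [0, 1] ensures that a substitution never costs more than a deletion or an insertion alone, so similar tokens are aligned rather than dropped and re-added.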

add notebooks for the smaller modules (ConfusionMatrix..., Corpus/NGramCorpus, etc.) ~~draw up a brief tutorial for the manual~~ (completed f54e94e)

documentation

Try to eliminate:
- [ ] encoding errors
- [ ] too-long lines
- [ ] overfull hboxes (same?)
- [ ] underfull hboxes? ....?

punted from 0.4.0 -- I can't figure out a good algorithm/recurrence

feature request

write a JOSS paper cf. https://joss.theoj.org/

project management
documentation

- [ ] Rees/Taxamatch (https://confluence.csiro.au/public/taxamatch/the-rees-2007-phonetic-algorithm-as-used-in-taxamatch)
- [x] Taft algorithms (from NYSIIS author's monograph)

feature request

http://www.zompist.com/spell.html

Partial implementation:

    def rosenfelder(word):
        """Calculate Mark Rosenfelder's English pronunciation of a word.

        This is based on the rule set/algorithm presented at
        http://www.zompist.com/spell.html
        """
        # define constants
        orth_vowels =...

feature request

Source:
http://snowball.tartarus.org/algorithms/kraaij_pohlmann/stem.c
http://snowball.tartarus.org/algorithms/kraaij_pohlmann/stem.h

Tests:
http://snowball.tartarus.org/algorithms/kraaij_pohlmann/diffs.txt

Archive of webpage:
http://web.archive.org/web/20010411031043/http://www-uilots.let.uu.nl:80/uplift/

feature request

French Soundex-type algorithm: https://doi.org/10.3138/CHR-053-04-03 https://www.persee.fr/doc/adh_0066-2062_1976_num_1976_1_1313

feature request