The model's vocabulary only returns the unigrams:

```
>>> import arpa
>>> x = arpa.loadf('big.arpa')
>>> x[0].vocabulary()
['!', '"', '#', '$', '%', '&', "'", '(', ')', '*', '+', ',', '-', ...
```
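As a workaround for getting at the higher-order n-grams, one option is to read the `\N-grams:` sections of the ARPA file directly instead of going through the `arpa` API. This is only a minimal sketch, assuming the standard ARPA layout where each n-gram line is `logprob<TAB>ngram[<TAB>backoff]`; the helper name `read_ngrams` is made up for illustration and `big.arpa` is the same file as above.

```python
def read_ngrams(arpa_path, order=2):
    """Collect the n-grams of a given order straight from an ARPA file.

    Assumes the usual layout: a "\\2-grams:" header, then one n-gram per line
    formatted as "logprob<TAB>w1 w2 ...<TAB>backoff" (backoff optional).
    """
    ngrams = []
    in_section = False
    with open(arpa_path, encoding='utf8') as fin:
        for line in fin:
            line = line.strip()
            if line == '\\{}-grams:'.format(order):
                in_section = True
                continue
            if in_section:
                if not line or line.startswith('\\'):  # blank line or next section ends it
                    break
                fields = line.split('\t')
                ngrams.append(tuple(fields[1].split()))  # keep just the n-gram itself
    return ngrams

# e.g. bigrams = read_ngrams('big.arpa', order=2)
```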
Without the explicit `http://`, clicking the link on GitHub automatically appends the URL to the existing incoming URL and leads to https://github.com/fastai/courses/blob/master/deeplearning2/course.fast.ai/part2.html
There are characters that throw an `IndexError`, e.g. `聞` in `聞く`:

```
>>> from furigana.furigana import print_html
>>> print_html(u'聞')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ltan/.local/lib/python3.5/site-packages/furigana/furigana.py", ...
```
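Until this is fixed upstream, a small defensive wrapper keeps a batch job running past the offending characters. This is only a sketch assuming the failure mode shown above (an `IndexError` raised inside furigana for some inputs); the helper name `safe_print_html` is made up for illustration.

```python
from furigana.furigana import print_html

def safe_print_html(text):
    """Hypothetical wrapper: try to render furigana, fall back to the raw
    text when furigana raises IndexError (as it does for inputs like u'聞')."""
    try:
        print_html(text)
    except IndexError:
        print(text)  # fall back to the unannotated text

safe_print_html(u'聞く')
```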
If it's of any help, for reference: there's an old wiki extractor tool from @jodaiber at https://github.com/jodaiber/Annotated-WikiExtractor, and one that's been refreshed for Python 3 at https://github.com/alvations/rubyslippers
With the current `CoreNLPParser.tag()`, the "retokenization" by Stanford CoreNLP is unexpected:

```python
>>> from nltk.parse.corenlp import CoreNLPParser
>>> ner_tagger = CoreNLPParser(url='http://localhost:9000', tagtype='ner')
>>> sent = ['my', 'phone', 'number', 'is', '1111', ...
```
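One possible workaround, until `tag()` respects pre-tokenized input, is to go through `api_call()` and ask CoreNLP not to retokenize by setting `tokenize.whitespace`. This is only a sketch under the assumption that the server at the same URL returns the usual JSON layout (`sentences` → `tokens`, each with `word` and `ner` fields); the helper `ner_tag_pretokenized` and the exact property set are not part of NLTK's API.

```python
from nltk.parse.corenlp import CoreNLPParser

ner_tagger = CoreNLPParser(url='http://localhost:9000', tagtype='ner')

def ner_tag_pretokenized(tokens):
    """Hypothetical helper: send whitespace-joined tokens and tell CoreNLP
    to split on whitespace only, preserving the original tokenization."""
    response = ner_tagger.api_call(
        ' '.join(tokens),
        properties={
            'annotators': 'tokenize,ssplit,pos,lemma,ner',
            'tokenize.whitespace': 'true',
            'ssplit.eolonly': 'true',
        },
    )
    return [(token['word'], token['ner'])
            for sentence in response['sentences']
            for token in sentence['tokens']]

# e.g. ner_tag_pretokenized(['my', 'phone', 'number', 'is', '1111'])
```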
I'm not sure whether it's because my environment was set up wrongly or because the `nltk.downloader` code is somehow violating Python's import rules, causing the `RuntimeWarning`:

```python
$ python3 -m nltk.downloader reuters
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py:125: ...
```
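For what it's worth, downloading from inside the interpreter avoids running `nltk.downloader` through `runpy` altogether; this is just a sketch of that alternative, not a fix for the `-m` invocation itself.

```python
# Alternative to "python3 -m nltk.downloader reuters": trigger the download
# from within the interpreter instead of via -m.
import nltk

nltk.download('reuters')  # fetches the corpus into the default nltk_data directory
```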
`word_tokenize` keeps the opening single quote and doesn't pad it with a space; this is to make sure that clitics get tokenized as `'ll`, `'ve`, etc. The original treebank tokenizer...
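To see the behavior concretely, the snippet below just prints what the tokenizer does with a clitic and with a quoted word; it doesn't assert any particular output, since the point here is precisely how the opening single quote is (or isn't) padded.

```python
from nltk.tokenize import word_tokenize  # assumes the 'punkt' model is installed

# Compare how a clitic ('ll) and an opening single quote are handled.
for text in ["they'll come", "he said 'hello' to me"]:
    print(text, '->', word_tokenize(text))
```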
I'm not sure which line is causing this problem when the input is an empty line.

```
>>> from konlpy.tag import Kkma
>>> kkma = Kkma()
>>> kkma.sentences('')
[]
>>> ...
```
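As a stop-gap on the caller's side, empty lines can be filtered out before they ever reach Kkma; this is only a sketch of that guard (the helper name `sentences_or_empty` is made up) and doesn't address what Kkma itself does internally with empty input.

```python
from konlpy.tag import Kkma

kkma = Kkma()

def sentences_or_empty(line):
    """Skip Kkma entirely for blank or whitespace-only input."""
    line = line.strip()
    return kkma.sentences(line) if line else []
```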
With Python 3 and `pip3` on Mac OSX, there's a SQL config error:

```
$ sudo pip3 install -U https://github.com/clips/pattern/archive/development.zip
Collecting https://github.com/clips/pattern/archive/development.zip
  Downloading https://github.com/clips/pattern/archive/development.zip (24.8MB)
    100% |████████████████████████████████| 24.8MB 40kB/s
Collecting future...
```
```
>>> from sacremoses import MosesTokenizer
>>> mt = MosesTokenizer()
>>> mt.tokenize('क्या')
['क', '्', 'या']
```
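For context on why the word might be getting split, the characters involved include Devanagari combining signs; the quick check below only prints each code point's Unicode name and category (nonspacing/spacing combining marks), which is presumably what the tokenizer's notion of "alphanumeric" trips over. That last part is a guess, not something confirmed in sacremoses.

```python
import unicodedata

# Inspect the code points of the word that gets split.
for ch in 'क्या':
    print(repr(ch), unicodedata.name(ch), unicodedata.category(ch))
```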