The model's vocabulary only returns the unigrams:

```
>>> import arpa
>>> x = arpa.loadf('big.arpa')
>>> x[0].vocabulary()
['!', '"', '#', '$', '%', '&', "'", '(', ')', '*', '+', ',', '-', ...
```
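As a workaround for getting at the higher-order n-grams, one option is to read the `\N-grams:` sections of the ARPA file directly instead of going through the `arpa` API. This is only a minimal sketch, assuming the standard ARPA layout where each n-gram line is `logprob<TAB>ngram[<TAB>backoff]`; the helper name `read_ngrams` is made up for illustration and `big.arpa` is the same file as above.

```python
def read_ngrams(arpa_path, order=2):
    """Collect the n-grams of a given order straight from an ARPA file.

    Assumes the usual layout: a "\\2-grams:" header, then one n-gram per line
    formatted as "logprob<TAB>w1 w2 ...<TAB>backoff" (backoff optional).
    """
    ngrams = []
    in_section = False
    with open(arpa_path, encoding='utf8') as fin:
        for line in fin:
            line = line.strip()
            if line == '\\{}-grams:'.format(order):
                in_section = True
                continue
            if in_section:
                if not line or line.startswith('\\'):  # blank line or next section ends it
                    break
                fields = line.split('\t')
                ngrams.append(tuple(fields[1].split()))  # keep just the n-gram itself
    return ngrams

# e.g. bigrams = read_ngrams('big.arpa', order=2)
```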
Without the explicit `http://`, clicking the link on GitHub automatically appends the URL to the existing incoming URL and leads to https://github.com/fastai/courses/blob/master/deeplearning2/course.fast.ai/part2.html
There are characters that throw an `IndexError`, e.g. `聞` in `聞く`:

```
>>> from furigana.furigana import print_html
>>> print_html(u'聞')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ltan/.local/lib/python3.5/site-packages/furigana/furigana.py", ...
```
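Until this is fixed upstream, a small defensive wrapper keeps a batch job running past the offending characters. This is only a sketch assuming the failure mode shown above (an `IndexError` raised inside furigana for some inputs); the helper name `safe_print_html` is made up for illustration.

```python
from furigana.furigana import print_html

def safe_print_html(text):
    """Hypothetical wrapper: try to render furigana, fall back to the raw
    text when furigana raises IndexError (as it does for inputs like u'聞')."""
    try:
        print_html(text)
    except IndexError:
        print(text)  # fall back to the unannotated text

safe_print_html(u'聞く')
```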
If it's of any help, for reference: there's an old wiki extractor tool from @jodaiber at https://github.com/jodaiber/Annotated-WikiExtractor, and one that's been refreshed for Python 3 at https://github.com/alvations/rubyslippers
With the current `CoreNLPParser.tag()`, the "retokenization" by Stanford CoreNLP is unexpected:

```python
>>> from nltk.parse.corenlp import CoreNLPParser
>>> ner_tagger = CoreNLPParser(url='http://localhost:9000', tagtype='ner')
>>> sent = ['my', 'phone', 'number', 'is', '1111', ...
```
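One possible workaround, until `tag()` respects pre-tokenized input, is to go through `api_call()` and ask CoreNLP not to retokenize by setting `tokenize.whitespace`. This is only a sketch under the assumption that the server at the same URL returns the usual JSON layout (`sentences` → `tokens`, each with `word` and `ner` fields); the helper `ner_tag_pretokenized` and the exact property set are not part of NLTK's API.

```python
from nltk.parse.corenlp import CoreNLPParser

ner_tagger = CoreNLPParser(url='http://localhost:9000', tagtype='ner')

def ner_tag_pretokenized(tokens):
    """Hypothetical helper: send whitespace-joined tokens and tell CoreNLP
    to split on whitespace only, preserving the original tokenization."""
    response = ner_tagger.api_call(
        ' '.join(tokens),
        properties={
            'annotators': 'tokenize,ssplit,pos,lemma,ner',
            'tokenize.whitespace': 'true',
            'ssplit.eolonly': 'true',
        },
    )
    return [(token['word'], token['ner'])
            for sentence in response['sentences']
            for token in sentence['tokens']]

# e.g. ner_tag_pretokenized(['my', 'phone', 'number', 'is', '1111'])
```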
I'm not sure whether it's because my environment was set up wrongly or because the `nltk.downloader` code is somehow violating Python's import rules, causing the `RuntimeWarning`:

```python
$ python3 -m nltk.downloader reuters
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py:125: ...
```
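For what it's worth, downloading from inside the interpreter avoids running `nltk.downloader` through `runpy` altogether; this is just a sketch of that alternative, not a fix for the `-m` invocation itself.

```python
# Alternative to "python3 -m nltk.downloader reuters": trigger the download
# from within the interpreter instead of via -m.
import nltk

nltk.download('reuters')  # fetches the corpus into the default nltk_data directory
```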
`word_tokenize` keeps the opening single quote and doesn't pad it with a space; this is to make sure that clitics get tokenized as `'ll`, `'ve`, etc. The original treebank tokenizer...
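To see the behavior concretely, the snippet below just prints what the tokenizer does with a clitic and with a quoted word; it doesn't assert any particular output, since the point here is precisely how the opening single quote is (or isn't) padded.

```python
from nltk.tokenize import word_tokenize  # assumes the 'punkt' model is installed

# Compare how a clitic ('ll) and an opening single quote are handled.
for text in ["they'll come", "he said 'hello' to me"]:
    print(text, '->', word_tokenize(text))
```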
I'm not sure which line is causing this problem when the input is an empty line.

```
>>> from konlpy.tag import Kkma
>>> kkma = Kkma()
>>> kkma.sentences('')
[]
>>> ...
```
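As a stop-gap on the caller's side, empty lines can be filtered out before they ever reach Kkma; this is only a sketch of that guard (the helper name `sentences_or_empty` is made up) and doesn't address what Kkma itself does internally with empty input.

```python
from konlpy.tag import Kkma

kkma = Kkma()

def sentences_or_empty(line):
    """Skip Kkma entirely for blank or whitespace-only input."""
    line = line.strip()
    return kkma.sentences(line) if line else []
```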
With Python 3 and `pip3` on Mac OSX, there's a SQL config error:

```
$ sudo pip3 install -U https://github.com/clips/pattern/archive/development.zip
Collecting https://github.com/clips/pattern/archive/development.zip
  Downloading https://github.com/clips/pattern/archive/development.zip (24.8MB)
    100% |████████████████████████████████| 24.8MB 40kB/s
Collecting future...
```
```
>>> from sacremoses import MosesTokenizer
>>> mt = MosesTokenizer()
>>> mt.tokenize('क्या')
['क', '्', 'या']
```
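For context on why the word might be getting split, the characters involved include Devanagari combining signs; the quick check below only prints each code point's Unicode name and category (nonspacing/spacing combining marks), which is presumably what the tokenizer's notion of "alphanumeric" trips over. That last part is a guess, not something confirmed in sacremoses.

```python
import unicodedata

# Inspect the code points of the word that gets split.
for ch in 'क्या':
    print(repr(ch), unicodedata.name(ch), unicodedata.category(ch))
```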