alvations comments

Results: 153 comments of alvations

Update readme that this repository is no longer maintained

This PR is so much needed, esp. with #150

Transformers

@ankit--agrawal "attention", nice pun =)

Type hinting / annotation (PEP 484)?

Yes, type-hints are nice. If you would like to, pick a module or a couple of files that you'd like to add type-hints to and create a PR. Someone will review...

Type hinting / annotation (PEP 484)?

Maybe you can try `*_score.py` files from https://github.com/nltk/nltk/tree/develop/nltk/translate. They are quite isolated and self-contained. Otherwise, a really useful place to have type-hints would be any files/functions/classes in https://github.com/nltk/nltk/tree/develop/nltk/tag
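As a rough idea of what a type-hinted scoring function could look like, here is a minimal sketch. The function `unigram_precision` is hypothetical and is not taken from any NLTK `*_score.py` file; it only illustrates PEP 484 annotations on a small, self-contained metric.

```python
from collections import Counter
from typing import Sequence


def unigram_precision(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Clipped unigram precision of `hypothesis` against one `reference`.

    Hypothetical helper for illustration only, not an NLTK function.
    """
    if not hypothesis:
        return 0.0
    ref_counts = Counter(reference)
    hyp_counts = Counter(hypothesis)
    # Each hypothesis token is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[token]) for token, count in hyp_counts.items())
    return clipped / len(hypothesis)


score = unigram_precision(["the", "cat"], ["the", "the"])
```

Annotating the parameters as `Sequence[str]` rather than `list` lets callers pass lists or tuples of tokens, which a checker such as mypy can then verify.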

Type hinting / annotation (PEP 484)?

Thanks @f0lie, the type-hints look good as a start! Our sincere thanks to the people who are helping at PyCon too. I've not personally tried mypy, but I guess...

Bad solving of issue #2151

@mmmm1998 Thank you for raising the issue. The patch for `ru-rnc-new` was there to hot-plug the new mappings in without messing with the existing data in `nltk_data`. I also...

chomsky_normal_form() for grammars

The `chomsky_normal_form()` in NLTK is a tree-binarization function. I think it can't be directly applied to grammars, see https://github.com/nltk/nltk/blob/develop/nltk/treetransforms.py . Grammar transformation to CNF is rather complex and hasn't yet been...
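To make the tree-versus-grammar distinction concrete, here is a simplified, standalone sketch of right-factored tree binarization. It is not NLTK's implementation: the nested-tuple tree format, the `binarize` function and the `*` suffix for artificial nodes are all assumptions made for this illustration.

```python
def binarize(tree):
    """Right-factor a tree so that no node has more than two children.

    A tree is either a leaf (str) or a tuple (label, child, child, ...).
    This format is an assumption of the sketch, not NLTK's Tree class.
    """
    if isinstance(tree, str):
        return tree
    label, *children = tree
    children = [binarize(child) for child in children]
    while len(children) > 2:
        # Fold the two rightmost children under an artificial node.
        right = children.pop()
        left = children.pop()
        children.append((label + "*", left, right))
    return (label, *children)


flat_np = ("NP", ("DT", "the"), ("JJ", "big"), ("NN", "dog"))
binary_np = binarize(flat_np)
```

The function rewrites one concrete parse tree. It never sees a set of production rules, which is why this kind of transform does not convert a grammar to CNF by itself.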

Casual tokeniser allows newline in ellipses and phone number tokens

I think the issue is that `\s` represents more than just a space. To capture a single space, it would be `[^\S\t\n\r\f\v]` or simply space ` ` =) We could simply use `(?:\.(?:...
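The difference between `\s` and the narrower character class can be checked with a short sketch. The two ellipsis patterns below are hypothetical and written only for this illustration; they are not the patterns used in NLTK's casual tokenizer.

```python
import re

# Hypothetical patterns: dots separated by optional whitespace.
ELLIPSIS_ANY_WS = re.compile(r"\.(?:\s*\.)+")                    # \s also matches newlines
ELLIPSIS_SAME_LINE = re.compile(r"\.(?:[^\S\t\n\r\f\v]*\.)+")    # whitespace minus \t \n \r \f \v

text = "Wait.\n.. what"

# With \s, the match runs across the line break.
loose = ELLIPSIS_ANY_WS.search(text).group()
# With the restricted class, only the dots on the second line match.
strict = ELLIPSIS_SAME_LINE.search(text).group()
```

Here `loose` spans the newline while `strict` stays on one line, which is the behavior the restricted class is meant to give.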

word_tokenize keeps the opening single quotes and doesn't pad it with space

If we make the following changes to `word_tokenize` at https://github.com/nltk/nltk/blob/develop/nltk/tokenize/__init__.py, it would achieve behavior similar to Stanford CoreNLP's: ```python import re from nltk.tokenize.treebank import TreebankWordTokenizer # Standard word tokenizer....

word_tokenize keeps the opening single quotes and doesn't pad it with space

This issue is about the opening quotes; the clitic fix for that can be done easily, and it will make `word_tokenize` behave like Stanford's. IMHO, I think it's a...
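For readers unfamiliar with the problem, padding an opening single quote can be sketched with a plain regular expression. This is an assumption-laden illustration, not the change proposed in the thread: the pattern and the `pad_opening_quote` helper are hypothetical and do not handle clitics such as `'tis`.

```python
import re

# Hypothetical pattern: a single quote at the start of the text or after
# whitespace, directly followed by a word character.
OPENING_SINGLE_QUOTE = re.compile(r"(?<!\S)'(?=\w)")


def pad_opening_quote(text: str) -> str:
    """Insert a space after an opening single quote so it splits off as its own token."""
    return OPENING_SINGLE_QUOTE.sub("' ", text)


padded = pad_opening_quote("He said 'hello world'")
tokens = padded.split()
```

Apostrophes inside words, as in `don't`, are left alone because they are preceded by a non-whitespace character.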