alvations

Results 154 comments of alvations

Actually, if https://github.com/jakerylandwilliams/partitioner is already a working Python package, there might be no need to port or reimplement the code. Users can simply use the tokenizer directly from partitioner....

Instead of using `pipenv run python ...`, you could try:

```
RUN python -c "import nltk; nltk.download('popular')"
```

Or

```
RUN python -m nltk.downloader popular
```

The `popular` collection...

@fcbond any advice on this?

Quick hack, following #2154

```python
>>> import nltk
>>> punkt = nltk.data.load('tokenizers/punkt/english.pickle')
>>> punkt._params.abbrev_types.add('al')
>>> text = 'If David et al. get the financing, we can move forward with the...
```

+1 @nschneid Most of Rebecca's work is in HPSG, which I would love to integrate into NLTK, but it's a tough nut. @goodmami, @fcbond and the DELPH-IN group have done...

@nschneid after some trawling through the REPP code, there are quite a lot of LISP rules written in separate files. Maybe the first thing we could try is to organize all...

@nschneid for now, the simplest solution seems to be wrapping REPP and reading the output files, like other third-party tools in NLTK. It seems simple enough and there are...
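
A minimal sketch of that wrap-and-read pattern, not the actual REPP wrapper: write the sentences to a temporary file, run an external command on it, and parse the output back into token lists. REPP's real command line and output format are not shown here, so the example uses a stand-in tool (the hypothetical `FAKE_TOOL`, a Python one-liner that prints one token per line with a blank line between sentences). The function name `tokenize_with_external_tool` and its `command` parameter are also assumptions made for illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for an external tokenizer binary: splits each input
# line on whitespace, prints one token per line, then a blank line.
FAKE_TOOL = (
    "import sys\n"
    "for line in open(sys.argv[1], encoding='utf-8'):\n"
    "    print('\\n'.join(line.split()))\n"
    "    print()\n"
)


def tokenize_with_external_tool(sentences, command=None):
    """Run an external tokenizer on `sentences` and return a list of token lists.

    `command` is the argv prefix of the tool; the input file path is appended.
    By default the stand-in tool above is used (an assumption, not REPP).
    """
    if command is None:
        command = [sys.executable, "-c", FAKE_TOOL]

    with tempfile.TemporaryDirectory() as tmpdir:
        # Write one sentence per line to a temporary input file.
        input_path = Path(tmpdir) / "input.txt"
        input_path.write_text("\n".join(sentences) + "\n", encoding="utf-8")
        # check=True raises CalledProcessError if the tool exits with an error.
        result = subprocess.run(
            command + [str(input_path)],
            capture_output=True,
            text=True,
            check=True,
        )

    # Parse the assumed output format: one token per line, with blank lines
    # separating sentences.
    tokenized, current = [], []
    for line in result.stdout.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            tokenized.append(current)
            current = []
    if current:
        tokenized.append(current)
    return tokenized


if __name__ == "__main__":
    print(tokenize_with_external_tool(["Hello world .", "Tok tok !"]))
```

For a real tool, only `command` and the parsing loop would need to change to match its invocation and output format.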

+1 for TokenizeAnything. There's also https://github.com/jonsafari/tok-tok from @jonsafari.

I've written a small wrapper for REPP: https://github.com/alvations/nltk/blob/repp/nltk/tokenize/repp.py. Will do a PR once the `translate` modules are more stable.

Also ported `tok-tok.pl` to Python: https://github.com/alvations/nltk/blob/repp/nltk/tokenize/toktok.py.