alvations
Actually, if https://github.com/jakerylandwilliams/partitioner is already a working package in Python, there might not be a need to port/reimplement the code. Users can easily choose to use the tokenizer directly from partitioner....
Instead of using `pipenv run python ...`, you could try:

```
RUN python -c "import nltk; nltk.download('popular')"
```

Or:

```
RUN python -m nltk.downloader popular
```

The `popular` collection...
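The same downloader call also works for a single resource inside a Python build step; a minimal sketch (the resource name `punkt` is just an example):

```python
# Download one NLTK resource non-interactively, e.g. during an image build.
# 'punkt' is only an example; quiet=True suppresses the progress output.
import nltk

nltk.download("punkt", quiet=True)
```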
@fcbond any advice on this?
Quick hack, following #2154:

```python
>>> import nltk
>>> punkt = nltk.data.load('tokenizers/punkt/english.pickle')
>>> punkt._params.abbrev_types.add('al')
>>> text = 'If David et al. get the financing, we can move forward with the...
```
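Fleshed out into a self-contained snippet (the example sentence below is only illustrative):

```python
# Self-contained version of the hack above: teach the pickled Punkt model
# that 'al' (as in 'et al.') is an abbreviation, so it no longer splits there.
import nltk

punkt = nltk.data.load('tokenizers/punkt/english.pickle')
punkt._params.abbrev_types.add('al')

text = 'Smith et al. proposed a new tokenizer. It works well.'
print(punkt.tokenize(text))
# 'et al.' is no longer treated as a sentence boundary,
# so the first sentence stays intact.
```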
+1 @nschneid Most of Rebecca's work is in HPSG, which I would love to integrate into NLTK, but it's a tough nut. @goodmami, @fcbond and the DELPH-IN group have done...
@nschneid after some trawling through the REPP code, there are quite a lot of LISP rules written in separate files. Maybe the first thing we could try is to organize all...
@nschneid for now, the simplest solution seems to be wrapping REPP and reading its output files, like the other third-party tools in NLTK. It seems simple enough, and there are...
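A rough sketch of that wrapping pattern; the `repp` executable name, the `-c <config>` flag, and the output layout assumed here are all tentative, and the parsing is only schematic:

```python
# Sketch of wrapping the external REPP binary, in the same spirit as NLTK's
# other third-party tool wrappers: write the input to a temp file, call the
# binary, and read its output back.
# Assumptions: the executable is 'repp', it takes '-c <config>' plus an input
# file, and it prints one tokenized sentence per line, tokens space-separated.
import subprocess
import tempfile


def repp_tokenize(sentences, repp_bin='repp', config='erg/repp.set'):
    with tempfile.NamedTemporaryFile('w', delete=False) as fin:
        fin.write('\n'.join(sentences))
        input_path = fin.name
    proc = subprocess.run(
        [repp_bin, '-c', config, input_path],
        capture_output=True, text=True, check=True,
    )
    # Schematic parsing: one tokenized sentence per output line.
    return [line.split() for line in proc.stdout.splitlines() if line.strip()]
```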
+1 for TokenizeAnything. There's also https://github.com/jonsafari/tok-tok from @jonsafari.
I've written a small wrapper for REPP: https://github.com/alvations/nltk/blob/repp/nltk/tokenize/repp.py. Will do a PR once the `translate` modules are more stable.
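Usage would look roughly like this; the class name `ReppTokenizer` and the constructor argument (a path to a local REPP installation) are what the branch currently assumes, so treat the exact interface as tentative until the PR lands:

```python
# Tentative usage of the REPP wrapper linked above; the interface may change.
from nltk.tokenize.repp import ReppTokenizer

tokenizer = ReppTokenizer('/path/to/repp/')  # directory containing the repp binary
print(tokenizer.tokenize('A sample sentence to tokenize.'))
```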
Also ported `tok-tok.pl` to Python: https://github.com/alvations/nltk/blob/repp/nltk/tokenize/toktok.py.
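Assuming the port exposes a `ToktokTokenizer` class following the usual NLTK tokenizer interface (a `tokenize()` method), usage would be roughly:

```python
# Rough usage sketch for the tok-tok port; the example sentence is illustrative.
from nltk.tokenize.toktok import ToktokTokenizer

toktok = ToktokTokenizer()
print(toktok.tokenize('Is 9.5 or 525,600 my favorite number?'))
```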