Eric Kafe issues

Results: 23 issues of Eric Kafe

Prevent multiple parallel downloads of the same package

Fix #3248 using file locks. In #3248, @naktinis describes a situation where users launch multiple useless parallel downloads of the same file, and cause race conditions. Rather than encouraging this...

Move the pickles to a special collection

Now that alternative data packages are available for all the pickles, the question arises: what to do with the old packages? Simply removing them seems very unsafe for those users...

Insufficient alpha-conversion in CCG module

One additional bug that came up while investigating #3345 and its CCG lexicon was that variables that should have been alpha-converted were collapsed instead, resulting in unacceptable outputs that contained...

Support some zipped models

The latest version of this PR fixes issue #3473 for all the unit tests under _nltk/test/unit_, which can then load their datafiles from zipped packages. The first commits solved the...

tokenizer

tagger

classifier

Too many packages need unzipping

Four out of five nltk_data packages are set to unzip by default, although it seems likely that most package readers would only require minimal adjustments in order to not need...

Avoid segfaults in LazyCorpusLoader._unload()

This is a follow-up to #3454, intended to find out why LazyCorpusLoader._unload() segfaults when testing with Python 3.13, and eventually fix it.

corpus

Release workflow in PR #3426 needs modifications

PR #3426 introduced an automated release workflow using _release.yml_. But pushing several release tags to my personal nltk clone (https://github.com/ekaf/nltk) has failed to produce an available experimental release. All failed...

Avoid KeyError in langnames.py

Fix #3403: this pull request addresses issue #3403 by making the language name lookup functions in langnames.py more robust. It prevents potential KeyError exceptions by ensuring that missing language codes...

Enhancement: Hybrid CCG/UCG support using Feature Structures

Motivation: The current NLTK CCG implementation, while a strong and clear example of the formalism, has limitations in its linguistic expressiveness. It successfully demonstrates the core principles of syntactic...

KeyError in lang2q/tag2q for unknown tags is disruptive for downstream functions

Problem: The current implementation of `lang2q` (and the underlying `tag2q`) in `nltk.langnames` raises a `KeyError` if the BCP-47 tag is not found in the `bcp47.wiki_q` dictionary. This causes downstream...
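
The failure mode can be illustrated with a minimal, self-contained sketch. The dictionary `wiki_q` below is a small hypothetical stand-in for `bcp47.wiki_q`, and `tag2q_strict` and `tag2q_safe` are illustrative names rather than NLTK functions; the fix adopted in NLTK may look different.

```python
# Hypothetical stand-in for the bcp47.wiki_q mapping
# (BCP-47 tag -> Wikidata Q-code); only two illustrative entries.
wiki_q = {"en": "Q1860", "fr": "Q150"}

def tag2q_strict(tag):
    # Mirrors the reported behaviour: a plain dictionary lookup,
    # which raises KeyError for any tag missing from the mapping.
    return wiki_q[tag]

def tag2q_safe(tag, default=None):
    # Illustrative alternative: return a default instead of raising,
    # so downstream callers can test for a missing result.
    return wiki_q.get(tag, default)
```

With the safe variant, a caller can check for `None` (or pass its own `default`) instead of wrapping every lookup in `try`/`except KeyError`.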