
Wordnet corpora file problem

Open john-hawkins opened this issue 8 years ago • 0 comments

There appear to be problems with the copy of the wordnet database hosted on GitHub.

The default index file, https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml

points to the file: https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/wordnet.zip

If I try to install it with either nltk.download('wordnet')

OR

nltk.download() and then choosing the wordnet corpus, I get the error:

    nltk.download('wordnet')
    [nltk_data] Downloading package wordnet to
    [nltk_data]     /Users/xxxxxx/nltk_data...
    [nltk_data] Unzipping corpora/wordnet.zip.
    [nltk_data] Error with downloaded zip file
    False

No matter how many times I try to get the file, even skipping the downloader and going directly to the link listed in the XML file, I always get just 2.5 MB instead of the full 10 MB.

I have tried this on multiple networks and it is always the same problem: the file never downloads completely.
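For anyone trying to reproduce this, here is a minimal sketch of how the truncated download could be confirmed outside the NLTK downloader. The helper names `fetch` and `inspect_zip` are hypothetical (they are not part of NLTK), and the script uses only the standard library. It reports the size on disk and whether the archive opens and passes its CRC checks.

```python
import os
import urllib.request
import zipfile

# URL taken from the index file quoted above.
WORDNET_URL = (
    "https://raw.githubusercontent.com/nltk/nltk_data/"
    "gh-pages/packages/corpora/wordnet.zip"
)


def fetch(url, dest):
    """Download `url` to `dest` (hypothetical helper, bypasses nltk.download)."""
    urllib.request.urlretrieve(url, dest)
    return dest


def inspect_zip(path):
    """Return (size_in_bytes, is_intact) for the archive at `path`."""
    size = os.path.getsize(path)
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() returns the name of the first corrupt member,
            # or None when every member passes its CRC check.
            intact = zf.testzip() is None
    except zipfile.BadZipFile:
        # A truncated file typically loses the central directory,
        # which is stored at the end of the archive.
        intact = False
    return size, intact
```

Calling `inspect_zip(fetch(WORDNET_URL, "wordnet.zip"))` should return the downloaded size together with `False` if the file is cut short as described in this issue.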

john-hawkins, Aug 24 '17 12:08