Verbnet identifier in index.xml mismatch
Rebuilding the nltk_data package index with make fails with this error:
nltk_data$ make
python tools/build_pkg_index.py . https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages index.xml
Traceback (most recent call last):
File "tools/build_pkg_index.py", line 24, in <module>
index = build_index(ROOT, BASE_URL)
File "/Users/liling.tan/Library/Python/2.7/lib/python/site-packages/nltk/downloader.py", line 2088, in build_index
for pkg_xml, zf, subdir in _find_packages(os.path.join(root, 'packages')):
File "/Users/liling.tan/Library/Python/2.7/lib/python/site-packages/nltk/downloader.py", line 2216, in _find_packages
'vs %s)' % (pkg_xml.get('id'), uid))
ValueError: package identifier mismatch (verbnet vs verbnet3)
make: *** [pkg_index] Error 1
This is because verbnet.xml and verbnet3.xml both declare the same id, "verbnet", while the build check expects the id in verbnet3.xml to be "verbnet3":
nltk_data/packages/corpora$ cat verbnet.xml
<package id="verbnet"
name="VerbNet Lexicon, Version 2.1"
version="2.1"
author="Karin Kipper-Schuler"
webpage="https://verbs.colorado.edu/verbnet/"
license="Distributed with permission of the author."
unzip="1"
/>
nltk_data/packages/corpora$ cat verbnet3.xml
<package id="verbnet"
name="VerbNet Lexicon, Version 3.3"
version="3.3"
author="Karin Kipper-Schuler"
webpage="https://verbs.colorado.edu/verbnet/"
license="Distributed with permission of the author."
unzip="1"
/>
The duplicated identifier also causes a mismatch in the nltk code itself, cf. https://github.com/nltk/nltk/issues/2015
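
To spot this kind of problem before running make, a small script could scan the package descriptors. The following is only a sketch, not part of nltk or the nltk_data tooling: it assumes (based on the error message above) that each descriptor's id attribute is expected to equal the XML file name without its extension. The function name find_id_mismatches and the packages path are hypothetical.

```python
import os
import xml.etree.ElementTree as ET


def find_id_mismatches(packages_root):
    """Return (path, declared_id, expected_id) for each mismatching descriptor.

    Assumption: the expected id is the XML file name without ".xml",
    which is what the "verbnet vs verbnet3" error message suggests.
    """
    mismatches = []
    for dirname, _subdirs, files in os.walk(packages_root):
        for filename in sorted(files):
            if not filename.endswith(".xml"):
                continue
            path = os.path.join(dirname, filename)
            expected_id = filename[:-len(".xml")]
            # The root element of a descriptor is <package id="..." .../>
            declared_id = ET.parse(path).getroot().get("id")
            if declared_id != expected_id:
                mismatches.append((path, declared_id, expected_id))
    return mismatches


if __name__ == "__main__":
    # "packages" is the directory inside an nltk_data checkout (assumed layout).
    for path, declared, expected in find_id_mismatches("packages"):
        print("%s: id=%r, expected %r" % (path, declared, expected))
```

Run from the root of an nltk_data checkout, this would list every descriptor whose id differs from its file name; for the files shown above it should report verbnet3.xml.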