PyTorchNLPBook

Resource punkt not found. Please use the NLTK Downloader to obtain the resource:

Open subhobrata opened this issue 6 years ago • 38 comments

I got the error below in notebook 5_2_munging_frankenstein.ipynb. Please help with this.

LookupError                               Traceback (most recent call last)
in ()
----> 1 tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
      2 with open(args.raw_dataset_txt) as fp:
      3     book = fp.read()
      4 sentences = tokenizer.tokenize(book)

/usr/local/lib/python3.6/dist-packages/nltk/data.py in load(resource_url, format, cache, verbose, logic_parser, fstruct_reader, encoding)
    832
    833     # Load the resource.
--> 834     opened_resource = _open(resource_url)
    835
    836     if format == 'raw':

/usr/local/lib/python3.6/dist-packages/nltk/data.py in _open(resource_url)
    950
    951     if protocol is None or protocol.lower() == 'nltk':
--> 952         return find(path, path + ['']).open()
    953     elif protocol.lower() == 'file':
    954         # urllib might not use mode='rb', so handle this one ourselves:

/usr/local/lib/python3.6/dist-packages/nltk/data.py in find(resource_name, paths)
    671     sep = '*' * 70
    672     resource_not_found = '\n%s\n%s\n%s\n' % (sep, msg, sep)
--> 673     raise LookupError(resource_not_found)
    674
    675

LookupError:

Resource punkt not found. Please use the NLTK Downloader to obtain the resource:

import nltk
nltk.download('punkt')

Searched in:
- '/root/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- '/usr/nltk_data'
- '/usr/lib/nltk_data'

subhobrata avatar Apr 14 '19 10:04 subhobrata

You can simply run

import nltk
nltk.download('punkt')

in the notebook to download the required files

pmallari avatar Apr 15 '19 12:04 pmallari

punkt is an NLTK tool for tokenizing text documents. When we use an old or downgraded version of the nltk module, we generally need to download the remaining data. You can run:

nltk.download('punkt')
nltk.download('stopwords')
nltk.download('corpus')

ds-manav avatar Oct 04 '21 12:10 ds-manav

You can simply run

import nltk
nltk.download('punkt')

in the notebook to download the required files

[nltk_data] Error loading punkt: <urlopen error [SSL:
[nltk_data]     CERTIFICATE_VERIFY_FAILED] certificate verify failed:
[nltk_data]     unable to get local issuer certificate (_ssl.c:1129)>

EldhosePoulose avatar Feb 14 '22 15:02 EldhosePoulose

Got this same thing

ehous3 avatar Mar 02 '22 16:03 ehous3

Try this:

import nltk
import ssl

try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context

nltk.download()
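
Note that the snippet above switches off certificate verification for every HTTPS request the process makes. Before going that far, it may help to check which CA files Python is actually using, since "unable to get local issuer certificate" typically means no usable CA bundle was found. A minimal sketch using only the standard library follows; the helper name `describe_ca_paths` is made up for illustration.

```python
import os
import ssl


def describe_ca_paths():
    """Report the CA file and CA directory Python's ssl module uses by default."""
    paths = ssl.get_default_verify_paths()
    report = {}
    for label, location in (("cafile", paths.cafile), ("capath", paths.capath)):
        # A location of None means nothing was found at the default place.
        report[label] = {
            "location": location,
            "exists": bool(location) and os.path.exists(location),
        }
    return report


for label, info in describe_ca_paths().items():
    print(f"{label}: {info['location']} (exists: {info['exists']})")
```

If both entries come back missing, pointing the SSL_CERT_FILE environment variable at a valid CA bundle (or, on python.org installs for macOS, running the bundled "Install Certificates.command") is usually a safer fix than disabling verification.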

ehous3 avatar Mar 02 '22 16:03 ehous3

import nltk
nltk.download('punkt')

This worked for me, thanks :)

hassy97 avatar May 31 '22 14:05 hassy97

You can simply run

import nltk
nltk.download('punkt')

in the notebook to download the required files

This worked for me thanks.

AfeJohn avatar Jun 27 '22 15:06 AfeJohn

You can simply run

import nltk
nltk.download('punkt')

in the notebook to download the required files

This worked for me too. Thanks! In a terminal, start Python with

$ python3

and then run:

import nltk
nltk.download('punkt')

chethankailash avatar Jun 29 '22 20:06 chethankailash

import nltk
import ssl

try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context

nltk.download()

This worked for me, thanks :)

MrRunShu avatar Oct 10 '22 04:10 MrRunShu

I am receiving this error as well and have tried everything in the comments.

lagraham337 avatar Oct 11 '22 21:10 lagraham337

An easy way to get past this 'urlopen error' is to do the process manually. Go to https://www.nltk.org/nltk_data/, download the required zip file, and extract its contents.

On Windows, go to user/AppData/local/Programs/Python/Python(version)/lib and create a folder named nltk_data. Then create the matching subfolder inside it. For example, for 'punkt', create the folder tokenizers and copy the 'punkt' folder from the extracted archive into it. The error message itself lists the directories that NLTK searches.

Run your program. Cheers!

EDIT 1: Of course, downloading all the files can be time-consuming, but it's the only option if the "urlopen error" persists.

EDIT 2: Most often it is your router or network that prevents the NLTK files from downloading. Try changing your network; that should help.
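
The manual steps above can also be scripted. This is only a sketch: the function name `install_nltk_zip` and the example paths are assumptions, and it presumes the downloaded archive contains a top-level folder named after the package (as punkt.zip does).

```python
import zipfile
from pathlib import Path


def install_nltk_zip(zip_path, nltk_data_dir, category):
    """Extract a manually downloaded NLTK package into <nltk_data_dir>/<category>/."""
    target = Path(nltk_data_dir) / category
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # For punkt.zip this produces <nltk_data_dir>/tokenizers/punkt/...
        archive.extractall(target)
    return target


# Hypothetical usage, assuming punkt.zip is in the current directory:
# install_nltk_zip("punkt.zip", Path.home() / "nltk_data", "tokenizers")
```

If you extract to a directory that is not in the "Searched in" list of the error message, you can add it at runtime with nltk.data.path.append(...) before loading the tokenizer.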

UjjwalAnand364 avatar Dec 18 '22 21:12 UjjwalAnand364

I am receiving this error as well and have tried everything in the comments.

Try changing your network. I had the same problem: none of the recommended solutions worked until I changed my Wi-Fi. I simply used another network and it worked for me. I don't know why this worked, but I hope it helps you.

prajwal13579 avatar Feb 05 '23 11:02 prajwal13579

You can simply run

import nltk
nltk.download('punkt')

in the notebook to download the required files

[nltk_data] Error loading punkt: <urlopen error [SSL:
[nltk_data]     CERTIFICATE_VERIFY_FAILED] certificate verify failed:
[nltk_data]     unable to get local issuer certificate (_ssl.c:1129)>

Try changing your network. I had the same problem: none of the recommended solutions worked until I changed my Wi-Fi. I simply used another network and it worked for me. I don't know why this worked, but I hope it helps you.

prajwal13579 avatar Feb 05 '23 11:02 prajwal13579

This code downloads the Punkt tokenizer successfully for me:

import nltk
nltk.download('punkt')

usmanyousaaf avatar Feb 13 '23 07:02 usmanyousaaf

Need help! I tried every method mentioned or recommended here and still can't figure out what to do. I made a new file in the python\lib directory suggested above and also tried running nltk.download('punkt'); neither worked for me. [image]

thesakshidiggikar avatar Feb 13 '23 10:02 thesakshidiggikar

Need help! I tried every method mentioned or recommended here and still can't figure out what to do. I made a new file in the python\lib directory suggested above and also tried running nltk.download('punkt'); neither worked for me. [image]

Try this:

import nltk
import ssl

try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context

nltk.download()

OR

  • Manually download the NLTK data packages (link)

usmanyousaaf avatar Feb 14 '23 16:02 usmanyousaaf

I'm getting this error. Any help would be appreciated. Thanks in advance.

nltk.download('punkt')
[nltk_data] Error loading punkt: <urlopen error [Errno 54] Connection
[nltk_data]     reset by peer>
False
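
The False at the end of that output is the return value of nltk.download, which signals failure by returning False rather than raising an exception. For a flaky connection, one option is to retry a few times. The sketch below is a generic helper (the name `download_with_retries` and its defaults are made up for illustration) that takes any zero-argument callable.

```python
import time


def download_with_retries(download, attempts=3, delay_seconds=2.0):
    """Call download() until it returns a truthy value or the attempts run out."""
    for attempt in range(1, attempts + 1):
        if download():
            return True
        if attempt < attempts:
            # Wait a little longer after each failed attempt.
            time.sleep(delay_seconds * attempt)
    return False


# Usage (requires nltk and network access):
# import nltk
# ok = download_with_retries(lambda: nltk.download("punkt"))
```

Retrying only helps with intermittent failures; if the connection is reset every time, the network itself is the more likely culprit.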

mohammed-hasan007 avatar Mar 05 '23 15:03 mohammed-hasan007

As mentioned by several people here, including me, the primary cause of this error is a faulty or unstable network connection. The code:

import nltk
nltk.download('punkt')

works fine. I had the same problem: I was unable to download the resources, so they were never installed in the expected directory. Try changing your network, disabling the firewall, or using a VPN. One of these should work.

UjjwalAnand364 avatar Mar 05 '23 15:03 UjjwalAnand364

It works fine if the network connection is stable; otherwise it fails. It worked for me :)

ibrahim-string avatar Mar 29 '23 19:03 ibrahim-string

I ran into the same problem but just needed to add the code mentioned above (plus a few additional lines) to get it to work.

Here is the original code:

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.tag import pos_tag

Here is the modified and working code:

import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.tag import pos_tag

You'll notice I just added 3 lines. The first is based on the comments above, and the other two follow by the same logic:

nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('stopwords')

Hope this helps!
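
To avoid re-running the downloads every time the script starts, one possible refinement is to check for each resource first and download only what is missing. This is a sketch, not part of the original fix: the helper name `ensure_nltk_resources` is hypothetical, and the resource paths in `REQUIRED` are my assumption about where these three packages live under nltk_data.

```python
def ensure_nltk_resources(resources):
    """Download each NLTK package only if its resource path is not found locally.

    `resources` maps a resource path (as accepted by nltk.data.find) to the
    package name passed to nltk.download. Returns the packages that failed.
    """
    import nltk  # imported lazily so the helper can be defined without nltk

    failed = []
    for resource_path, package in resources.items():
        try:
            nltk.data.find(resource_path)
        except LookupError:
            # Not on disk yet; nltk.download returns False on failure.
            if not nltk.download(package):
                failed.append(package)
    return failed


# Assumed locations of the three packages inside nltk_data.
REQUIRED = {
    "tokenizers/punkt": "punkt",
    "taggers/averaged_perceptron_tagger": "averaged_perceptron_tagger",
    "corpora/stopwords": "stopwords",
}

# Usage (requires nltk; needs network access only for missing packages):
# failed = ensure_nltk_resources(REQUIRED)
```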

pmarathay avatar May 25 '23 14:05 pmarathay

Need help! I tried every method mentioned or recommended here and still can't figure out what to do. I made a new file in the python\lib directory suggested above and also tried running nltk.download('punkt'); neither worked for me. [image]

Try this:

import nltk
import ssl

try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context

nltk.download()

OR

  • Manually download the NLTK data packages (link)

I've downloaded it manually. What do I do next?

iamrohansood avatar Jun 24 '23 12:06 iamrohansood

I faced the same issue. The underlying problem is that we cannot connect to the raw GitHub URL that NLTK downloads its data from. Check by opening the URL below; if you cannot open it, you have the same problem. https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/brown.zip

You can use the following tutorial to solve this issue. https://www.debugpoint.com/failed-connect-raw-githubusercontent-com-port-443/#:~:text=Fix%201%3A%20Updating%20the%20%2Fetc%2Fhosts%20file%20in%20Linux,-If%20you%20are&text=Open%20the%20%2Fetc%2Fhosts%20file.&text=Then%20at%20the%20end%20of%20this%20file%2C%20add%20the%20IP%20address.&text=Save%20and%20close%20the%20file,again%2C%20and%20it%20should%20work.
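
The same reachability check can be done from Python instead of a browser. The sketch below only tests whether a TCP connection to port 443 can be opened; it does not validate certificates or fetch any file, and the helper name `can_connect` is made up for illustration.

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused or reset connections, and timeouts.
        return False


# Usage:
# print(can_connect("raw.githubusercontent.com"))
```

A False result here points at DNS, a firewall, or the network rather than at NLTK itself.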

kbrajwani avatar Jun 27 '23 11:06 kbrajwani

Need help! I tried every method mentioned or recommended here and still can't figure out what to do. I made a new file in the python\lib directory suggested above and also tried running nltk.download('punkt'); neither worked for me. [image]

Try this:

import nltk
import ssl

try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context

nltk.download()

OR

  • Manually download the NLTK data packages (link)

This solution worked for me as well.

ravijammi avatar Sep 30 '23 21:09 ravijammi

punkt is an NLTK tool for tokenizing text documents. When we use an old or downgraded version of the nltk module, we generally need to download the remaining data. You can run:

nltk.download('punkt')
nltk.download('stopwords')
nltk.download('corpus')

This worked for me!

daviibf avatar Oct 02 '23 11:10 daviibf

Try this:

import nltk
import ssl

try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context

nltk.download()

This works!!!!

varunpalakodeti20 avatar Oct 20 '23 00:10 varunpalakodeti20

Try this:

import nltk
import ssl

try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context

nltk.download()

you're god!

charmingjill avatar Oct 29 '23 20:10 charmingjill

An easy way to get past this 'urlopen error' is to do the process manually. Go to https://www.nltk.org/nltk_data/, download the required zip file, and extract its contents.

On Windows, go to user/AppData/local/Programs/Python/Python(version)/lib and create a folder named nltk_data. Then create the matching subfolder inside it. For example, for 'punkt', create the folder tokenizers and copy the 'punkt' folder from the extracted archive into it. The error message itself lists the directories that NLTK searches.

Run your program. Cheers!

EDIT 1: Of course, downloading all the files can be time-consuming, but it's the only option if the "urlopen error" persists.

EDIT 2: Most often it is your router or network that prevents the NLTK files from downloading. Try changing your network; that should help.

This helped!!!!

SHIsue avatar Jan 04 '24 09:01 SHIsue

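🪲 It's a bug. Add these parameters to the word_tokenize call, for example:

tokens = nltk.word_tokenize(example, language='english', preserve_line=True)

This worked for me. (With preserve_line=True, word_tokenize skips the sentence-splitting step, which is the part that loads punkt.)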

craterr avatar Mar 25 '24 06:03 craterr