KeyError: 'hinglish'

Open upasana-mittal opened this issue 2 years ago • 6 comments

I am getting this error while importing pke

get_alpha_2 = lambda l: LANGUAGE_CODE_BY_NAME[l] KeyError: 'hinglish'

  File "/app/model/src/analysis/AnalysisService.py", line 6, in <module>
    from pke.unsupervised import TextRank, TopicRank, SingleRank
  File "/usr/local/lib/python3.7/site-packages/pke/__init__.py", line 5, in <module>
    from pke.base import LoadFile
  File "/usr/local/lib/python3.7/site-packages/pke/base.py", line 31, in <module>
    lang_stopwords = {get_alpha_2(l): l for l in stopwords._fileids}
  File "/usr/local/lib/python3.7/site-packages/pke/base.py", line 31, in <dictcomp>
    lang_stopwords = {get_alpha_2(l): l for l in stopwords._fileids}
  File "/usr/local/lib/python3.7/site-packages/pke/base.py", line 29, in <lambda>
    get_alpha_2 = lambda l: LANGUAGE_CODE_BY_NAME[l]
KeyError: 'hinglish'

upasana-mittal, Jul 07 '22 21:07

I'm getting the same error...does anyone know what's wrong?

atabas, Jul 11 '22 16:07

Reason for the KeyError: at import time, pke goes through the stopword lists that nltk provides and looks up a language code for each one. pke's "langcodes.py" has no language code for 'hinglish', so the lookup fails.
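The failure can be reproduced in isolation. The sketch below mimics the two lines from the traceback with a small stand-in table; the table contents and the list of file names are made up for illustration and are not pke's real data. It also shows a tolerant variant that skips names without a code. That variant is hypothetical and only illustrates the idea; it is not necessarily how pke handles this.

```python
# Stand-in for pke's language table; these entries are illustrative only.
LANGUAGE_CODE_BY_NAME = {"english": "en", "french": "fr"}

# Stand-in for the file names nltk reports for its stopwords corpus.
fileids = ["english", "french", "hinglish"]


def strict_mapping(names):
    """Same shape as the failing line in pke/base.py: an unknown name raises."""
    return {LANGUAGE_CODE_BY_NAME[name]: name for name in names}


def tolerant_mapping(names):
    """Hypothetical variant: skip names that have no language code."""
    return {
        LANGUAGE_CODE_BY_NAME[name]: name
        for name in names
        if name in LANGUAGE_CODE_BY_NAME
    }


try:
    strict_mapping(fileids)
except KeyError as err:
    print("KeyError:", err)  # prints: KeyError: 'hinglish'

print(tolerant_mapping(fileids))  # prints: {'en': 'english', 'fr': 'french'}
```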

Solution: The "nltk_data" folder is normally in your home directory. Inside nltk_data/corpora/stopwords there is a file named 'hinglish'. Remove that file from the folder and the error should go away.
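For anyone who prefers to do the removal from Python, here is a minimal sketch. The helper name `remove_stopword_file` is made up for this example, and it assumes the layout described above (a `corpora/stopwords` folder inside each nltk_data directory).

```python
from pathlib import Path


def remove_stopword_file(nltk_data_dirs, name="hinglish"):
    """Delete corpora/stopwords/<name> under each given nltk_data directory.

    Returns the list of files that were actually removed.
    """
    removed = []
    for base in nltk_data_dirs:
        target = Path(base) / "corpora" / "stopwords" / name
        if target.is_file():
            target.unlink()
            removed.append(target)
    return removed


# Example call, assuming nltk_data is in the home directory:
# remove_stopword_file([Path.home() / "nltk_data"])
```

If nltk is installed, passing `nltk.data.path` instead of a single folder covers every directory nltk searches. Note that downloading the stopwords corpus again will bring the file back.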

ajithb073, Jul 14 '22 11:07

Where can I find the "nltk_data" folder in Colab?

aradhana298, Aug 08 '22 15:08

Where can I find the "nltk_data" folder in Colab?

Check the path nltk downloads to. Normally the data is stored under the /root/ directory. You can reach the root directory from the file pane on the left side of Colab by clicking on "...", which means more options. It is visible beside the sample.
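The same check can be done from a notebook cell. This is a rough sketch: both helper names are made up for this example, and the fallback locations (the NLTK_DATA environment variable and the home directory) are assumptions used only when nltk cannot be imported.

```python
import os
from pathlib import Path


def candidate_nltk_data_dirs():
    """Directories worth checking; uses nltk's own search path when available."""
    try:
        import nltk
        return [Path(p) for p in nltk.data.path]
    except ImportError:
        # Fallback guess: NLTK_DATA entries first, then ~/nltk_data.
        env = os.environ.get("NLTK_DATA", "")
        dirs = [Path(p) for p in env.split(os.pathsep) if p]
        dirs.append(Path.home() / "nltk_data")
        return dirs


def find_stopwords_dirs(candidates):
    """Return the corpora/stopwords folders that exist under the candidates."""
    found = []
    for base in candidates:
        folder = Path(base) / "corpora" / "stopwords"
        if folder.is_dir():
            found.append(folder)
    return found


for folder in find_stopwords_dirs(candidate_nltk_data_dirs()):
    print(folder)
```

Each printed folder is a place where a 'hinglish' file may be sitting.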


hammadmukhtar21, Aug 10 '22 18:08

You can simply run !rm /root/nltk_data/corpora/stopwords/hinglish

btw, removing the file did not work for me

btw, I did not face the issue with the latest version

talhaanwarch, Aug 17 '22 17:08

I had the issue because I was installing from a specific commit hash. Since I switched to installing from the full git repository, it is working fine. No more errors.

pip install git+https://github.com/boudinfl/pke.git

upasana-mittal, Aug 17 '22 18:08

As said earlier in the thread, please update to the latest version. If you are using pke with an unsupported language, please provide custom stopwords through the stoplist argument, like this:

import pke

# Custom stopwords for a language pke does not support out of the box
shadok_stoplist = ['ga', 'zo']
preprocessed_document = [  # Obtained via custom pos tagging tool or manual annotation
    [('ga', 'DET'), ('bu', 'NOUN'), ('zo', 'AUX'), ('meu', 'ADJ'), ('.', 'PUNCT')]
]
e = pke.unsupervised.MultipartiteRank()
e.load_document(
    preprocessed_document, language='shadok',
    stoplist=shadok_stoplist, normalization=None)

ygorg, Sep 30 '22 08:09