KeyBERT
No scores when candidates parameter is added
No scores are returned when you pass the candidates parameter to KeyBERT's extract_keywords() method:
from keybert import KeyBERT
doc = """
Kos. Griekenland staat bekend om de prachtige eilanden waar je terecht kan voor zonovergoten vakanties.
"""
kw_model = KeyBERT()
keywords = kw_model.extract_keywords(doc, candidates=['Griekenland', 'Kos'])
This shows the following warning:
\venv\lib\site-packages\sklearn\feature_extraction\text.py:1369: UserWarning: Upper case characters found in vocabulary while 'lowercase' is True. These entries will not be matched with any documents
warnings.warn(
and the keywords variable comes back as an empty list.
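Why the result is empty: scikit-learn's CountVectorizer lowercases each document before tokenizing (lowercase=True is the default), but it uses a fixed vocabulary such as candidates verbatim. 'Griekenland' is therefore compared against the token 'griekenland' and never matches, which is exactly what the warning says. Below is a simplified, stdlib-only model of that behavior, not scikit-learn's actual code; count_fixed_vocabulary is a hypothetical helper and the regex only approximates the default token_pattern. When every candidate has a zero count, KeyBERT presumably has nothing left to score, which would explain the empty list.

```python
import re

# Approximation of scikit-learn's default token_pattern (2+ word characters).
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def count_fixed_vocabulary(doc, vocabulary, lowercase=True):
    """Count how often each vocabulary entry occurs as a token in doc.

    Hypothetical model: the document is lowercased when lowercase=True,
    while the vocabulary entries are always compared verbatim.
    """
    text = doc.lower() if lowercase else doc
    tokens = TOKEN_PATTERN.findall(text)
    return {term: tokens.count(term) for term in vocabulary}


doc = (
    "Kos. Griekenland staat bekend om de prachtige eilanden waar je terecht "
    "kan voor zonovergoten vakanties."
)

# Capitalized entries never match the lowercased tokens.
print(count_fixed_vocabulary(doc, ["Griekenland", "Kos"]))  # {'Griekenland': 0, 'Kos': 0}
# Lowercase entries do.
print(count_fixed_vocabulary(doc, ["griekenland", "kos"]))  # {'griekenland': 1, 'kos': 1}
```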
Without the candidates parameter, it does return a result with scores:
keywords = kw_model.extract_keywords(doc)
Result:
[('griekenland', 0.5619), ('zonovergoten', 0.5024), ('bekend', 0.4398), ('prachtige', 0.4118), ('terecht', 0.4039)]
When I change the candidate words to lowercase, or add lowercase=False to the CountVectorizer, it returns the words with scores as expected:
keywords = kw_model.extract_keywords(doc, candidates=['griekenland', 'kos'])
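Until this is fixed, a workaround consistent with the observation above is to lowercase the candidates before passing them in. normalize_candidates is a hypothetical helper, not part of KeyBERT. It also drops duplicates that lowercasing can create (for example 'Kos' and 'kos'), since scikit-learn raises a ValueError when a vocabulary contains the same term twice.

```python
def normalize_candidates(candidates):
    """Lowercase candidate keywords and drop duplicates, keeping first-seen order.

    Hypothetical helper: makes the candidates match the lowercased document
    tokens that CountVectorizer produces with its default lowercase=True.
    """
    # dict.fromkeys keeps insertion order and discards repeated keys.
    return list(dict.fromkeys(candidate.lower() for candidate in candidates))


print(normalize_candidates(["Griekenland", "Kos", "kos"]))  # ['griekenland', 'kos']
```

Usage, assuming KeyBERT is installed: keywords = kw_model.extract_keywords(doc, candidates=normalize_candidates(['Griekenland', 'Kos'])). The returned keywords are then lowercase, just like the output without candidates.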
In KeyBERT 0.6.0, capitalized candidate words were not an issue. This is the CountVectorizer call with lowercase=False added:
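from sklearn.feature_extraction.text import CountVectorizer

# Inside KeyBERT's extract_keywords(); the other names are its arguments.
count = CountVectorizer(
    ngram_range=keyphrase_ngram_range,
    stop_words=stop_words,
    min_df=min_df,
    vocabulary=candidates,
    lowercase=False,  # added: do not lowercase docs, so capitalized candidates can match
).fit(docs)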
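Note that the lowercase=False patch makes matching case-sensitive. Capitalized candidates now match, but lowercase candidates no longer match capitalized words in the text, a problem that lowercasing the candidates instead would avoid. A simplified, stdlib-only model of this trade-off (matched_terms is hypothetical, and the regex only approximates scikit-learn's default token_pattern):

```python
import re

# Approximation of scikit-learn's default token_pattern (2+ word characters).
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def matched_terms(doc, vocabulary, lowercase):
    """Return the vocabulary entries that occur as tokens in doc.

    Hypothetical model: the document is lowercased only when lowercase=True;
    vocabulary entries are always compared verbatim.
    """
    text = doc.lower() if lowercase else doc
    tokens = set(TOKEN_PATTERN.findall(text))
    return [term for term in vocabulary if term in tokens]


doc = (
    "Kos. Griekenland staat bekend om de prachtige eilanden waar je terecht "
    "kan voor zonovergoten vakanties."
)

# With lowercase=False, capitalized candidates match ...
print(matched_terms(doc, ["Griekenland", "Kos"], lowercase=False))  # ['Griekenland', 'Kos']
# ... but lowercase candidates are missed, because matching is now case-sensitive.
print(matched_terms(doc, ["griekenland", "kos"], lowercase=False))  # []
```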
Strangely enough, it does seem to work in one of the virtual environments I've been using for a while, but I can't get it to work in newly created environments, even when I install the same package versions. I suspected the bug was in one of the installed packages, but that does not seem to be the case.