KeyphraseVectorizers issues

Results 16 KeyphraseVectorizers issues

Can't tag with spaCy in some languages

At least with French, the removal of the morphologizer from the spaCy pipeline means that no tags are able to be added, even when trying to use spaCy's POS tags....

mdsutter

bug

Lemmatizing documents and keyphrases

Using lemmatization can result in better-quality keyphrases, since similar keyphrases will be grouped together. Adding lemmatization as an option could be a great feature. If the option is...

hboisgibault

enhancement

Memory Issues

First up, thank you for your work. The results from BERTopic topic modelling work as I expected with this vectorizer. However, I am running into out-of-memory issues...

amoschoomy

bug

enhancement

Use list of POS patterns to reduce runtime

Hi, thanks for this great package. Right now I use the `KeyphraseCountVectorizer` method to extract keywords based on different POS patterns. Here is my code: ```python def kph_extr(docs:list, patt:str) -> list...

saied71

enhancement

Divide by zero error when trying to use `KeyphraseCountVectorizer` with BERTopic

I am trying to use `KeyphraseCountVectorizer` using the example provided here: https://github.com/TimSchopf/KeyphraseVectorizers#topic-modeling-with-bertopic-and-keyphrasevectorizers

```
from keyphrase_vectorizers import KeyphraseCountVectorizer
from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups

# load text documents
docs...
```

Pratik--Patel

Reducing outliers in BERTopic

Hi, this is a great package and it has improved topic modelling I've been doing with BERTopic. Thanks! However, I am encountering a problem when updating the topics to reduce...

ddenz

use of custom stop words

Using it with the KeyBERT library and passing a list of custom stop words doesn't appear to have any impact. # no custom stop word list `vectorizer = KeyphraseCountVectorizer()` `kw_model.extract_keywords(strip_html(course[2]),...

gboyega1

Regex from the paper?

Can you please include, at least in the documentation, the regex from the paper? In this code, the "standard is to only select keyphrases that have 0 or more adjectives,...

turian

documentation
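
For readers who want to see what such a POS rule looks like in practice, here is a minimal sketch in plain Python. It assumes the rule in question is "zero or more adjectives followed by one or more nouns", which I believe corresponds to the library's default `pos_pattern` of `<J.*>*<N.*>+`; please check that against the README. The function `extract_candidates` and the hand-tagged sample sentence are hypothetical and are not part of KeyphraseVectorizers, which tags text with spaCy rather than taking tags as input.

```python
import re


def extract_candidates(tagged_tokens):
    """Return phrases matching 'zero or more adjectives, then one or more nouns'.

    tagged_tokens is a list of (word, tag) pairs with Penn Treebank style tags.
    This is an illustrative stand-in, not the library's implementation.
    """
    # Map each tag to one character so a plain regex can run over the sequence:
    # J = adjective tags (JJ, JJR, ...), N = noun tags (NN, NNS, ...), O = other.
    codes = "".join(
        "J" if tag.startswith("J") else "N" if tag.startswith("N") else "O"
        for _, tag in tagged_tokens
    )
    phrases = []
    # "J*N+" is the single-character equivalent of the assumed <J.*>*<N.*>+ pattern.
    for match in re.finditer(r"J*N+", codes):
        words = [word for word, _ in tagged_tokens[match.start():match.end()]]
        phrases.append(" ".join(words).lower())
    return phrases


# Hand-tagged example sentence (the tags are assumptions made for illustration).
tagged = [
    ("Supervised", "JJ"), ("learning", "NN"), ("is", "VBZ"), ("a", "DT"),
    ("machine", "NN"), ("learning", "NN"), ("task", "NN"),
]
print(extract_candidates(tagged))
```

Encoding each tag as a single character keeps the sketch dependency-free. A real pipeline would get the tags from a tagger and would need to handle tag sets other than Penn Treebank.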

It does not exclude stop words in Portuguese

I am testing a document in Portuguese, but it doesn't exclude the stop words from the result even though I already defined stop_words='portuguese'. ![CleanShot 2023-06-19 at 22 50 08@2x](https://github.com/TimSchopf/KeyphraseVectorizers/assets/6707194/535306b7-af10-4900-84a6-91e0f9d7e0cb) This is...

phuclh

OnlineKeyphraseVectorizer

Hi! Thank you for the great project, it does wonders for the interpretability of the topics! I noticed there doesn't seem to be an Online version, similar to [OnlineCountVectorizer](https://maartengr.github.io/BERTopic/getting_started/vectorizers/vectorizers.html#onlinecountvectorizer), right?...

edloginova
