KeyphraseVectorizers

Set of vectorizers that extract keyphrases with part-of-speech patterns from a collection of text documents and convert them into a document-keyphrase matrix.
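
To make the description above concrete, here is a minimal, standard-library sketch of the general idea, not the package's actual implementation. It assumes the documents are already POS-tagged (the `DOCS` data, the tag names, and both helper functions are hypothetical), and it assumes a pattern of zero or more adjectives followed by one or more nouns.

```python
import re
from collections import Counter

# Hypothetical pre-tagged documents as (token, POS tag) pairs.
# In practice a tagger such as spaCy would produce these tags.
DOCS = [
    [("supervised", "JJ"), ("learning", "NN"), ("uses", "VBZ"),
     ("labeled", "JJ"), ("training", "NN"), ("data", "NNS")],
    [("training", "NN"), ("data", "NNS"), ("improves", "VBZ"),
     ("supervised", "JJ"), ("learning", "NN")],
]


def extract_keyphrases(tagged):
    """Return phrases whose tags match: zero or more adjectives, then one or more nouns."""
    # Encode each tag as a single character so a regex can scan the tag sequence.
    codes = "".join(
        "J" if tag.startswith("J") else "N" if tag.startswith("N") else "x"
        for _, tag in tagged
    )
    phrases = []
    for match in re.finditer(r"J*N+", codes):
        tokens = tagged[match.start():match.end()]
        phrases.append(" ".join(token for token, _ in tokens))
    return phrases


def document_keyphrase_matrix(docs):
    """Build a sorted keyphrase vocabulary and a per-document count matrix."""
    counts = [Counter(extract_keyphrases(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[doc_counts[phrase] for phrase in vocab] for doc_counts in counts]
    return vocab, matrix


vocab, matrix = document_keyphrase_matrix(DOCS)
print(vocab)
print(matrix)
```

Each row of `matrix` corresponds to one document and each column to one keyphrase in `vocab`, which is the shape a document-keyphrase matrix takes.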

Results 16 KeyphraseVectorizers issues

At least with French, the removal of the morphologizer from the spaCy pipeline means that no tags are able to be added, even when trying to use spaCy's POS tags....

bug

Using lemmatization can result in better-quality keyphrases, since similar keyphrases will be grouped together. Adding lemmatization as an option could be a great feature. If the option is...

enhancement

First up, thank you for your work. The results from BERTopic topic modelling work as I expected with this vectorizer. However, I am running into out-of-memory issues...

bug
enhancement

Hi, thanks for this great package. Right now I use `KeyphraseCountVectorizer` to extract keywords based on different POS patterns. Here is my code:

```python
def kph_extr(docs: list, patt: str) -> list...
```

enhancement

I am trying to use `KeyphraseCountVectorizer` using the example provided here: https://github.com/TimSchopf/KeyphraseVectorizers#topic-modeling-with-bertopic-and-keyphrasevectorizers

```
from keyphrase_vectorizers import KeyphraseCountVectorizer
from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups

# load text documents
docs...
```

Hi, this is a great package and it has improved topic modelling I've been doing with BERTopic. Thanks! However, I am encountering a problem when updating the topics to reduce...

Using it with the KeyBERT library and passing a list of custom stop words doesn't appear to have any impact. # no custom stop word list `vectorizer = KeyphraseCountVectorizer()` `kw_model.extract_keywords(strip_html(course[2]),...

Can you please include, at least in the documentation, the regex from the paper? In this code, the "standard is to only select keyphrases that have 0 or more adjectives,...

documentation

I am testing a document in Portuguese, but the stop words are not excluded from the result even though I have already defined `stop_words='portuguese'`. ![CleanShot 2023-06-19 at 22 50 08@2x](https://github.com/TimSchopf/KeyphraseVectorizers/assets/6707194/535306b7-af10-4900-84a6-91e0f9d7e0cb) This is...

Hi! Thank you for the great project, it does wonders for the interpretability of the topics! I noticed there doesn't seem to be an online version, similar to [OnlineCountVectorizer](https://maartengr.github.io/BERTopic/getting_started/vectorizers/vectorizers.html#onlinecountvectorizer), right?...