kwx
BERT, LDA, and TFIDF based keyword extraction in Python
[spaCy](https://github.com/explosion/spaCy) has new loading mechanisms in later versions that produce errors during data preparation in [kwx.utils](https://github.com/andrewtavis/kwx/blob/main/src/kwx/utils.py). The scripts should be changed to check the spaCy version so that these...
The current translation feature in [kwx.utils.translate_output()](https://github.com/andrewtavis/kwx/blob/main/src/kwx/utils.py) is based on [py-googletrans](https://github.com/ssut/py-googletrans), which is becoming less and less maintained. A better option would be if the translation feature could be...
A major difference between BERT and LDA kwx implementations is that there are no visualization methods for BERT. It would be good to add a [pyLDAvis](https://github.com/bmabey/pyLDAvis) style visualization of topic...
Hi Andrew, me again :) I want to ask two questions about the algorithm. When using the first BERT model, why do we remove n-grams, and can't we use them...
Hi Andrew, I was trying the keyword extraction API with TF-IDF. The code is: bert_kws = extract_kws( method="TFIDF", # "BERT", "LDA", "TFIDF", "frequency" bert_st_model="xlm-r-bert-base-nli-stsb-mean-tokens", text_corpus=corpus_no_ngrams, # automatically tokenized if using...
This issue is for discussing and eventually implementing key-phrase extraction for BERT in kwx. It would be best to first collect code snippets and documentation links for how to best...
This issue is for discussing and eventually implementing key-phrase extraction for LDA in kwx. It would be best to first collect code snippets and documentation links for how to best...
This issue is for discussing and eventually implementing key-phrase extraction for TFIDF in kwx. It would be best to first collect code snippets and documentation links for how to best...
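As a starting point for the TF-IDF key-phrase discussion, the sketch below scores contiguous n-grams instead of single tokens. It is a generic, unsmoothed TF-IDF over phrases and is not kwx's implementation; the function names `ngrams` and `tfidf_keyphrases` are hypothetical, and the corpus is assumed to be already tokenized into lists of strings.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_keyphrases(corpus, doc_index, n=2, top_k=3):
    """Rank the n-gram phrases of one document by TF-IDF against the corpus.

    Hypothetical helper: corpus is a list of tokenized documents.
    """
    doc_phrases = [ngrams(doc, n) for doc in corpus]
    # Document frequency: how many documents contain each phrase at least once.
    df = Counter(p for phrases in doc_phrases for p in set(phrases))
    n_docs = len(corpus)

    # Term frequency of each phrase in the target document, normalized by length.
    tf = Counter(doc_phrases[doc_index])
    total = sum(tf.values()) or 1
    scores = {
        p: (count / total) * math.log(n_docs / df[p])
        for p, count in tf.items()
    }

    # Highest score first; ties broken alphabetically so output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [p for p, _ in ranked[:top_k]]


corpus = [
    "keyword extraction with topic models".split(),
    "keyword extraction with bert embeddings".split(),
    "topic models and lda".split(),
]
print(tfidf_keyphrases(corpus, doc_index=1, n=2, top_k=2))
# ['bert embeddings', 'with bert']
```

Phrases shared by many documents (here "keyword extraction") are pushed down by the IDF term, while phrases specific to one document rise to the top. A real implementation would likely also filter n-grams that start or end with stopwords.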
Please use this issue to suggest other methods for keyword extraction that could be included in kwx. Suggestions would ideally include some of the following: - A blogpost or other...
### **1st changes** - In this modified code, the `spacy_version` variable is used to store the version of the `spaCy` library. Inside the loop, the code checks whether the...
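A minimal sketch of such a version check follows, assuming the errors come from spaCy v3 removing the v2 shortcut names (such as `"en"`) in favor of full pipeline package names (such as `"en_core_web_sm"`). The helper names `parse_major` and `model_name_for`, and the mapping table, are hypothetical and not taken from kwx; spaCy itself is not imported so the logic can be tested on its own.

```python
# Hypothetical mapping from spaCy v2 shortcut names to spaCy v3 pipeline packages.
SHORTCUT_TO_PACKAGE = {
    "en": "en_core_web_sm",
    "de": "de_core_news_sm",
    "fr": "fr_core_news_sm",
    "es": "es_core_news_sm",
}


def parse_major(version):
    """Return the major version number from a string such as '3.7.2'."""
    return int(version.split(".")[0])


def model_name_for(spacy_version, lang):
    """Pick the name to pass to spacy.load() for the installed spaCy version.

    spaCy >= 3 no longer accepts shortcut names, so map them to full
    package names; unknown names (e.g. custom pipelines) pass through.
    """
    if parse_major(spacy_version) >= 3:
        return SHORTCUT_TO_PACKAGE.get(lang, lang)
    return lang


print(model_name_for("3.7.2", "en"))  # en_core_web_sm
print(model_name_for("2.3.5", "en"))  # en
```

In practice the version string would come from `spacy.__version__`, e.g. `nlp = spacy.load(model_name_for(spacy.__version__, "en"))`, computed once before the loop rather than on every iteration.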