contextualized-topic-models

A Python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021 (Bianchi et al.).

Results 16 contextualized-topic-models issues

I noticed that the ctm.save() method tries to save the training dataset (800k items in my case). This, however, causes a crash on my machine. I was able to resolve the...

bug
enhancement

#71 Because of the structure of the Korean language, a tokenizer other than the whitespace tokenizer is needed. Since konlpy is one of the best-known Korean NLP Python packages,...

https://github.com/MilaNLProc/contextualized-topic-models/blob/4fdf8d922500bd2c24f8df068e0f3c898ad85451/contextualized_topic_models/utils/preprocessing.py#L35 Need to fix this to support other languages
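Both tokenizer requests above come down to making the tokenization step pluggable. The following is a minimal sketch of that idea, not the package's actual API: `preprocess`, `whitespace_tokenizer`, and `char_bigram_tokenizer` are hypothetical names, and the bigram tokenizer is only a toy stand-in for a real language-specific tokenizer such as one provided by konlpy.

```python
from typing import Callable, Iterable, List

# A tokenizer is any callable that maps a string to a list of tokens.
Tokenizer = Callable[[str], List[str]]


def whitespace_tokenizer(text: str) -> List[str]:
    """Default behaviour: split on runs of whitespace."""
    return text.split()


def char_bigram_tokenizer(text: str) -> List[str]:
    """Toy stand-in for a language-specific tokenizer (hypothetical)."""
    compact = "".join(text.split())
    return [compact[i:i + 2] for i in range(len(compact) - 1)]


def preprocess(documents: Iterable[str],
               tokenizer: Tokenizer = whitespace_tokenizer,
               stopwords: frozenset = frozenset()) -> List[str]:
    """Lowercase each document, tokenize it with the supplied callable,
    drop stopwords, and re-join the tokens with single spaces."""
    cleaned = []
    for doc in documents:
        tokens = [t for t in tokenizer(doc.lower()) if t not in stopwords]
        cleaned.append(" ".join(tokens))
    return cleaned
```

With this shape, supporting another language would only require passing a different callable, for example `preprocess(docs, tokenizer=my_korean_tokenizer)`, where `my_korean_tokenizer` is whatever wrapper the caller writes around their tokenizer of choice.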

enhancement

Is parallel GPU training support possible? We would like to try this with a fairly large (multi-GB) dataset, but to make training time reasonable it would need to be done...

enhancement
help wanted

* Contextualized Topic Models version: 2.2.0 * Python version: 3.8 * Operating System: macOS Monterey 12.5.1 ### Description https://github.com/MilaNLProc/contextualized-topic-models/blob/f3225055440b2ebf3bedb7143868954f1e1478d7/contextualized_topic_models/evaluation/measures.py#L166 This line throws an error with gensim>=4.0.0: `AttributeError: The vocab...`
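The error message above is truncated, so the exact cause is not shown. Assuming it refers to the removal of the `vocab` attribute from `KeyedVectors` in gensim 4.0 (which introduced `key_to_index` as the replacement), a version-tolerant membership check could be sketched as follows. The helper name `has_word` is hypothetical and is not part of the package or of gensim.

```python
def has_word(keyed_vectors, word: str) -> bool:
    """Return True if `word` is in the embedding vocabulary.

    Assumption: gensim >= 4.0 exposes the vocabulary as `key_to_index`,
    while earlier versions exposed it as `vocab`.
    """
    index = getattr(keyed_vectors, "key_to_index", None)
    if index is None:
        # Fall back to the pre-4.0 attribute.
        index = keyed_vectors.vocab
    return word in index
```

Checking `key_to_index` first avoids touching `vocab` at all on newer versions, where accessing it raises the `AttributeError`.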

* Contextualized Topic Models version: 2.3.0 * Python version: 3.8.12 * Operating System: Ubuntu 22.04.1 LTS x86_64 ### Description I ran the ZeroShotTM example notebook on my local machine. ###...

dependencies

* Contextualized Topic Models version: 1.4.1 * Python version: 3.5.2 * Operating System: macOS - bash ### Description I am trying to run CombinedTM to create topics for a dataset...

Bumps [pip](https://github.com/pypa/pip) from 21.1 to 23.3. Changelog (sourced from pip's changelog): 23.3 (2023-10-15). Process: Added reference to vulnerability reporting guidelines (https://www.python.org/dev/security/) to pip's security policy. Deprecations and Removals: Drop a...

dependencies

Specifically, the existing code treats some variables as variances while naming them sigma, which is the standard notation for standard deviation. Fixing the variable names...

I was following the [reparameterization code](https://github.com/MilaNLProc/contextualized-topic-models/blob/master/contextualized_topic_models/networks/decoding_network.py#L117). During [invocation](https://github.com/MilaNLProc/contextualized-topic-models/blob/master/contextualized_topic_models/networks/decoding_network.py#L128C61-L128C80), the variable [`posterior_log_sigma`](https://github.com/MilaNLProc/contextualized-topic-models/blob/master/contextualized_topic_models/networks/decoding_network.py#L124C23-L124C42) is incorrectly named. It's supposed to be `posterior_log_variance`. Similarly, the [inf net](https://github.com/MilaNLProc/contextualized-topic-models/blob/master/contextualized_topic_models/networks/inference_network.py#L52-L53) needs to be renamed to variance instead...
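The naming matters because the two quantities are turned into a standard deviation differently. As an illustration only, and assuming (as the issue claims) that the network outputs a log variance, the reparameterization step can be sketched in plain Python. This is not the repository's code; `reparameterize` and its argument names are chosen here for the example.

```python
import math
from typing import List, Sequence


def reparameterize(mu: Sequence[float],
                   log_variance: Sequence[float],
                   eps: Sequence[float]) -> List[float]:
    """Compute z = mu + eps * std, element by element.

    Because the input is a log *variance*, std = exp(0.5 * log_variance).
    If the input were a log *sigma*, std would be exp(log_sigma) with no 0.5.
    """
    return [m + e * math.exp(0.5 * lv)
            for m, lv, e in zip(mu, log_variance, eps)]
```

In training, `eps` would be drawn from a standard normal distribution; it is passed in explicitly here so the function stays deterministic. The factor of 0.5 is the only difference between the two interpretations, which is why a variable named `log_sigma` that actually holds a log variance is easy to misuse.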