contextualized-topic-models
ctm.save crashes when training_dataset is large
I noticed that the ctm.save() method tries to save the training dataset (800k items in my case). This, however, causes a crash on my machine.
I was able to resolve the problem by deleting the reference to train_data in ctm.save and then modifying the ctm.load method so that a dataset can be passed in. A sketch of the workaround is below.
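For reference, here is a minimal sketch of that workaround, assuming the trained model keeps its dataset in a `train_data` attribute and that `save()`/`load()` take a models directory and an epoch index; the exact attribute name and method signatures may differ across library versions, so check yours before relying on this.

```python
# Sketch of the workaround described above (attribute and method names
# are assumptions based on this report, not the library's confirmed API).

# 1. Drop the reference to the large training dataset before saving,
#    so it is not pickled along with the model weights.
ctm.train_data = None
ctm.save(models_dir="./saved_ctm")

# 2. After loading the model elsewhere, reattach a dataset by hand:
#    the original one, or a new dataset for inference / further training.
ctm.load("./saved_ctm", epoch=19)        # epoch index is an example value
ctm.train_data = my_dataset              # hypothetical placeholder dataset
```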
In any case, it seems that storing the training dataset (except for id2token) may not be desirable in use cases where one wants to load a model to predict topics for unseen documents or to continue training on a different dataset.
Thanks!
Let me label this as a bug.
It might make sense to remove the dataset from the saved model in a future version.
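One possible shape for such a fix is to detach the dataset only for the duration of the pickle and restore it afterwards, so the in-memory model keeps working. A minimal sketch, with the same assumed `train_data` attribute and `save()` signature as above:

```python
# Sketch of a save that never serializes the training data: temporarily
# detach the `train_data` attribute, save, then restore it. Names and
# signatures are assumptions, not the library's confirmed internals.
def save_without_dataset(model, models_dir):
    dataset = model.train_data      # keep a local reference
    model.train_data = None         # detach so pickling skips the dataset
    try:
        model.save(models_dir=models_dir)
    finally:
        model.train_data = dataset  # restore for continued in-memory use
```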