
ctm.save crashed when training_dataset is somewhat large

Open elchorro opened this issue 2 years ago • 1 comments

I noticed that the ctm.save() method tries to save the training dataset (800k items in my case). This, however, causes a crash on my machine.

I was able to resolve the problem by deleting the reference to train_data in ctm.save and then modifying the ctm.load method to accept a dataset as an argument.

In any case, it seems that storing the training dataset (except for id2token) may not be desirable in use cases where one wants to load a model to predict topics for unseen documents, or to continue training on a different dataset.
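The workaround above can be sketched as follows. Note this is a minimal illustration, not the library's actual code: `FakeCTM` is a hypothetical stand-in for the real CTM class, and the attribute name `train_data` is taken from the description above (the real attribute or save mechanism in contextualized-topic-models may differ).

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained CTM model. The point is that the
# instance holds a reference to the full training dataset, which is what
# makes saving blow up on large corpora.
class FakeCTM:
    def __init__(self, weights, train_data):
        self.weights = weights        # the part worth persisting
        self.train_data = train_data  # large, not needed at inference time

# Workaround: drop the reference to train_data before serializing,
# then restore it so the in-memory model is left unchanged.
def save_without_dataset(model, path):
    dataset = model.train_data
    model.train_data = None
    try:
        with open(path, "wb") as f:
            pickle.dump(model, f)
    finally:
        model.train_data = dataset  # restore the live model

model = FakeCTM(weights=[0.1, 0.9], train_data=list(range(800_000)))
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
save_without_dataset(model, path)

with open(path, "rb") as f:
    loaded = pickle.load(f)

# The saved copy carries no dataset; re-attach one at load time if you
# want to continue training, e.g. loaded.train_data = new_dataset
print(loaded.train_data is None)        # True
print(model.train_data is not None)     # True: live model untouched
```

The same effect could also be achieved more idiomatically by defining `__getstate__` on the model class to exclude the dataset from pickling.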

elchorro avatar Apr 19 '22 17:04 elchorro

Thanks!

Let me label this as a bug.

It might make sense to remove the dataset in a future version of the model.

vinid avatar Apr 25 '22 08:04 vinid