Silvia Terragni comments

27 comments by Silvia Terragni

Integrating D-ETM

Hi Luke, currently I don't have much time to dedicate to this project. Since this is a super busy period for me, I'm focusing more on the maintenance (as much...

Feature request : Adding D-ETM and other dynamic topic model approaches

Hi! Thank you :) Yes, we can definitely think of a way to integrate D-ETM as well. We have already integrated ETM, so I think it shouldn't be that hard....

num_samples should be a positive integer value, but got num_samples=0

Hi Mariana! Thanks for reporting this issue. I tried to reproduce the error using your code and some other data, but the error doesn't occur. Can you please share your...

num_samples should be a positive integer value, but got num_samples=0

Hi A11en0, can you please share your code, the version of the library, your Python version, and your operating system? I'd be happy to help solve the issue.

num_samples should be a positive integer value, but got num_samples=0

Hello @alyrazik, could you send me the dataset (if possible) by email? I would really like to replicate this error but it has never happened with my data. So I...

Loading unprocessed corpus documents with CTM and Optimizer

Thanks for opening a specific issue; I had lost track of the question. Yes, I confirm that there's currently no way to load the unpreprocessed corpus. As mentioned before, this would...

Loading unprocessed corpus documents with CTM and Optimizer

Thanks, Roberta! :) Yes, that is correct. My suggestion is to first try hyperparameter configurations that "usually" work well. You can find some reference values in these papers: - https://aclanthology.org/2021.acl-short.96/...

ETM model corpus size

Hello, approximately how many words does the larger vocabulary contain? We integrated ETM into OCTIS, but we kept the original implementation, which is not optimized for large corpora. My suggestion...

Improve Preprocessing Speed

Hi! Lemmatization is definitely the biggest bottleneck in preprocessing. I didn't know about spaCy pipes. They seem like the right solution for us, since we already rely on spaCy for lemmatization....
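A minimal sketch of the batched lemmatization idea discussed in this thread, assuming spaCy v3's `Language.pipe(texts, batch_size=..., n_process=...)`. The function name `lemmatize_corpus`, the default batch size, and the choice to drop punctuation and whitespace tokens are hypothetical illustrations, not OCTIS code. The function only relies on the `nlp` object having a `pipe` method, so any loaded spaCy pipeline can be passed in.

```python
from typing import Iterable, List


def lemmatize_corpus(nlp, texts: Iterable[str], batch_size: int = 1000,
                     n_process: int = 1) -> List[List[str]]:
    """Lemmatize many documents with spaCy's Language.pipe (hypothetical helper).

    `nlp` is assumed to be a loaded spaCy pipeline, for example
    spacy.load("en_core_web_sm", disable=["parser", "ner"]).
    Streaming texts through nlp.pipe lets spaCy process them in batches
    (and optionally in several processes) instead of one nlp(text) call
    per document.
    """
    lemmas: List[List[str]] = []
    for doc in nlp.pipe(texts, batch_size=batch_size, n_process=n_process):
        # Illustrative filter: keep lowercased lemmas, skip punctuation/whitespace
        lemmas.append([tok.lemma_.lower() for tok in doc
                       if not (tok.is_punct or tok.is_space)])
    return lemmas
```

Disabling pipeline components the lemmatizer does not depend on (for the English models, typically the parser and NER) usually saves additional time; check which components your model's lemmatizer needs before disabling them.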

Improve Preprocessing Speed

Thank you! Let me know if you have any questions. Silvia