Results 105 comments of Minchul Lee

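@mrchypark The performance gap is not that large relative to the difference in model size, and few people seem to use the other model sizes, so the payoff looks small compared with the resources needed to tune them. I am still thinking it over....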

This feature was partially implemented in version 0.11.0 in https://github.com/bab2min/Kiwi/blob/main/tools/model_builder.cpp . Model training will later be fully integrated into the API, so that custom morphological analysis models can be built more easily.

Hi @bertomartin, as you know, tomotopy currently has no feature for removing dead topics from HDP models. This is because dead and live topics can be swapped out during training,...

Blueprint of the `purge_dead_topics` method of `tomotopy.HDPModel`:
```python
model = tp.HDPModel(...)
...
model.train(...)
# model may have a lot of dead topics at this point, e.g.
# 0: live topic
#...
```

@bertomartin You can filter out dead topics using numpy indexing like:
```python
live_topics = [k for k in range(mdl.k) if mdl.is_live_topic(k)] # topics you want to visualize
topic_term_dists = np.stack([mdl.get_topic_word_dist(k)...
```
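The filtering idea above can be sketched without tomotopy installed. This is a minimal illustration only: `mdl.k`, `is_live_topic` and `get_topic_word_dist` are the names used in the comment, while `StubHDP` and `live_topic_word_dists` are hypothetical names made up here so the snippet runs on its own.

```python
class StubHDP:
    """Hypothetical stand-in that mimics only the three members used below."""

    def __init__(self, dists, live):
        self._dists = dists      # one word distribution per topic
        self._live = set(live)   # ids of topics treated as live
        self.k = len(dists)      # total number of topics, live and dead

    def is_live_topic(self, topic_id):
        return topic_id in self._live

    def get_topic_word_dist(self, topic_id):
        return self._dists[topic_id]


def live_topic_word_dists(mdl):
    """Return (live topic ids, their word distributions) for an HDP-like model."""
    live_topics = [k for k in range(mdl.k) if mdl.is_live_topic(k)]
    rows = [list(mdl.get_topic_word_dist(k)) for k in live_topics]
    return live_topics, rows


# Topic 1 plays the role of a dead topic in this made-up data.
stub = StubHDP(
    dists=[[0.7, 0.2, 0.1], [0.0, 0.0, 0.0], [0.1, 0.3, 0.6]],
    live=[0, 2],
)
live_ids, rows = live_topic_word_dists(stub)
print(live_ids)
print(len(rows))
```

With a real trained model, the returned rows could presumably be passed to `np.stack` as in the comment; the stub only shows that dead topic ids are skipped while the original ids of the live topics are kept.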

Hi @juneMJ, the coherence measures are actually defined as follows: https://github.com/bab2min/tomotopy/blob/d30964ce0610a5e34d3645cfc8c26d99536cac03/tomotopy/coherence.py#L62-L67 The second value is the default size of the sliding window. If you don't provide the `window_size` argument for `coherence.Coherence()`,...

Hi @erip, could you share part of the file `10_line_pretokenized_corpus.tsv` so that I can reproduce this? I cannot reproduce a similar error with the sample text I have, so it is not easy...

Oops, sorry @erip, I forgot about this thread entirely. Yes, I used `WSTok` and it worked well. Since I don't have `tm_model.bin` and `10_line_pretokenized_corpus.tsv`, I ran the code, which is...

If you have a question about passes and iterations in gensim, this [link](https://groups.google.com/g/gensim/c/on1PMjAvdr8?pli=1) might be helpful. Note that gensim's LdaModel uses **Variational Bayes**, but tomotopy's LDAModel uses...

There is no official file extension for topic models in `tomotopy`. Do we need one? If so, which extension would you prefer?