
Python package of Tomoto, the Topic Modeling Tool

Results: 67 tomotopy issues, sorted by recently updated.

```python
# Data preparation: build the vocab and preprocess
df = list()
with open("result/mecab_lda_corpus.csv", mode="r", encoding="UTF-8") as f:
    df = f.readlines()
df = [i.rstrip() for i in df]
df = list(reversed(df))
time_point_list...
```

Hello. I'm a user who switched over from another topic modeling library because tomotopy is fast and easy to use. Having used it for a while, there is one thing I find lacking, so I'm leaving a suggestion. [sklearn's TfidfVectorizer](https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html) has a `vocabulary` parameter, so even without preprocessing such as removing stopwords or excluding specific words...
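The behavior being requested can be approximated today by filtering tokens before they reach the model. This is a minimal stdlib-only sketch, not a tomotopy feature; the vocabulary and document below are made up for illustration.

```python
# Sketch: emulate sklearn's `vocabulary` parameter by restricting each
# document to a fixed vocabulary before adding it to a topic model.
# ALLOWED_VOCAB is a hypothetical whitelist.
ALLOWED_VOCAB = {"topic", "model", "word", "corpus"}

def restrict_to_vocab(tokens, vocab=ALLOWED_VOCAB):
    """Keep only tokens that appear in the fixed vocabulary, preserving order."""
    return [t for t in tokens if t in vocab]

doc = ["the", "topic", "model", "uses", "word", "counts"]
filtered = restrict_to_vocab(doc)
print(filtered)  # ['topic', 'model', 'word']
```

The filtered token list would then be passed to `add_doc` in place of the raw tokens.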

Analysis [here](https://github.com/bab2min/tomotopy/issues/182#issuecomment-1715862530) suggests that the queue size is too small to be effective for parallel processing. It's a bit hard to follow the code path, but it seems to be...

I'm using tomotopy-0.12.5...

```python
process_corpus = tp.utils.Corpus()
load_corpus = tp.utils.Corpus()
.....
process_corpus.save('save.corpus')
load_corpus.load('save.corpus')
...
```

`process_corpus` yields results from `extract_ngrams`, but `load_corpus` returns only empty values. Just in case, I also tried pickle and got the same result... pickle.dump(data,...
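The round-trip check being attempted can be illustrated without tomotopy. This is a stdlib-only sketch with made-up data: whatever attribute is expected to survive save/load is compared the same way before and after.

```python
import pickle, tempfile, os

# Sketch: save data to disk, load it back, and verify it survived intact.
# The dict below stands in for whatever the corpus is expected to retain.
data = {"ngrams": [("topic", "model"), ("word", "count")]}

with tempfile.NamedTemporaryFile(delete=False) as f:
    pickle.dump(data, f)
    path = f.name

with open(path, "rb") as f:
    loaded = pickle.load(f)
os.remove(path)

print(loaded == data)  # True: the data survives the round-trip
```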

I encountered the same problem: 100 GB of memory is occupied, 40 cores are in use, and inference is run on texts of fewer than 5,000 words, 2...

I have a dataset of 13,000 documents with 20 categories and trained SLDA with those labels with K=16. After training, I first call the infer function, then estimate to predict a...

After training my SLDA model with labels, I called SLDA_model.get_regression_coef(). The result contains an L x K matrix, where L is the number of unique labels in my dataset,...
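As a hedged sketch of how an L x K coefficient matrix is typically used (not tomotopy's internals): a document's score for label l is the dot product of its topic distribution with coefficient row l, and the predicted label is the highest-scoring row. The numbers below are made up for illustration.

```python
# Sketch: scoring labels with an L x K regression-coefficient matrix.
# coef[l][k] weights topic k for label l (hypothetical values).
coef = [
    [0.9, 0.1, 0.0],  # label 0
    [0.1, 0.8, 0.1],  # label 1
]
theta = [0.2, 0.7, 0.1]  # per-document topic distribution (sums to 1)

# score_l = sum_k coef[l][k] * theta[k]
scores = [sum(c * t for c, t in zip(row, theta)) for row in coef]
best_label = max(range(len(scores)), key=scores.__getitem__)
print(best_label)  # 1, since label 1's row aligns with the dominant topic
```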

When I train a PLDAModel and then save and load it, the model's properties have changed after loading. For instance:

```python
from tomotopy import PLDAModel
docs = [['foo'], ['bar'], ['baz'],...
```

What is the log_ll returned by the infer method? I thought it was the logarithm of the document's generative likelihood, no? If not, please tell me an easy...
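One common convention (hedged; check tomotopy's documentation for its exact definition) is the log of the document's likelihood under the topic mixture: log p(doc) = Σ_w log(Σ_k θ_k · φ_{k,w}). A stdlib-only sketch with made-up θ and φ values:

```python
import math

# Sketch: log-likelihood of a document under a 2-topic mixture.
# theta: the document's topic distribution; phi[w]: per-topic
# probabilities of word w. All values are hypothetical.
theta = [0.6, 0.4]
phi = {"cat": [0.5, 0.1], "dog": [0.2, 0.7]}

def doc_log_likelihood(tokens, theta, phi):
    """Sum over tokens of log( sum_k theta[k] * phi[k][w] )."""
    ll = 0.0
    for w in tokens:
        p_w = sum(t * p for t, p in zip(theta, phi[w]))
        ll += math.log(p_w)
    return ll

print(doc_log_likelihood(["cat", "dog"], theta, phi))
```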

I used a correlated topic model on a 4,500-document corpus to learn the type and frequency of topics. The results were very good, but unfortunately one of the topics (#14)...
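One post-hoc workaround for an unwanted topic (a sketch, not a tomotopy feature) is to zero it out in each document's topic distribution and renormalize the remaining mass. The topic index and distribution below are made up for illustration.

```python
# Sketch: drop topic `drop_k` from a per-document topic distribution
# by zeroing its probability and renormalizing the rest.
def drop_topic(theta, drop_k):
    kept = [p if k != drop_k else 0.0 for k, p in enumerate(theta)]
    total = sum(kept)
    return [p / total for p in kept]

theta = [0.5, 0.3, 0.2]
print(drop_topic(theta, drop_k=1))  # ≈ [0.714, 0.0, 0.286]
```

This changes only how the fitted distributions are reported; retraining without the offending topic's vocabulary is the cleaner fix.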