the score of clus_center is smaller than filter_thre, so it doesn't occur in keywords.txt, leading to KeyError in embs[cate_ph]

Open chanwing opened this issue 4 years ago • 4 comments

Dear authors of TaxoGen, the keyword score of a cluster center (clus_center) is smaller than filter_thre, so that cluster-center word does not appear in keywords.txt. As a result the embs dictionary has no entry for it, and the lookup embs[cate_ph] raises a KeyError. How can this be solved?

chanwing avatar May 10 '20 12:05 chanwing

Traceback (most recent call last):
  File "main.py", line 140, in <module>
    main(opt)
  File "main.py", line 117, in main
    recur(input_dir, root_dir, n_cluster, '*', n_cluster_iter, filter_thre, n_expand, level, True, True)
  File "main.py", line 103, in recur
    filter_thre, n_expand, level + 1, caseolap, local_embedding)
  File "main.py", line 103, in recur
    filter_thre, n_expand, level + 1, caseolap, local_embedding)
  File "main.py", line 97, in recur
    main_local_embedding(node_dir, df.doc_file, df.index_file, parent, n_expand)
  File "/Users/blabla/PycharmProjects/taxogenpy3/code/local_embedding_training.py", line 147, in main_local_embedding
    cates = relevant_phs(embs, cates, int(N))
  File "/Users/blabla/PycharmProjects/taxogenpy3/code/local_embedding_training.py", line 62, in relevant_phs
    sim = utils.cossim(embs[cate_ph], embs[ph])
KeyError: 'determining'
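The traceback shows the failure at the direct lookup embs[cate_ph]. One possible workaround, sketched here and not taken from the TaxoGen code, is to skip category phrases that have no embedding and report them instead of crashing. The names relevant_phs_safe and the local cossim, and the toy embeddings, are hypothetical stand-ins; the real relevant_phs and utils.cossim may behave differently.

```python
import math


def cossim(a, b):
    """Cosine similarity of two equal-length vectors (stand-in for utils.cossim)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def relevant_phs_safe(embs, cates, n):
    """Return the n most similar phrases for each category that has an embedding.

    Categories missing from embs are collected in a separate list
    instead of raising KeyError.
    """
    result, missing = {}, []
    for cate_ph in cates:
        if cate_ph not in embs:  # the guard the original lookup lacks
            missing.append(cate_ph)
            continue
        scored = [(ph, cossim(embs[cate_ph], vec)) for ph, vec in embs.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        result[cate_ph] = [ph for ph, _ in scored[:n]]
    return result, missing


# Toy data (made up): 'determining' has no embedding, as in the traceback.
embs = {"learning": [1.0, 0.0], "training": [0.9, 0.1], "database": [0.0, 1.0]}
result, missing = relevant_phs_safe(embs, ["learning", "determining"], 2)
print(result, missing)
```

Skipping a cluster center silently changes the taxonomy at that node, so logging the missing list and inspecting it is probably safer than ignoring it.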

chanwing avatar May 10 '20 12:05 chanwing

I changed the filter_thre parameter in params.py to 0.25 and was able to complete the run. However, the results are not nearly as good as those in the paper.
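To check whether a given threshold would still drop some cluster centers, a small diagnostic along these lines may help. It is only a sketch: it assumes, following the issue description, that a keyword is left out of keywords.txt when its score is below filter_thre. The function name dropped_centers and the scores are invented for illustration.

```python
def dropped_centers(scores, centers, filter_thre):
    """List cluster-center terms whose score falls below filter_thre.

    Assumption: terms scoring below the threshold are excluded from
    keywords.txt, so they would later be missing from the embeddings.
    """
    return [c for c in centers if scores.get(c, 0.0) < filter_thre]


# Made-up scores for illustration only.
scores = {"determining": 0.3, "learning": 0.8}
centers = ["determining", "learning"]

at_high = dropped_centers(scores, centers, 0.5)
at_low = dropped_centers(scores, centers, 0.25)
print(at_high, at_low)
```

With these invented numbers, the lower threshold keeps every center, which matches the observation that the run completes at 0.25; it says nothing about the quality of the resulting taxonomy.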

SasCezar avatar Dec 14 '21 08:12 SasCezar

Have you solved it? I have a similar problem.

DecideToLeave avatar Jul 20 '22 02:07 DecideToLeave

Have you solved it? I have a similar problem.

kekexii avatar Nov 27 '23 12:11 kekexii