BTM-Java

Maybe I found a bug

Open ShanceWang opened this issue 8 years ago • 3 comments

If a document in the docs contains only one word, then after running the program its row in "model-final.theta" is all zeros.

ShanceWang · Jul 22 '16 06:07

@ShanceWang BTM trains topics on word co-occurrences. Each document is treated as a mixture of co-occurring word pairs (biterms). A document that contains only one word has no word pairs, so it is meaningless to the model.
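To see why a one-word document ends up with an all-zero row, here is a minimal sketch (not BTM-Java's actual code) assuming the document-topic proportions are inferred as theta_d[z] = Σ_b p(z|b) · p(b|d), with p(b|d) uniform over the document's biterms. The class and method names (`BitermSketch`, `biterms`, `docTopics`) and the toy `pz`/`phi` values are hypothetical. When a document has no biterms the sum is empty, so every entry stays at its initial 0.0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of biterm extraction and document-topic inference.
public class BitermSketch {
    // Every unordered pair of word positions in the document is one biterm.
    static List<int[]> biterms(int[] doc) {
        List<int[]> out = new ArrayList<>();
        for (int i = 0; i < doc.length; i++) {
            for (int j = i + 1; j < doc.length; j++) {
                out.add(new int[] {doc[i], doc[j]});
            }
        }
        return out;
    }

    // theta[z] = sum over biterms b of p(z|b) * p(b|d), with p(b|d) = 1 / #biterms.
    // pz[z] = p(z), phi[z][w] = p(w|z). Assumes phi has no all-zero biterm columns.
    static double[] docTopics(int[] doc, double[] pz, double[][] phi) {
        int k = pz.length;
        double[] theta = new double[k];
        List<int[]> bs = biterms(doc);
        if (bs.isEmpty()) {
            return theta; // one-word (or empty) document: no biterms, all zeros
        }
        for (int[] b : bs) {
            double[] pzb = new double[k];
            double norm = 0.0;
            for (int z = 0; z < k; z++) {
                pzb[z] = pz[z] * phi[z][b[0]] * phi[z][b[1]]; // unnormalized p(z|b)
                norm += pzb[z];
            }
            for (int z = 0; z < k; z++) {
                theta[z] += pzb[z] / norm / bs.size();
            }
        }
        return theta;
    }

    public static void main(String[] args) {
        double[] pz = {0.5, 0.5};
        double[][] phi = {{0.7, 0.2, 0.1}, {0.1, 0.3, 0.6}};
        System.out.println(Arrays.toString(docTopics(new int[] {2}, pz, phi)));    // [0.0, 0.0]
        System.out.println(Arrays.toString(docTopics(new int[] {0, 2}, pz, phi)));
    }
}
```

A two-word document yields exactly one biterm and a proper distribution, while a one-word document yields none. Note that a document like "day day" still produces one biterm (the word paired with itself).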

ffftzh · Jul 26 '16 07:07

Thanks for your reply. Another problem: sometimes a space is recognized as a word, so it appears in the wordmap with its own label (I'm sure the documents are preprocessed correctly). I've observed that it may be caused by documents that contain only the same word twice, such as "day day" or "danger danger". Can it be explained by the same reason? Thanks again for your kind help!

ShanceWang · Jul 26 '16 08:07

Maybe there are some empty lines in the dataset, or it is just a special character that looks like a space.
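Both causes can be illustrated with a small sketch. It assumes (unverified) that documents are tokenized by splitting each line on a single space; how BTM-Java actually tokenizes is not confirmed here, and `TokenizeSketch`, `naiveTokens` and `cleanTokens` are hypothetical names. In Java, `"".split(" ")` returns one empty token and a double space yields an empty token in the middle, either of which would enter the wordmap as a "word" that looks blank. Java's `\s` and `String.trim()` also do not treat a no-break space (U+00A0) as whitespace, so it is normalized explicitly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical tokenizers contrasting a naive split with a defensive one.
public class TokenizeSketch {
    // Naive: split on one ASCII space; keeps empty tokens from "" or "a  b".
    static String[] naiveTokens(String line) {
        return line.split(" ");
    }

    // Defensive: map no-break spaces to ordinary spaces, split on whitespace
    // runs, and drop any empty token (e.g. from an empty line).
    static List<String> cleanTokens(String line) {
        String normalized = line.replace('\u00A0', ' ');
        List<String> out = new ArrayList<>();
        for (String tok : normalized.trim().split("\\s+")) {
            if (!tok.isEmpty()) {
                out.add(tok);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(naiveTokens("day  day"))); // [day, , day]
        System.out.println(naiveTokens("").length);                   // 1
        System.out.println(cleanTokens("day  day"));                  // [day, day]
        System.out.println(cleanTokens("day\u00A0day"));              // [day, day]
        System.out.println(cleanTokens(""));                          // []
    }
}
```

Skipping empty lines and empty tokens before building the wordmap keeps a blank entry from ever receiving a label.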

ffftzh · Aug 01 '16 11:08