BTM-Java
Maybe I found a bug.
If a document in the docs contains only one word, then after running the program its result in "model-final.theta" is all zeros.
@ShanceWang BTM trains topics on word co-occurrence. Each document is treated as a mixture of co-occurring word pairs. So a document is meaningless to the model when it contains only one word and therefore has no word pairs.
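To illustrate the point above, here is a minimal sketch of word-pair (biterm) extraction. It is not BTM-Java's actual code: the class `BitermSketch` and the method `extractBiterms` are hypothetical names, and it assumes whitespace tokenization with every unordered pair in the document counted, without any window limit. A one-word document yields no pairs, so there is nothing to sum over when the document's topic proportions are computed, which would be consistent with an all-zero row in "model-final.theta".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of BTM-Java.
final class BitermSketch {

    /** Returns every unordered word pair from a whitespace-tokenized document. */
    static List<String[]> extractBiterms(String doc) {
        List<String[]> biterms = new ArrayList<>();
        String trimmed = doc.trim();
        if (trimmed.isEmpty()) {
            return biterms; // empty line: no words, no pairs
        }
        String[] words = trimmed.split("\\s+");
        // Pair each word with every later word in the same document.
        for (int i = 0; i < words.length; i++) {
            for (int j = i + 1; j < words.length; j++) {
                biterms.add(new String[] {words[i], words[j]});
            }
        }
        return biterms;
    }

    public static void main(String[] args) {
        System.out.println(extractBiterms("day").size());                    // prints 0
        System.out.println(extractBiterms("day day").size());                // prints 1
        System.out.println(extractBiterms("topic model short text").size()); // prints 6
    }
}
```

Under these assumptions, a document such as "day day" still produces one pair, so it is not affected in the same way as a one-word document.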
Thanks for your reply. Another problem: sometimes a space is recognized as a word, so it appears in the wordmap with a label. (I'm sure the pre-processing of the docs is correct.) From what I've observed, it may be caused by documents that contain only the same word twice, such as "day day" or "danger danger". Can that be explained by the same reason? Thanks again for your kind help!
Maybe there are some empty lines in the dataset, or it is a special character that just looks like a space.
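One way to check both possibilities is to scan the input before training. The sketch below is only an illustration: `CorpusCheck`, `findSuspiciousLines`, and `isSpaceLike` are hypothetical names, not part of BTM-Java, and it assumes the corpus has already been read into a list with one document per line. It reports blank lines and any space-like character other than a plain ASCII space (for example a tab, a no-break space, or a zero-width space).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of BTM-Java.
final class CorpusCheck {

    /** Reports blank lines and space-like characters other than the ASCII space. */
    static List<String> findSuspiciousLines(List<String> lines) {
        List<String> report = new ArrayList<>();
        for (int n = 0; n < lines.size(); n++) {
            String line = lines.get(n);
            if (line.trim().isEmpty()) {
                report.add("line " + (n + 1) + ": blank");
                continue;
            }
            // Walk by code point so characters outside the BMP are handled.
            for (int i = 0; i < line.length(); ) {
                int cp = line.codePointAt(i);
                if (cp != ' ' && isSpaceLike(cp)) {
                    report.add(String.format("line %d: U+%04X at column %d", n + 1, cp, i + 1));
                }
                i += Character.charCount(cp);
            }
        }
        return report;
    }

    /** True for whitespace, Unicode space separators, zero-width space, and BOM. */
    static boolean isSpaceLike(int cp) {
        return Character.isWhitespace(cp)
                || Character.isSpaceChar(cp)
                || cp == 0x200B
                || cp == 0xFEFF;
    }
}
```

For example, given the three lines "day day", an empty line, and "danger" followed by a no-break space and "danger", the report would contain "line 2: blank" and "line 3: U+00A0 at column 7".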