JGibbLabeledLDA

the input file

Open tobebetterinamonth opened this issue 7 years ago • 2 comments

Following the README, I copied the example input file, but the result is:

Unknown document label ( label_2,1 ) for document 1.
Unknown document label ( label_2,2 ) for document 1.
Unknown document label ( label_2,l_2 ) for document 1.

When I put only one line in the input, the model cannot recognize the "[": it treats the "[" and the adjacent characters as a single word, which means it reads the input file as unlabeled. But in LDACmdOption, -unlabeled is false. Could you please give us an example input file, so I can see what is wrong with ours?

tobebetterinamonth, Feb 18 '18 13:02

I think you may be using the wrong label format. It should just be space-separated ints, surrounded by square brackets. For example, this document has two words and labels 1 and 3:

[1 3] hello world
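A quick way to check an input file against this format before training is sketched below. This is not JGibbLabeledLDA's own parser; the function names `parse_labeled_line` and `find_bad_lines` are hypothetical, and the regular expression only encodes the format as described in this reply (space-separated integers in square brackets, followed by the document's words).

```python
import re

# Assumed format, taken from the reply above: "[<int> <int> ...] word word ..."
LABELED_LINE = re.compile(r"^\[(\d+(?:\s+\d+)*)\]\s+(.+)$")


def parse_labeled_line(line):
    """Return (labels, words) for a well-formed line, or None otherwise."""
    match = LABELED_LINE.match(line.strip())
    if match is None:
        return None
    labels = [int(token) for token in match.group(1).split()]
    words = match.group(2).split()
    return labels, words


def find_bad_lines(lines):
    """Return the 1-based numbers of lines that do not match the assumed format."""
    return [
        number
        for number, line in enumerate(lines, start=1)
        if parse_labeled_line(line) is None
    ]
```

For instance, `find_bad_lines(open("train.dat", encoding="utf-8"))` (with `train.dat` as a placeholder file name) would list the lines whose labels are not plain integers, such as a line starting with `[label_2,1]`.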

myleott, Feb 18 '18 17:02

Hi! Although I changed the format as you said, I still have the same problem. If I want to use this model to train on labeled data (topic labels), should the training data format be the same as the test data format? Every document in my data has one topic, and the format is as you said, like "[1] hello world". But the label tokens appear in the topic-word distribution, as in the output below, and different topics have the same topic-word distribution. What is wrong with my process?

topic0 : [0] 3.661662394727206E-5 女性 4.027828634199927E-4 身体 3.661662394727206E-5 结构 0.02164042475283779 雌激素 3.661662394727206E-5 水平 3.661662394727206E-5 高 3.661662394727206E-5 容易 3.661662394727206E-5 甲亢 3.661662394727206E-5 [1] 3.661662394727206E-5

topic1 : [0] 3.699593044765076E-5 女性 0.09585645578986313 身体 3.699593044765076E-5 结构 3.699593044765076E-5 雌激素 3.699593044765076E-5 水平 0.008176100628930818 高 3.699593044765076E-5 容易 3.699593044765076E-5 甲亢 3.699593044765076E-5 [1] 0.0011468738438771735

zy158, Mar 15 '18 04:03