HyperIM
What is the difference between vocab_size and word_num?
x_train.txt should have shape (instance_num, word_num).
I thought 'word_num' was the number of distinct words that appear in RCV1, but there is another parameter, 'vocab_size'. Every document also contains a different set of words, so how do I build x_train so that every row has the same 'word_num'?
I am not sure whether x_train.txt should be built by counting word frequencies in each instance. For example, if the 20,000th word in the vocabulary appears 10 times in a document, then 10 would go in the corresponding position of that row. Is that correct?
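
To make the question concrete, here is a minimal sketch of the count-based construction I am describing. The helper `count_matrix`, the toy vocabulary, and the toy documents are made up for illustration and are not taken from HyperIM. Note that this produces rows of width vocab_size rather than word_num, which is part of why I am unsure it is the intended format.

```python
def count_matrix(docs, vocab):
    """Return one row per document; each row holds per-word counts.

    Hypothetical helper: the result has shape (instance_num, vocab_size).
    """
    index = {word: i for i, word in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for word in doc.split():
            col = index.get(word)
            if col is not None:  # ignore words outside the vocabulary
                row[col] += 1
        rows.append(row)
    return rows


# Toy data, assumed only for this example.
vocab = ["market", "oil", "price", "trade"]
docs = ["oil price oil", "trade market"]

x_train = count_matrix(docs, vocab)
for row in x_train:
    print(" ".join(str(v) for v in row))
```

With this approach every row has the same length because it is indexed by the vocabulary, not by the words in the document.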