HyperIM

What is the difference between vocab_size and word_num?

Open Clarioooo opened this issue 8 months ago • 0 comments

x_train.txt should have shape (instance_num, word_num).

I thought 'word_num' was the number of distinct words that appear in RCV1, but there is another parameter, 'vocab_size'. Every document contains a different set of words, so how do I build x_train so that every row has the same 'word_num'?

I am not sure whether x_train.txt should be obtained by counting how often each word occurs in each instance. For example, if the 20,000th word in the vocabulary appears 10 times in an instance, then 10 would be filled in at the corresponding position.
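
To make the question concrete, here is a minimal sketch of the count-based layout I have in mind. The toy vocabulary, the documents, and the helper name `count_vector` are all made up for illustration, and I do not know whether this is the format HyperIM actually expects.

```python
from collections import Counter

def count_vector(tokens, vocab):
    """Return one row: the count of each vocabulary word in a single document."""
    index = {word: i for i, word in enumerate(vocab)}
    row = [0] * len(vocab)
    for word, n in Counter(tokens).items():
        if word in index:          # words outside the vocabulary are ignored
            row[index[word]] = n
    return row

# Hypothetical toy data, not taken from RCV1.
vocab = ["market", "oil", "price", "bank"]
docs = [
    ["oil", "price", "oil"],
    ["bank", "market", "bank", "bank", "rates"],
]

x_train = [count_vector(doc, vocab) for doc in docs]
for row in x_train:
    print(" ".join(str(v) for v in row))
```

With this layout every row has length len(vocab), so 'word_num' and 'vocab_size' would be the same number. That is why I doubt this interpretation is the intended one.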

Clarioooo, Jun 23 '24 15:06