hdp
Final results aren't saved in `--directory` when `--sample_hyper` is "yes" with some `--random_seed` values and datasets
I found that with some datasets, the final results of the training phase aren't stored under `--directory` if I use a `--random_seed` of `13712` while also hyper-sampling the concentration parameters. Only the file `state.log` is produced; none of the other final output files appear.
To reproduce the problem:
- Download this training corpus from the PAN @ CLEF 2017 competition.
- Run the regular hdp (not the fast variant) on the LDA-C corpus of the fifth training problem set, for example:
  ```
  hdp.exe --data ..\pan17_train\problem005\ldac_corpus.dat --algorithm train --directory ..\output --sample_hyper yes --save_lag -1 --random_seed 13712
  ```
(I used gensim to generate the LDA-C corpora.)
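For anyone reproducing this without gensim, the LDA-C input format is simple: one line per document, starting with the number of unique terms, followed by `term_id:count` pairs. The sketch below is a minimal pure-Python stand-in, not gensim's code; the helper names `build_vocab`, `to_ldac_line` and `write_ldac` are hypothetical, and it assumes already tokenized documents with 0-based term ids assigned in first-seen order.

```python
from collections import Counter
from typing import Dict, Iterable, List


def build_vocab(docs: Iterable[List[str]]) -> Dict[str, int]:
    """Hypothetical helper: assign a 0-based id to each distinct token, in first-seen order."""
    vocab: Dict[str, int] = {}
    for doc in docs:
        for token in doc:
            vocab.setdefault(token, len(vocab))
    return vocab


def to_ldac_line(doc: List[str], vocab: Dict[str, int]) -> str:
    """Encode one tokenized document as '<unique terms> <id>:<count> ...'."""
    counts = Counter(vocab[token] for token in doc)
    pairs = " ".join(f"{term_id}:{count}" for term_id, count in sorted(counts.items()))
    # rstrip() avoids a trailing space for empty documents ("0").
    return f"{len(counts)} {pairs}".rstrip()


def write_ldac(docs: List[List[str]], path: str) -> Dict[str, int]:
    """Write all documents to `path` in LDA-C format and return the vocabulary."""
    vocab = build_vocab(docs)
    with open(path, "w", encoding="utf-8") as fh:
        for doc in docs:
            fh.write(to_ldac_line(doc, vocab) + "\n")
    return vocab
```

For example, the documents `[["a", "b", "a"], ["b", "c"]]` become the lines `2 0:2 1:1` and `2 1:1 2:1`.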
The program runs to completion and no error is raised. However, the output directory contains only the `state.log` file and the interim outputs, whereas we would also expect `mode.bin`, `mode-topics.dat` and `mode-word-assignments.dat`.
As far as I can tell, the combination of `--sample_hyper yes` and `--random_seed 13712` causes this fault on certain datasets.
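To narrow down which seeds trigger the fault on a given dataset, one could sweep several `--random_seed` values, give each run its own output directory, and check for the three expected final files. The sketch below reuses only the flags from the reproduction command; the helper names, the `seed_<n>` subdirectory layout, and the choice to pre-create each output directory are assumptions for illustration, not part of hdp itself. The `run` parameter defaults to `subprocess.run` and can be swapped out for testing.

```python
import subprocess
from pathlib import Path
from typing import Callable, Dict, List, Sequence

# Final outputs the issue expects alongside state.log after training.
EXPECTED_OUTPUTS = ("mode.bin", "mode-topics.dat", "mode-word-assignments.dat")


def build_command(exe: str, data: str, directory: str, seed: int) -> List[str]:
    """Mirror the reproduction command, varying only the output directory and seed."""
    return [
        exe,
        "--data", data,
        "--algorithm", "train",
        "--directory", directory,
        "--sample_hyper", "yes",
        "--save_lag", "-1",
        "--random_seed", str(seed),
    ]


def missing_outputs(directory: str) -> List[str]:
    """Return the expected final files that are absent from `directory`."""
    out = Path(directory)
    return [name for name in EXPECTED_OUTPUTS if not (out / name).is_file()]


def sweep_seeds(
    exe: str,
    data: str,
    base_dir: str,
    seeds: Sequence[int],
    run: Callable[[List[str]], object] = lambda cmd: subprocess.run(cmd, check=False),
) -> Dict[int, List[str]]:
    """Run one training job per seed (hypothetical layout: base_dir/seed_<n>)
    and map each seed whose run left final files missing to those names."""
    failures: Dict[int, List[str]] = {}
    for seed in seeds:
        out_dir = Path(base_dir) / f"seed_{seed}"
        out_dir.mkdir(parents=True, exist_ok=True)
        run(build_command(exe, data, str(out_dir), seed))
        missing = missing_outputs(str(out_dir))
        if missing:
            failures[seed] = missing
    return failures
```

A call such as `sweep_seeds("hdp.exe", r"..\pan17_train\problem005\ldac_corpus.dat", r"..\output", [1, 42, 13712])` would report only the seeds whose runs did not produce all three final files.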