ETM
Confusion about the data loader function
Hi, thanks for your wonderful work. However, I'm confused about the data loader function. Details below:
parser.add_argument('--data_path', type=str, default='data/20ng', help='directory containing data')
- I can't find any code that actually uses the '--data_path' parameter, so why do we need to pass it in the following command?
python main.py --mode train --dataset 20ng --data_path data/20ng --num_topics 50 --train_embeddings 1 --epochs 1000
- What do the two parameters doc_terms_file_name and terms_filename do? I don't understand them, and I can't find 'tf_idf_doc_terms_matrix_time_window_1' anywhere (for example, in the provided dataset directory).
vocab, training_set, valid, test_1, test_2 = data.get_data(doc_terms_file_name="tf_idf_doc_terms_matrix_time_window_1",
terms_filename="tf_idf_terms_time_window_1")
same question...
Me too, I also encountered this problem...
For the second question, you can find it in the file data_espy_tweets.py:
savemat(path_save.joinpath('tf_idf_doc_terms_matrix_time_window_1'), {"doc_terms_matrix": doc_terms_matrix})
savemat(path_save.joinpath('tf_idf_terms_time_window_1'), {"terms": terms})
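Those savemat calls mean the two files are ordinary SciPy .mat files written by the preprocessing script. As a minimal sketch (assuming data.get_data reads them back with scipy.io.loadmat, which is an assumption about the repo's code, not a quote from it, and with a hypothetical output directory), they could be inspected like this:

from pathlib import Path
from scipy.io import loadmat

path_save = Path("data/espy_tweets")  # hypothetical output directory of data_espy_tweets.py

# savemat appends the .mat extension, so the files on disk should be named as below
doc_terms = loadmat(str(path_save / "tf_idf_doc_terms_matrix_time_window_1.mat"))["doc_terms_matrix"]
terms = loadmat(str(path_save / "tf_idf_terms_time_window_1.mat"))["terms"]

print(doc_terms.shape)  # expected: (num_documents, vocab_size) TF-IDF matrix
print(terms)            # expected: the vocabulary terms for those columns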
I have the same problem.
@zhaoLLL thanks for your reply, but how do bow_X_tokens.mat and bow_X_counts.mat map to these two TF-IDF matrices?
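Not an official answer, but if bow_X_tokens.mat stores each document's vocabulary indices and bow_X_counts.mat the matching counts (an assumption based on the filenames; the actual key names and layout may differ), one possible way to map them to a TF-IDF doc-terms matrix is to rebuild the sparse count matrix and apply a TF-IDF transform:

from scipy.io import loadmat
from scipy.sparse import csr_matrix
from sklearn.feature_extraction.text import TfidfTransformer

# Assumed keys ("tokens", "counts") and per-document array layout; verify against the actual files.
tokens = loadmat("bow_X_tokens.mat")["tokens"].squeeze()
counts = loadmat("bow_X_counts.mat")["counts"].squeeze()

vocab_size = max(int(doc.max()) for doc in tokens if doc.size) + 1
rows, cols, vals = [], [], []
for doc_id, (doc_tokens, doc_counts) in enumerate(zip(tokens, counts)):
    rows.extend([doc_id] * doc_tokens.size)
    cols.extend(doc_tokens.ravel().tolist())
    vals.extend(doc_counts.ravel().tolist())

count_matrix = csr_matrix((vals, (rows, cols)), shape=(len(tokens), vocab_size))
doc_terms_matrix = TfidfTransformer().fit_transform(count_matrix)  # sparse (num_docs, vocab_size)

Whether this matches what data_espy_tweets.py actually does is something only the original preprocessing code can confirm.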
Since this repo doesn't seem to be maintained anymore, I suggest you use another repo I just discovered: https://github.com/lffloyd/embedded-topic-model. I was able to use ETM very easily with it.
same question!