ETM: Confused about the data loader function

Open · A11en0 opened this issue 3 years ago · 6 comments

Hi, thanks for your wonderful work. However, I'm confused about the data loader function. Details below:

parser.add_argument('--data_path', type=str, default='data/20ng', help='directory containing data')

I can't find any code that reads the '--data_path' argument, so why do we need to pass it in the following command?

python main.py --mode train --dataset 20ng --data_path data/20ng --num_topics 50 --train_embeddings 1 --epochs 1000
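For context on the first question: argparse parses and stores every declared option whether or not any code reads it, so passing --data_path is harmless, but it only has an effect if the loader actually uses args.data_path. Below is a minimal sketch of how such an option is usually wired into file loading; build_parser and resolve_data_file are hypothetical helpers written for illustration, not functions from the ETM repo.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring two of the options used in the command above.
    parser = argparse.ArgumentParser(description="ETM data loading (illustrative sketch)")
    parser.add_argument('--dataset', type=str, default='20ng', help='name of corpus')
    parser.add_argument('--data_path', type=str, default='data/20ng',
                        help='directory containing data')
    return parser


def resolve_data_file(data_path: str, file_name: str) -> str:
    # Hypothetical helper: join the --data_path directory with a data file name
    # so the loader reads from wherever the user pointed it.
    return os.path.join(data_path, file_name)


if __name__ == "__main__":
    args = build_parser().parse_args(["--dataset", "20ng", "--data_path", "data/20ng"])
    # args.data_path is stored either way; it only matters if some code reads it.
    print(resolve_data_file(args.data_path, "tf_idf_terms_time_window_1.mat"))
```

If nothing in the training script calls something like resolve_data_file(args.data_path, ...), the value is parsed and then ignored, which would match what the question describes.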

What do the two parameters doc_terms_file_name and terms_filename do? I don't understand them, and I can't even find 'tf_idf_doc_terms_matrix_time_window_1' anywhere (for example, in the provided dataset directory).

vocab, training_set, valid, test_1, test_2 = data.get_data(doc_terms_file_name="tf_idf_doc_terms_matrix_time_window_1",
                                                           terms_filename="tf_idf_terms_time_window_1")

Dec 28 '21 06:12 A11en0

same question...

Feb 28 '22 07:02 liuh236

Me too, I also ran into this problem...

Mar 24 '22 11:03 lxkkk117

For the second question, you can find these files in data_espy_tweets.py, which writes them with:

savemat(path_save.joinpath('tf_idf_doc_terms_matrix_time_window_1'), {"doc_terms_matrix": doc_terms_matrix})
savemat(path_save.joinpath('tf_idf_terms_time_window_1'), {"terms": terms})

Mar 26 '22 07:03 zhaoLLL

I have the same problem.

@zhaoLLL Thanks for your reply, but how do bow_X_tokens.mat and bow_X_counts.mat map to these two TF-IDF files?
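One way to relate the two formats, assuming (as the original ETM preprocessing scripts appear to do) that bow_X_tokens.mat stores, for each document, the vocabulary indices of the words it contains and bow_X_counts.mat stores the matching counts: both are a row-by-row view of the non-zero entries of a documents-by-vocabulary matrix. The sketch below converts in both directions; split_bow and join_bow are hypothetical names used for illustration. If the source matrix is TF-IDF rather than raw counts, the "counts" become real-valued weights, which a model expecting integer word counts may not handle as intended.

```python
import numpy as np
from scipy.sparse import csr_matrix


def split_bow(doc_terms_matrix):
    """Split a documents x vocabulary matrix into per-document (tokens, counts).

    tokens[d] lists the vocabulary (column) indices with a non-zero entry in
    row d; counts[d] lists the matching values.
    """
    m = csr_matrix(doc_terms_matrix, copy=True)  # copy so the caller's matrix is untouched
    m.eliminate_zeros()   # drop explicitly stored zeros
    m.sort_indices()      # ascending vocabulary order within each row
    tokens, counts = [], []
    for d in range(m.shape[0]):
        start, end = m.indptr[d], m.indptr[d + 1]
        tokens.append(m.indices[start:end].tolist())
        counts.append(m.data[start:end].tolist())
    return tokens, counts


def join_bow(tokens, counts, vocab_size):
    """Rebuild the sparse documents x vocabulary matrix from (tokens, counts)."""
    rows = [d for d, toks in enumerate(tokens) for _ in toks]
    cols = [w for toks in tokens for w in toks]
    vals = [c for cs in counts for c in cs]
    return csr_matrix((vals, (rows, cols)), shape=(len(tokens), vocab_size))


if __name__ == "__main__":
    tfidf = np.array([[0.0, 0.5, 0.2], [0.7, 0.0, 0.0]])
    tokens, counts = split_bow(tfidf)
    print(tokens)  # [[1, 2], [0]]
    print(counts)  # [[0.5, 0.2], [0.7]]
    print(join_bow(tokens, counts, vocab_size=3).toarray())
```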

May 03 '22 08:05 manueltonneau

Since this repo no longer seems to be maintained, I suggest using another repo I just discovered: https://github.com/lffloyd/embedded-topic-model. I was able to use ETM very easily with it.

May 03 '22 13:05 manueltonneau

Same question!

Mar 01 '23 07:03 Littleele