NetTaxo
Code for the paper "NetTaxo: Automated Topic Taxonomy Construction from Text-Rich Network"
NetTaxo: Automated Topic Taxonomy Construction from Text-Rich Network
Run the Experiment
Requirements
python>=3.7
spherecluster
scikit-learn<=0.22
joblib
numba
pydot
python-igraph
scipy
tqdm
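Before running, it may help to confirm that the packages above are importable. The following is a minimal sketch, not part of the repository: the import names (for example sklearn for scikit-learn and igraph for python-igraph) are assumptions based on how those packages are usually imported, and the helper missing_modules is hypothetical. It does not check the scikit-learn<=0.22 version constraint.

```python
import importlib.util
import sys

# Assumed import names for the packages listed above.
REQUIRED_MODULES = [
    "spherecluster", "sklearn", "joblib", "numba",
    "pydot", "igraph", "scipy", "tqdm",
]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 7):
        print("Python 3.7 or newer is required")
    for name in missing_modules(REQUIRED_MODULES):
        print(f"missing: {name}")
```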
Run
make
python src/build_taxonomy.py --data_dir data/dblp-5area
Output will be saved to the directory given by --output_dir. A taxonomy visualization, a gzipped taxonomy dump file, and the taxonomy nodes will be saved. Each folder represents a taxonomy node, with the term score distribution and the document score distribution saved in two files.
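Since each folder in the output represents a taxonomy node, the result can be inspected by walking the output directory. The sketch below is an illustration only: the function name list_taxonomy_nodes is hypothetical, and it assumes that child nodes are nested as subfolders of their parent. It makes no assumption about the names of the score files and simply lists whatever files each folder contains.

```python
import os


def list_taxonomy_nodes(output_dir):
    """Map each node folder (path relative to output_dir) to its file names.

    Assumption: every folder below output_dir is one taxonomy node.
    """
    nodes = {}
    for root, _dirs, files in os.walk(output_dir):
        rel = os.path.relpath(root, output_dir)
        if rel == ".":
            continue  # skip the output directory itself
        nodes[rel] = sorted(files)
    return nodes
```

Calling list_taxonomy_nodes on the path passed as --output_dir returns a dictionary that can be printed or sorted by path to get a quick overview of the taxonomy.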
Data
Download and unzip the data into /data.
Please refer to data/dblp-5area for data formats.
For use on custom datasets, format the data according to the example dataset.
Motif matching requires additional coding, as motif patterns might be different from dataset to dataset.
Refer to src/motif_embed.py for motif matcher examples.
Write custom motif matchers, then include them in the main file src/build_taxonomy.py.
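To illustrate what a dataset-specific motif matcher might involve, here is a minimal sketch. It does not reproduce the interface used in src/motif_embed.py: the function match_motif, the doc_links structure, and the node type names "author" and "venue" are all hypothetical. It treats a motif pattern as a tuple of node types and, for each document, enumerates every combination of linked nodes that fills the pattern.

```python
from itertools import product


def match_motif(doc_links, pattern):
    """Enumerate motif instances per document.

    doc_links: {doc_id: {node_type: [node_id, ...]}}  (hypothetical format)
    pattern:   tuple of node types, e.g. ("author", "venue")
    Returns {doc_id: [tuple of node ids, one per type in pattern]}.
    """
    instances = {}
    for doc_id, links in doc_links.items():
        # One candidate list per node type in the pattern; a type with no
        # linked nodes yields no instances for this document.
        candidates = [links.get(node_type, []) for node_type in pattern]
        instances[doc_id] = list(product(*candidates))
    return instances


docs = {
    "p1": {"author": ["a1", "a2"], "venue": ["v1"]},
    "p2": {"author": ["a3"]},
}
author_venue = match_motif(docs, ("author", "venue"))
```

A matcher for a different dataset would change the node types and the pattern, which is why the patterns need to be written per dataset.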