HyperIM
PyTorch implementation of the paper "Hyperbolic Interaction Model For Hierarchical Multi-Label Classification"
Requirements
- torch>=1.0.0
- geoopt ($ pip install git+https://github.com/geoopt/geoopt.git)
- numpy
- scipy
- pandas
- tqdm
Instructions
Run HyperIM via
$ python HyperIM.py
or run EuclideanIM via
$ python EuclideanIM.py
Alternatively, use the two Jupyter notebooks.
Data
X_train and X_test should be dense NumPy arrays with shape (instance_num, word_num); y_train and y_test should be sparse SciPy binary indicator (multi-hot) arrays with shape (instance_num, label_num). Sample data is provided in ./data/sample/.
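A minimal sketch of toy inputs in the shapes described above. It assumes that each row of X holds integer word indices (padded or truncated to word_num tokens) and that the label matrix is stored in SciPy's CSR format; the sizes and the label_lists variable are hypothetical, and the on-disk format of the files in ./data/sample/ is not reproduced here.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical toy sizes; real values depend on the dataset and preprocessing.
instance_num, word_num, label_num, vocab_size = 4, 6, 5, 100
rng = np.random.default_rng(0)

# X: dense integer array of shape (instance_num, word_num).
# Assumption: each entry is a word index into the pre-trained embedding matrix.
X_train = rng.integers(0, vocab_size, size=(instance_num, word_num))

# y: sparse binary indicator matrix of shape (instance_num, label_num).
# A multi-label instance simply has several 1s in its row.
label_lists = [[0, 2], [1], [3, 4], [0]]  # hypothetical label ids per instance
rows = [i for i, labels in enumerate(label_lists) for _ in labels]
cols = [j for labels in label_lists for j in labels]
y_train = sp.csr_matrix(
    (np.ones(len(rows), dtype=np.float32), (rows, cols)),
    shape=(instance_num, label_num),
)

print(X_train.shape, y_train.shape, y_train.nnz)
```

The same construction applies to X_test and y_test; only the label column order must stay identical across the train and test label matrices.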
The multi-label text classification datasets equipped with hierarchically structured labels (RCV1, Zhihu and WikiLSHTC) are publicly available.
Pre-trained embeddings
Hyperbolic word embeddings can be trained following Poincaré GloVe. Pre-trained word embeddings should have shape (vocab_size, embed_dim).
The label hierarchy can be embedded using the gensim implementation of Poincaré embeddings, as described in "Train and use Poincaré embeddings". Label embeddings should have shape (label_num, embed_dim). Note that the row index of each label in the label embeddings must match its column index in y_train and y_test.
Use them accordingly in HyperIM.py and EuclideanIM.py.
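Because the row order of the label embeddings must match the column order of y_train and y_test, it can help to build the matrix explicitly from a label-to-index mapping. The sketch below assumes the trained vectors are available as a label → vector mapping (for a trained gensim PoincareModel, for example, via model.kv[label]) and that you maintain your own label_to_index dictionary; build_label_embedding_matrix and its arguments are hypothetical helpers, not part of HyperIM.py or EuclideanIM.py.

```python
import numpy as np


def build_label_embedding_matrix(label_vectors, label_to_index, embed_dim, check_in_ball=True):
    """Stack per-label vectors into a (label_num, embed_dim) array.

    Row i holds the vector of the label whose column index in y_train / y_test is i.

    label_vectors:  mapping label -> vector (hypothetical input).
    label_to_index: mapping label -> column index in y_train / y_test.
    check_in_ball:  if True, require every vector to lie strictly inside the unit
                    (Poincare) ball; disable this for Euclidean label embeddings.
    """
    label_num = len(label_to_index)
    emb = np.zeros((label_num, embed_dim), dtype=np.float32)
    filled = np.zeros(label_num, dtype=bool)
    for label, idx in label_to_index.items():
        # Every index must be in range and used once, so all rows get filled.
        if not 0 <= idx < label_num or filled[idx]:
            raise ValueError(f"label {label!r}: index {idx} is out of range or duplicated")
        vec = np.asarray(label_vectors[label], dtype=np.float32)
        if vec.shape != (embed_dim,):
            raise ValueError(f"label {label!r}: expected shape ({embed_dim},), got {vec.shape}")
        emb[idx] = vec
        filled[idx] = True
    # Points of the Poincare ball have Euclidean norm strictly below 1.
    if check_in_ball and np.any(np.linalg.norm(emb, axis=1) >= 1.0):
        raise ValueError("some label vectors do not lie strictly inside the unit ball")
    return emb


# Hypothetical three-label hierarchy: root -> sports -> football.
label_vectors = {"root": [0.0, 0.0], "sports": [0.5, 0.1], "football": [0.8, 0.2]}
label_to_index = {"sports": 0, "football": 1, "root": 2}  # column order of y_train
label_emb = build_label_embedding_matrix(label_vectors, label_to_index, embed_dim=2)
print(label_emb.shape)
```

Failing loudly on a missing, duplicated, or out-of-range index is deliberate: a silently misaligned label matrix would still have the right shape but pair each label column with the wrong embedding.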
Citation
If you find this code useful for your research, please cite the following paper in your publication:
@article{chen2019hyperbolic,
title={Hyperbolic Interaction Model For Hierarchical Multi-Label Classification},
author={Chen, Boli and Huang, Xin and Xiao, Lin and Cai, Zixin and Jing, Liping },
journal={arXiv preprint arXiv:1905.10802},
year={2019}
}