interpretable-han-for-document-classification-with-keras
Keras implementation of hierarchical attention network for document classification with options to predict and present attention weights on both word and sentence level.
Interpretable-han-for-document-classification-with-keras
This repository uses Keras to implement the hierarchical attention network presented in Hierarchical Attention Networks for Document Classification.
How to use the package
- Clone the repository.
- In the root of the repo, run `python setup.py install` to install all required packages.
- Import and initialize the class:
from han.model import HAN
han = HAN(embedding_matrix)
You can also override the default parameter values during initialization, for instance:
han = HAN(embedding_matrix, max_sent_length=150, max_sent_num=15)
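The README does not spell out how `embedding_matrix` or the model inputs should be built, so the following is only a minimal sketch under stated assumptions: row `i` of the matrix holds the pretrained vector for word id `i`, id 0 is reserved for padding, and each review is encoded as a `(max_sent_num, max_sent_length)` grid of word ids. The helpers `build_embedding_matrix` and `encode_review` are hypothetical and are not part of the package.

```python
import numpy as np

def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector of the word with id i; row 0 is kept for padding."""
    matrix = np.zeros((len(word_index) + 1, dim), dtype="float32")
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None:  # words without a pretrained vector stay all-zero
            matrix[idx] = vec
    return matrix

def encode_review(sentences, word_index, max_sent_num=15, max_sent_length=150):
    """Map a sentence-tokenized review to a fixed-size grid of word ids."""
    grid = np.zeros((max_sent_num, max_sent_length), dtype="int32")
    for i, sentence in enumerate(sentences[:max_sent_num]):
        # Unknown words fall back to id 0 here, which is a simplification.
        ids = [word_index.get(w, 0) for w in sentence.split()][:max_sent_length]
        grid[i, :len(ids)] = ids
    return grid

# Toy vocabulary and vectors, purely for illustration.
word_index = {"this": 1, "is": 2, "good": 3}
vectors = {"this": np.ones(4), "good": np.full(4, 2.0)}

embedding_matrix = build_embedding_matrix(word_index, vectors, dim=4)
X = encode_review(["this is good", "is this good"], word_index,
                  max_sent_num=3, max_sent_length=5)
```

Sentences and words beyond the two limits are cut off, and shorter reviews are padded with zeros, which is why the attention weights later need to be truncated back to the real sentence lengths.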
- Initializing `HAN` also builds the models, so you can print the summary to check the layers:
han.print_summary()
- Train the model simply with:
han.train_model(checkpoint_path, X_train, y_train, X_test, y_test)
You can also tune the parameter values.
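If the data has not been split yet, something like the sketch below could produce the four arrays in the order `train_model` takes them. It is only an illustration: the helper `train_test_split_arrays`, the input shape, the integer label format, and the checkpoint file name are all assumptions, and the package may expect a different label encoding.

```python
import numpy as np

def train_test_split_arrays(X, y, test_fraction=0.2, seed=0):
    """Shuffle once, then slice X and y with the same permutation."""
    order = np.random.default_rng(seed).permutation(len(X))
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]

# 10 dummy reviews, each assumed to be a (max_sent_num, max_sent_length) id grid.
X = np.zeros((10, 15, 150), dtype="int32")
y = np.arange(10) % 2  # dummy binary labels

X_train, y_train, X_test, y_test = train_test_split_arrays(X, y)
checkpoint_path = "han_checkpoint.h5"  # hypothetical file name
```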
- Show the attention weights for word level:
han.show_word_attention(X)
X is the embedded matrix vector for one review.
- Show the attention weights for sentence level:
han.show_sent_attention(X)
X is the embedded matrix vector for reviews (could be multiple reviews).
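For intuition about what these weights are, the attention in the referenced paper scores each hidden state against a learned context vector and normalizes the scores with a softmax. The NumPy sketch below follows that formulation with random stand-in parameters; it is not the repository's Keras layer, and the names `attention_weights`, `W`, `b`, and `u` are chosen here for illustration.

```python
import numpy as np

def attention_weights(h, W, b, u):
    """h: (steps, hidden) states; W: (hidden, att); b: (att,); u: (att,) context vector."""
    scores = np.tanh(h @ W + b) @ u   # one score per step
    scores = scores - scores.max()    # shift for a numerically stable softmax
    weights = np.exp(scores)
    return weights / weights.sum()

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))   # e.g. 4 words with 8-dimensional hidden states
W = rng.normal(size=(8, 6))
b = np.zeros(6)
u = rng.normal(size=6)

alpha = attention_weights(h, W, b, u)  # one weight per word, summing to 1
sentence_vector = alpha @ h            # attention-weighted sum of the states
```

The same computation applies one level up, with sentence vectors in place of word states, which is what yields the sentence-level weights.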
- Truncate the attention weights based on sentence length and number, and transform them into a dataframe to make the result easier to read:
For word attention, run:
han.word_att_to_df(sent_tokenized_review, word_att)
The result will look like:
| word_att | review |
|---|---|
| {'i': 0.3, 'am': 0.1, 'wrong': 0.6} | i am wrong |
| {'this': 0.1, 'is': 0.1, 'ridiculously': 0.4, 'good': 0.4} | this is ridiculously good |
For sentence attention, run:
han.sent_att_to_df(sent_tokenized_reviews, sent_att)
The result will look like:
| sent_att | reviews |
|---|---|
| {'this is good': 0.8, 'i am watching': 0.2} | this is good. i am watching. |
| {'i like it': 0.6, 'it is about history': 0.4} | i like it. it is about history. |
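The truncation step can be pictured with the plain-Python sketch below. It assumes the raw word attention comes back padded to a fixed length, so only the first `len(words)` weights of each sentence are meaningful. The function `word_attention_rows` is hypothetical and only mimics the shape of the table above; the package's own `word_att_to_df` may differ in detail.

```python
def word_attention_rows(sentences, word_att):
    """Pair each sentence's words with their weights, dropping padded positions."""
    rows = []
    for sentence, weights in zip(sentences, word_att):
        words = sentence.split()
        kept = weights[:len(words)]  # positions past the real words are padding
        rows.append({
            # A repeated word keeps only its last weight in this simple dict form.
            "word_att": dict(zip(words, kept)),
            "review": sentence,
        })
    return rows

sentences = ["i am wrong", "this is ridiculously good"]
# Weights padded to length 5; the values are made up for illustration.
word_att = [
    [0.3, 0.1, 0.5, 0.05, 0.05],
    [0.1, 0.1, 0.4, 0.3, 0.1],
]
rows = word_attention_rows(sentences, word_att)
```

If pandas is installed, `pandas.DataFrame(rows)` turns these rows into a table with `word_att` and `review` columns.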