SUMO icon indicating copy to clipboard operation
SUMO copied to clipboard

Code for paper Single Document Summarization as Tree Induction

SUMO

This code is for paper Single Document Summarization as Tree Induction

Python version: This code is in Python3.6

Package Requirements: pytorch tensorboardX pyrouge

Some codes are borrowed from ONMT(https://github.com/OpenNMT/OpenNMT-py)

Data Preparation:

Download the processed data for CNN/Dailymail

download https://drive.google.com/open?id=1BM9wvnyXx9JvgW2um0Fk9bgQRrx03Tol

unzip the zipfile and copy to data/

Model Training

python train.py -mode train -onmt_path ../data/cnndm_data/cnndm -batch_size 50000 -visible_gpu 1 -report
_every 100 -optim adam -lr 1  -save_checkpoint_steps 1000 -train_steps 150000 -model_path ../models/str_l5_i3 -log_file
../logs/str_l5_i3 -local_layers 5 -inter_layers 3 -dropout 0.1 -emb_size 128 -hidden_size 128 -heads 4 -ff_size 512 -dec
ay_method noam -warmup_steps 8000 -structured
  • -mode can be {train, validate, test}, where validate will inspect the model directory and evaluate the model for each newly saved checkpoint, test need to be used with -test_from, indicating the checkpoint you want to use

Model Evaluation

After the training finished, run

python train.py -mode validate -onmt_path ../data/cnndm_data/cnndm -batch_size 50000 -visible_gpu 1 -report
_every 100 -optim adam -lr 1  -save_checkpoint_steps 1000 -train_steps 150000 -model_path ../models/str_l5_i3 -log_file
../logs/str_l5_i3 -local_layers 5 -inter_layers 3 -dropout 0.1 -emb_size 128 -hidden_size 128 -heads 4 -ff_size 512 -dec
ay_method noam -warmup_steps 8000 -structured -test_all