understanding-ai
Generating Wikipedia By Summarizing Long Sequences
https://arxiv.org/abs/1801.10198 published as a conference paper at ICLR 2018
Abstract
- The model uses a decoder-only architecture modified from the Transformer
- Evaluation uses perplexity, ROUGE, and human evaluation
1. Introduction
- End-to-end models need a lot of data
- Uses the article topic (title) and non-Wikipedia reference documents as input to generate the article text (output)
2. Related Work
2.1. Other datasets used in neural abstractive summarization
- Benefit of the ROUGE-1 recall score for comparing datasets: it measures how extractive a task is
- It is the proportion of unigrams (words) in the output that co-occur in the input (see the sketch after this list)
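A minimal sketch of that overlap measure (not the paper's code; whitespace tokenization is a simplification):

```python
def unigram_overlap(output_text: str, input_text: str) -> float:
    """Fraction of output unigrams that also occur in the input
    (a ROUGE-1-recall-style proxy for extractiveness)."""
    out_tokens = output_text.lower().split()
    in_vocab = set(input_text.lower().split())
    if not out_tokens:
        return 0.0
    return sum(t in in_vocab for t in out_tokens) / len(out_tokens)

# e.g. unigram_overlap("the cat sat", "the cat ran home") -> 2/3
```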
2.2. Tasks involving Wikipedia
- This paper generates the article using only the documents referenced by the Wikipedia article, not the Wikipedia text itself
3. English Wikipedia as a multi-document summarization dataset
- D: a Wikipedia document (article)
- C_i: a document cited in D
Data augmentation in this paper:
- Search Google with the article title
- Collect the top 10 result pages, excluding the Wikipedia article itself
- Remove clones (near-duplicate copies of the Wikipedia article, e.g. mirror sites)
- The surviving pages become the search results S_i for D (a sketch follows this list)
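A hypothetical sketch of this collection loop; the search client, its parameters, and the clone test are stand-ins for illustration, not the paper's actual tooling:

```python
def collect_search_results(title, wiki_text, search, is_clone):
    """Gather non-Wikipedia reference pages S_i for a document D."""
    docs = []
    for result in search(title, num_results=10):  # top 10 search results
        if "wikipedia.org" in result.url:         # skip the article itself
            continue
        if is_clone(result.text, wiki_text):      # drop Wikipedia mirrors
            continue
        docs.append(result.text)                  # one more S_i for D
    return docs
```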
4. Methods and models
- Select a subset of the input (extractive stage)
- Train an abstractive model on the selected subset (abstractive stage)
4.1. Extractive stage
tf-idf performed best among the extraction methods tried in the extractive stage (see Table 3); a sketch follows.
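A minimal sketch of tf-idf paragraph ranking, assuming scikit-learn and cosine similarity to the article title as the relevance score (the paper's exact ranking details may differ):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_paragraphs(title: str, paragraphs: list[str]) -> list[str]:
    """Order input paragraphs by tf-idf similarity to the query (title)."""
    matrix = TfidfVectorizer().fit_transform([title] + paragraphs)
    scores = cosine_similarity(matrix[0], matrix[1:]).ravel()
    order = scores.argsort()[::-1]         # most relevant first
    return [paragraphs[i] for i in order]  # later truncated to L tokens
```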
4.2. Abstractive stage
4.2.1. Data representation
- Uses sub-word tokenization (Wu et al., 2016)
- L is the number of tokens to which the extracted input is truncated (varied from 100 to 11000; 500 is a medium setting, 11000 the longest)
4.2.2. Baseline models
T-ED: the typical Transformer encoder-decoder
4.2.3. Transformer Decoder (T-D)
T-D is a better-performing baseline that drops the encoder: the input and output are concatenated into a single sequence (w_1, ..., w_N) and modeled with the standard language-model factorization p(w_1, ..., w_N) = prod_j p(w_j | w_1, ..., w_{j-1}); a data-format sketch follows.
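A sketch of the data format this implies, as I read it: input and output joined into one token sequence for left-to-right language modeling. The separator token is my own assumption for illustration:

```python
SEP = "<sep>"  # assumed delimiter, not necessarily the paper's

def make_lm_sequence(input_tokens: list[str], output_tokens: list[str]) -> list[str]:
    """Single sequence w_1..w_N on which the decoder-only LM is
    trained with the factorization p(w_j | w_1..w_{j-1})."""
    return input_tokens + [SEP] + output_tokens
```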
4.2.4. Transformer decoder with memory-compressed attention (T-DMCA)
T-DMCA is the final model proposed in this paper
Local attention
- Splits the sequence into blocks and attends within each block, adopting a block concept similar to DiSAN/BiBloSAN (sketch below)
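A rough single-head sketch of blocked local attention (assumes PyTorch, a sequence length divisible by the block size, and omits the causal masking a decoder needs):

```python
import torch
import torch.nn.functional as F

def local_attention(q, k, v, block: int = 256):
    """q, k, v: (seq_len, d). Attend only within fixed-size blocks."""
    n, d = q.shape
    qb = q.view(n // block, block, d)
    kb = k.view(n // block, block, d)
    vb = v.view(n // block, block, d)
    scores = qb @ kb.transpose(1, 2) / d ** 0.5  # (blocks, block, block)
    return (F.softmax(scores, dim=-1) @ vb).reshape(n, d)
```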
Memory-compressed attention
- Uses a strided convolution to compress the keys and values in the attention layer
- This allows the model to train on sequences about 3 times longer than T-D can handle (sketch below)
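A rough sketch of the compression step (assumes PyTorch; stride 3 matches the compression factor, masking again omitted for brevity):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CompressedAttention(nn.Module):
    """Keys/values are shortened along the sequence axis by a strided
    1-D convolution before ordinary dot-product attention."""
    def __init__(self, d: int, stride: int = 3):
        super().__init__()
        self.compress = nn.Conv1d(d, d, kernel_size=stride, stride=stride)

    def forward(self, q, k, v):
        # q, k, v: (seq_len, d); k and v become (seq_len // stride, d)
        k = self.compress(k.t().unsqueeze(0)).squeeze(0).t()
        v = self.compress(v.t().unsqueeze(0)).squeeze(0).t()
        scores = q @ k.t() / q.shape[-1] ** 0.5
        return F.softmax(scores, dim=-1) @ v
```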
They use an LMLML architecture (L for a local-attention layer, M for a memory-compressed-attention layer)
5. Experiments
5.1. Evaluation
- ROUGE metrics do not always agree with human judgment (Paulus et al., 2017)
- ROUGE-F1: harmonic mean of ROUGE recall and ROUGE precision (see the sketch after this list)
- (From Section 5.3) Human evaluation uses a DUC-style setup
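For reference, the F1 combination mentioned in the list above is the standard harmonic mean:

```python
def rouge_f1(recall: float, precision: float) -> float:
    """Harmonic mean of ROUGE recall and ROUGE precision."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)
```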