understanding-ai
Generating Wikipedia By Summarizing Long Sequences
https://arxiv.org/abs/1801.10198 published as a conference paper at ICLR 2018
Abstract
- The model uses a decoder-only architecture modified from the Transformer
- Evaluation uses perplexity, ROUGE, and human evaluation
1. Introduction
- End-to-end models need a lot of data
- Uses the article topic (title) and non-Wikipedia reference documents as input to generate the article text (output)
2. Related Work
2.1. Other datasets used in neural abstractive summarization
- Benefit of the ROUGE-1 recall score for comparing datasets: it measures how extractive a task is
- It is the proportion of unigrams (words) in the output that co-occur in the input (see the sketch after this list)
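A minimal sketch of that overlap measure (not the paper's code; whitespace tokenization is a simplification):

```python
def unigram_overlap(output_text: str, input_text: str) -> float:
    """Fraction of output unigrams that also occur in the input
    (a ROUGE-1-recall-style proxy for extractiveness)."""
    out_tokens = output_text.lower().split()
    in_vocab = set(input_text.lower().split())
    if not out_tokens:
        return 0.0
    return sum(t in in_vocab for t in out_tokens) / len(out_tokens)

# e.g. unigram_overlap("the cat sat", "the cat ran home") -> 2/3
```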
2.2. Tasks involving Wikipedia
- This paper generates the article using only the documents referenced by the Wikipedia article, not the Wikipedia text itself
3. English Wikipedia as a multi-document summarization dataset
- D: a Wikipedia document (article)
- C_i: a document cited in D
Data augmentation in this paper:
- Search Google with the article title
- Collect the top 10 result pages, excluding the Wikipedia article itself
- Remove clones (near-duplicate copies of the Wikipedia article, e.g. mirror sites)
- The surviving pages become the search results S_i for D (a sketch follows this list)
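A hypothetical sketch of this collection loop; the search client, its parameters, and the clone test are stand-ins for illustration, not the paper's actual tooling:

```python
def collect_search_results(title, wiki_text, search, is_clone):
    """Gather non-Wikipedia reference pages S_i for a document D."""
    docs = []
    for result in search(title, num_results=10):  # top 10 search results
        if "wikipedia.org" in result.url:         # skip the article itself
            continue
        if is_clone(result.text, wiki_text):      # drop Wikipedia mirrors
            continue
        docs.append(result.text)                  # one more S_i for D
    return docs
```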
4. Methods and models
- Select a subset of the input (extractive stage)
- Train an abstractive model on the selected subset (abstractive stage)
4.1. Extractive stage
tf-idf performed best among the extraction methods tried in the extractive stage (see Table 3); a sketch follows.
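A minimal sketch of tf-idf paragraph ranking, assuming scikit-learn and cosine similarity to the article title as the relevance score (the paper's exact ranking details may differ):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_paragraphs(title: str, paragraphs: list[str]) -> list[str]:
    """Order input paragraphs by tf-idf similarity to the query (title)."""
    matrix = TfidfVectorizer().fit_transform([title] + paragraphs)
    scores = cosine_similarity(matrix[0], matrix[1:]).ravel()
    order = scores.argsort()[::-1]         # most relevant first
    return [paragraphs[i] for i in order]  # later truncated to L tokens
```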
4.2. Abstractive stage
4.2.1. Data representation
- Uses sub-word tokenization (Wu et al., 2016)
- L is the number of tokens to which the extracted input is truncated (varied from 100 to 11000; 500 is a medium setting, 11000 the longest)
4.2.2. Baseline models
T-ED: the typical Transformer encoder-decoder
4.2.3. Transformer Decoder (T-D)
T-D is a better-performing baseline that drops the encoder: the input and output are concatenated into a single sequence (w_1, ..., w_N) and modeled with the standard language-model factorization p(w_1, ..., w_N) = prod_j p(w_j | w_1, ..., w_{j-1}); a data-format sketch follows.
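A sketch of the data format this implies, as I read it: input and output joined into one token sequence for left-to-right language modeling. The separator token is my own assumption for illustration:

```python
SEP = "<sep>"  # assumed delimiter, not necessarily the paper's

def make_lm_sequence(input_tokens: list[str], output_tokens: list[str]) -> list[str]:
    """Single sequence w_1..w_N on which the decoder-only LM is
    trained with the factorization p(w_j | w_1..w_{j-1})."""
    return input_tokens + [SEP] + output_tokens
```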
4.2.4. Transformer decoder with memory-compressed attention (T-DMCA)
T-DMCA is the final model proposed in this paper
Local attention
- Splits the sequence into blocks and attends within each block, adopting a block concept similar to DiSAN/BiBloSAN (sketch below)
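A rough single-head sketch of blocked local attention (assumes PyTorch, a sequence length divisible by the block size, and omits the causal masking a decoder needs):

```python
import torch
import torch.nn.functional as F

def local_attention(q, k, v, block: int = 256):
    """q, k, v: (seq_len, d). Attend only within fixed-size blocks."""
    n, d = q.shape
    qb = q.view(n // block, block, d)
    kb = k.view(n // block, block, d)
    vb = v.view(n // block, block, d)
    scores = qb @ kb.transpose(1, 2) / d ** 0.5  # (blocks, block, block)
    return (F.softmax(scores, dim=-1) @ vb).reshape(n, d)
```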
Memory-compressed attention
- Uses a strided convolution to compress the keys and values in the attention layer
- This allows the model to train on sequences about 3 times longer than T-D can handle (sketch below)
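A rough sketch of the compression step (assumes PyTorch; stride 3 matches the compression factor, masking again omitted for brevity):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CompressedAttention(nn.Module):
    """Keys/values are shortened along the sequence axis by a strided
    1-D convolution before ordinary dot-product attention."""
    def __init__(self, d: int, stride: int = 3):
        super().__init__()
        self.compress = nn.Conv1d(d, d, kernel_size=stride, stride=stride)

    def forward(self, q, k, v):
        # q, k, v: (seq_len, d); k and v become (seq_len // stride, d)
        k = self.compress(k.t().unsqueeze(0)).squeeze(0).t()
        v = self.compress(v.t().unsqueeze(0)).squeeze(0).t()
        scores = q @ k.t() / q.shape[-1] ** 0.5
        return F.softmax(scores, dim=-1) @ v
```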
They use an LMLML architecture (L for a local-attention layer, M for a memory-compressed-attention layer)
5. Experiments
5.1. Evaluation
- ROUGE metrics do not always agree with human judgment (Paulus et al., 2017)
- ROUGE-F1: harmonic mean of ROUGE recall and ROUGE precision (see the sketch after this list)
- (From Section 5.3) Human evaluation uses a DUC-style setup
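For reference, the F1 combination mentioned in the list above is the standard harmonic mean:

```python
def rouge_f1(recall: float, precision: float) -> float:
    """Harmonic mean of ROUGE recall and ROUGE precision."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)
```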