
Generating Wikipedia By Summarizing Long Sequences


https://arxiv.org/abs/1801.10198 published as a conference paper at ICLR 2018

Abstract

  • This model uses a decoder-only architecture modified from the Transformer
  • Evaluation uses perplexity, ROUGE, and human evaluation

1. Introduction

  • End-to-end models need a lot of data
  • Uses the article topic and non-Wikipedia reference documents as input to generate the article text (output)

2. Related Work

2.1. Other datasets used in neural abstractive summarization

  • Benefit of the ROUGE-1 recall score
    • the proportion of unigrams (words) in the output that also occur in the input
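
A minimal sketch of that unigram-recall computation in plain Python; the `rouge1_recall` helper name and the toy strings are illustrative, not from the paper:

```python
from collections import Counter

def rouge1_recall(reference_tokens, candidate_tokens):
    """Fraction of reference unigrams that also appear in the candidate."""
    ref_counts = Counter(reference_tokens)
    cand_counts = Counter(candidate_tokens)
    overlap = sum(min(count, cand_counts[tok]) for tok, count in ref_counts.items())
    return overlap / max(sum(ref_counts.values()), 1)

# To measure how extractive an output is with respect to its input,
# treat the output as the "reference" and the input as the "candidate".
article_input = "the cat sat on the mat near the door".split()
model_output = "the cat sat on a mat".split()
print(rouge1_recall(model_output, article_input))
```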

2.2. Tasks involving Wikipedia

  • This paper generates the article (the Wikipedia document) using only its referenced documents

3. English Wikipedia as a multi-document summarization dataset

  • D: the Wikipedia document (article)
  • C_i: documents cited in D

Data augmentation in this paper:

  1. Search Google with the section title
  2. Collect the top 10 result pages, excluding the Wikipedia page of the document itself
  3. Remove clones (duplicate pages)
  4. The remaining pages form S_i (search results) for D
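
A rough sketch of that augmentation pipeline. `google_search(query, n)` is a hypothetical stand-in for whatever crawler or API is used; the paper does not prescribe a specific client:

```python
from typing import Callable, Dict, List

def collect_search_results(
    section_title: str,
    wiki_url: str,
    google_search: Callable[[str, int], List[Dict[str, str]]],
) -> List[str]:
    """Return S_i: de-duplicated page texts for one document D."""
    results = google_search(section_title, 10)               # steps 1-2: top 10 pages
    results = [r for r in results if r["url"] != wiki_url]   # drop the wiki page itself
    seen, pages = set(), []
    for r in results:                                         # step 3: remove clones
        if r["text"] not in seen:
            seen.add(r["text"])
            pages.append(r["text"])
    return pages                                              # step 4: S_i for D
```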

4. Methods and models

  1. Select a subset of the input (extractive stage)
  2. Train an abstractive model on that subset (abstractive stage)

4.1. Extractive stage

tf-idf performed best among the extractive methods (see Table 3).
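
One common way to implement tf-idf ranking for an extractive stage, sketched here with scikit-learn and cosine similarity; the paper describes tf-idf ranking but not this exact scoring code:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_paragraphs(query, paragraphs, top_k=5):
    """Rank input paragraphs by tf-idf similarity to the query (e.g. the article title)."""
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform([query] + paragraphs)
    scores = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
    order = scores.argsort()[::-1][:top_k]
    return [paragraphs[i] for i in order]
```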

4.2. Abstractive stage

4.2.1. Data representation

  • Uses sub-word tokenization (Wu et al., 2016)
  • L is the truncation length of the input (ranging from 100 to 11000; 500 counts as medium, 11000 as long)
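
A sketch of the input construction under these choices, with `subword_encode` standing in for a trained sub-word tokenizer (not a specific library call):

```python
from typing import Callable, List

def build_model_input(
    ranked_paragraphs: List[str],
    subword_encode: Callable[[str], List[int]],
    L: int = 500,                      # truncation length; the paper sweeps 100 to 11000
) -> List[int]:
    tokens: List[int] = []
    for p in ranked_paragraphs:        # highest-ranked paragraphs first
        tokens.extend(subword_encode(p))
        if len(tokens) >= L:
            break
    return tokens[:L]
```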

4.2.2. Baseline models

T-ED denotes the standard Transformer encoder-decoder.

4.2.3. Transformer Decoder (T-D)

T-D is the stronger baseline: the input and output are concatenated into a single sequence, separated by a special token, and the model is trained as a standard language model, i.e. p(w_1, ..., w_{n+m}) = ∏_j p(w_j | w_1, ..., w_{j-1}).
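
A minimal sketch of how such a training sequence can be assembled; the separator id and function name are illustrative:

```python
SEP = 1  # hypothetical id of the special separator token

def make_lm_sequence(input_ids: list, target_ids: list) -> list:
    """Concatenate input and target into one sequence for language-model training."""
    return input_ids + [SEP] + target_ids

# The loss is then plain next-token prediction over the concatenated sequence,
# so no separate encoder is needed.
```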

4.2.4. Transformer decoder with memory-compressed attention (T-DMCA)

T-DMCA is the final model from this paper

Local attention

  • Adopts a block structure similar to DiSAN/BiBloSAN: the sequence is split into blocks and self-attention is computed independently within each block
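
A minimal single-head sketch of block-local self-attention in PyTorch. The block size of 256 follows the paper; causal masking inside each block is omitted for brevity, and the sequence length is assumed to be a multiple of the block size:

```python
import math
import torch

def local_attention(q, k, v, block_size=256):
    """q, k, v: [batch, seq_len, dim]; attention is computed within each block only."""
    b, n, d = q.shape
    nb = n // block_size                               # number of blocks
    to_blocks = lambda x: x.reshape(b, nb, block_size, d)
    q, k, v = to_blocks(q), to_blocks(k), to_blocks(v)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)    # [b, nb, block, block]
    weights = scores.softmax(dim=-1)
    out = weights @ v                                  # [b, nb, block, dim]
    return out.reshape(b, n, d)
```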

Memory-compressed attention

  • Uses a strided convolution to compress the keys and values before attention
  • This lets the model handle input sequences 3x longer than T-D
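
A minimal single-head sketch of memory-compressed attention in PyTorch, shortening the keys and values with a strided 1-D convolution (kernel 3, stride 3, as described in the paper); causal masking is again omitted:

```python
import math
import torch
import torch.nn as nn

class MemoryCompressedAttention(nn.Module):
    def __init__(self, dim, kernel=3, stride=3):
        super().__init__()
        self.compress_k = nn.Conv1d(dim, dim, kernel, stride=stride)
        self.compress_v = nn.Conv1d(dim, dim, kernel, stride=stride)

    def forward(self, q, k, v):
        """q, k, v: [batch, seq_len, dim]. Returns [batch, seq_len, dim]."""
        # Conv1d expects [batch, dim, seq_len], so transpose around the convolution.
        k = self.compress_k(k.transpose(1, 2)).transpose(1, 2)  # ~1/3 of the positions
        v = self.compress_v(v.transpose(1, 2)).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        return scores.softmax(dim=-1) @ v
```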

They use an LMLML layer ordering (L for a local-attention layer, M for a memory-compressed attention layer).

5. Experiments

5.1. Evaluation

  • ROUGE scores do not always agree with human judgment (Paulus et al., 2017)
  • ROUGE-F1: the harmonic mean of ROUGE recall and ROUGE precision (spelled out below)
  • (From 5.3.) Human evaluation uses a DUC-style setup.
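
The F1 combination written out as code, for reference:

```python
def rouge_f1(recall: float, precision: float) -> float:
    """Harmonic mean of ROUGE recall and ROUGE precision."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)

print(rouge_f1(0.40, 0.25))  # ~0.3077
```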
