Add T5 Model and Demo on Text Summarization using CNNDM Dataset

pmabbo13 opened this issue 3 years ago • 0 comments

🚀 Feature

Add the CNNDM dataset and a pre-trained T5 model to TorchText, and demo the model on the task of abstractive summarization using the CNNDM dataset.

Motivation

There are multiple open-source frameworks that cater to a wide variety of audiences. As a result of this fragmentation, a typical NLP researcher writes their code in pure PyTorch while copying essential components from other repositories. Adding a pre-trained T5 model and the CNNDM dataset increases the convenience of using the TorchText library and works towards making PyTorch the preferred deep learning framework for NLP research.

T5 (Text-To-Text Transfer Transformer) is a transformer model trained end-to-end with text as input and modified text as output. This text-to-text formatting makes T5 well suited to a range of NLP tasks, including summarization, question answering, machine translation, and classification. CNNDM (CNN/DailyMail) is likewise a popular dataset in the NLP community for text summarization tasks.
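
To make the format concrete, here are a few illustrative (input, target) pairs; the task prefixes follow the conventions described in the T5 paper:

```python
# Illustrative (input, target) string pairs in T5's text-to-text format.
# Every task, including classification, is cast as text generation.
examples = [
    ("translate English to German: That is good.", "Das ist gut."),
    ("summarize: state authorities dispatched emergency crews tuesday "
     "to survey the damage after an onslaught of severe weather ...",
     "six people hospitalized after a storm in attala county."),
    ("cola sentence: The course is jumping well.", "not acceptable"),
]
```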

Pitch

The T5 model architecture will be implemented so that it can be initialized from hyper-parameters such as the number of layers, hidden size, attention size, etc. The user should also be able to specify whether they want the Encoder-only model (for non-generation tasks) or the full Encoder-Decoder model. For pre-trained weights, Google has released checkpoints for five T5 model sizes (small, base, large, 3B, and 11B); these checkpoint weights will be added to PyTorch.org, and an API will be implemented to load them. Finally, integration tests will be added for both the Encoder-only and Encoder-Decoder model APIs.
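
As a rough sketch of what such an API could look like (every name here — `T5Conf`, `T5Bundle`, `get_model`, and the keyword arguments — is a hypothetical placeholder for illustration, not a finalized interface):

```python
# Hypothetical sketch of the proposed API; all names are placeholders.
from torchtext.models import T5Conf, T5Bundle  # assumed module layout

# Configure the architecture directly from hyper-parameters.
config = T5Conf(
    encoder_only=False,       # False -> full Encoder-Decoder model
    num_encoder_layers=12,
    num_decoder_layers=12,
    embedding_dim=768,
    num_attention_heads=12,
)

# Or load one of the released checkpoints hosted on PyTorch.org.
model = T5Bundle(config).get_model(load_weights=True)
```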

The CNNDM dataset will also be made available in the TorchText library. This will allow us to demo the pre-trained T5 model by using it to perform abstractive summarization on the CNNDM dataset. A text pre-processing pipeline will need to be implemented to prepare the data for the model.
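
A hypothetical usage sketch, assuming the dataset follows the iterable-style convention of existing TorchText datasets and yields (article, abstract) string pairs:

```python
# Hypothetical usage; the split name and the (article, abstract)
# tuple format are assumptions, not a finalized interface.
from torchtext.datasets import CNNDM

train_iter = CNNDM(split="train")
article, abstract = next(iter(train_iter))
```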

Milestone 1: Add CNNDM dataset

  • [x] Add CNNDM dataset
    • [x] #1789
    • [x] #1809

Milestone 2: Implement T5 model architecture

  • [x] Create relative position buckets method #1830 (see the sketch after this list)
  • [x] Create method to compute relative attention bias term #1831
  • [x] Create method to compute attention scores using relative attention bias #1832
  • [x] Implement MultiheadAttention module
    • [x] #1825
    • [x] #1833
  • [x] Implement Root-Mean-Square Layer normalization module #1826 (sketched after this list)
  • [x] Implement T5Layer module #1827
  • [x] Implement T5Stack module #1828
  • [x] Implement T5Model module #1829
  • [x] Add pre-trained T5 model weights and an API to load them #1846
  • [x] Create integration tests for the T5 model APIs #1848
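
For reference, here is a sketch of the relative position bucketing scheme from the T5 paper, mirroring the widely used open-source implementation; it is not necessarily the exact code merged in the PRs above:

```python
import math
import torch

def relative_position_bucket(relative_position: torch.Tensor,
                             bidirectional: bool = True,
                             num_buckets: int = 32,
                             max_distance: int = 128) -> torch.Tensor:
    """Map relative positions (key_pos - query_pos) to bucket indices.

    Nearby offsets get their own bucket; larger offsets share
    logarithmically sized buckets out to max_distance.
    """
    relative_buckets = torch.zeros_like(relative_position)
    if bidirectional:
        # Split the buckets between negative and positive offsets.
        num_buckets //= 2
        relative_buckets += (relative_position > 0).long() * num_buckets
        relative_position = torch.abs(relative_position)
    else:
        # Causal attention only attends to the past (offset <= 0).
        relative_position = -torch.min(
            relative_position, torch.zeros_like(relative_position))

    # Half of the remaining buckets are exact small offsets.
    max_exact = num_buckets // 2
    is_small = relative_position < max_exact

    # The rest grow logarithmically out to max_distance.
    large = max_exact + (
        torch.log(relative_position.float() / max_exact)
        / math.log(max_distance / max_exact)
        * (num_buckets - max_exact)
    ).long()
    large = torch.min(large, torch.full_like(large, num_buckets - 1))

    return relative_buckets + torch.where(is_small, relative_position, large)
```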

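And a sketch of Root-Mean-Square layer normalization under the same caveat — this follows the published description of T5's normalization, not necessarily the merged code:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """RMS LayerNorm as used in T5: no mean subtraction and no bias,
    only a rescaling by the activations' root mean square followed
    by a learned per-dimension gain."""

    def __init__(self, d_model: int, eps: float = 1e-6) -> None:
        super().__init__()
        self.weight = nn.Parameter(torch.ones(d_model))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale each feature vector by its reciprocal root-mean-square.
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return self.weight * x * rms
```
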
Milestone 3: Demo T5 model on text summarization

  • [x] Create text pre-processing pipeline to prepare data for the T5 model #1852 (see the sketch after this list)
  • [x] Expose text pre-processing pipeline in T5 Bundler #1856
  • [x] Demonstrate text generation using T5 on the CNNDM dataset
    • [x] #1862
    • [x] #1864
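
A minimal sketch of what that pipeline could look like, using the sentencepiece library directly; the model path is a placeholder, and the max length and special-token ids are conventional T5 values assumed for illustration:

```python
import sentencepiece as spm
import torch

# Placeholder path: T5 checkpoints ship with a SentencePiece vocabulary.
sp = spm.SentencePieceProcessor(model_file="path/to/t5_spm.model")

def preprocess(articles, max_len=512, eos_id=1, pad_id=0):
    """Prefix, tokenize, truncate, append </s>, and pad a batch."""
    batch = [sp.encode("summarize: " + a)[: max_len - 1] + [eos_id]
             for a in articles]
    width = max(len(ids) for ids in batch)
    return torch.tensor(
        [ids + [pad_id] * (width - len(ids)) for ids in batch])
```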

Stretch Goals

  • [x] Implement a beam search generator #1869 (sketched after this list)
  • [x] Demo T5 model on additional tasks #1872
  • [x] Make the model torchscriptable #1876
  • [ ] Add remaining model configs (e.g. small, large) #1879
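
As a sketch of the kind of generator the first stretch goal calls for, here is a minimal single-example beam search; the `step_fn` interface is an assumption made for illustration (not the API from #1869), and length normalization is omitted for brevity:

```python
import torch

@torch.no_grad()
def beam_search(step_fn, bos_id: int, eos_id: int,
                beam_size: int = 4, max_len: int = 64):
    """Minimal single-example beam search (no length normalization).

    step_fn(prefixes) takes a (num_beams, seq_len) LongTensor and
    returns (num_beams, vocab_size) next-token log-probabilities.
    """
    beams = torch.full((1, 1), bos_id, dtype=torch.long)
    scores = torch.zeros(1)  # cumulative log-probability per beam
    finished = []            # completed hypotheses: (score, token list)

    for _ in range(max_len):
        log_probs = step_fn(beams)              # (num_beams, vocab)
        vocab = log_probs.size(-1)
        cand = scores.unsqueeze(1) + log_probs  # expand every beam
        scores, flat = cand.view(-1).topk(beam_size)
        beam_idx = torch.div(flat, vocab, rounding_mode="floor")
        tok = flat % vocab
        beams = torch.cat([beams[beam_idx], tok.unsqueeze(1)], dim=1)

        # Retire hypotheses that just produced </s>.
        done = tok == eos_id
        for i in done.nonzero().flatten().tolist():
            finished.append((scores[i].item(), beams[i].tolist()))
        beams, scores = beams[~done], scores[~done]
        if beams.numel() == 0:
            break

    if not finished:  # nothing emitted </s>: fall back to live beams
        finished = [(s.item(), b.tolist()) for s, b in zip(scores, beams)]
    return max(finished, key=lambda h: h[0])[1]
```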

pmabbo13 • Jun 21 '22 20:06