
High-level understanding of code

Open · virattt opened this issue on Jun 12, 2020 · 0 comments

Hi @patil-suraj, as you put together a README/summary of the codebase, I'd like to share my understanding of the code, to give some insight into how a "new pair of eyes" reads it.

At a high level, longbart seems to be doing a few things architecturally (a rough sketch of my mental model follows the list):

  • Reusing the high-level encoder-decoder architecture of BART via BartForConditionalGeneration
  • Replacing BART's encoder attention layers with the LongformerSelfAttentionForBart
  • Increasing the attention_window to 1024
  • Increasing max_pos (positional embeddings) to 4096
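To make that concrete, here is a rough Python sketch of how I picture the conversion. The import path, the `LongformerSelfAttentionForBart(config, layer_id)` signature, and the position-embedding tiling are my guesses, not the repo's actual code:

```python
# Rough sketch of my mental model of the conversion, not the repo's code.
import torch
from transformers import BartForConditionalGeneration
from longbart import LongformerSelfAttentionForBart  # import path assumed

MAX_POS = 4096           # target max positional embeddings
ATTENTION_WINDOW = 1024  # local attention window for each encoder layer

model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")
encoder = model.model.encoder

# Grow the learned positional embedding table from BART's 1024 positions to
# 4096 by repeatedly copying the pretrained rows (offset rows preserved).
old = encoder.embed_positions.weight.data     # shape: (1024 + 2, d_model)
new = old.new_empty(MAX_POS + 2, old.size(1))
new[:2] = old[:2]                             # BART reserves 2 offset rows
k = 2
while k < new.size(0):
    n = min(old.size(0) - 2, new.size(0) - k)
    new[k:k + n] = old[2:2 + n]
    k += n
encoder.embed_positions.weight.data = new
model.config.max_position_embeddings = MAX_POS

# Swap each encoder layer's full self-attention for the sliding-window
# Longformer variant (constructor signature assumed).
model.config.attention_window = [ATTENTION_WINDOW] * len(encoder.layers)
for i, layer in enumerate(encoder.layers):
    layer.self_attn = LongformerSelfAttentionForBart(model.config, layer_id=i)
```

If that reading is right, the decoder is left untouched, so only the encoder gains the sparse attention and the longer input length.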

From there, in order to use longbart for long-form abstractive text summarization, one would need to pre-train longbart on a new dataset (is this accurate?).

Suggested example datasets are PubMed and BigPatent, from here: https://github.com/allenai/longformer/issues/28#issuecomment-638541231
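For concreteness, here is what I imagine a minimal fine-tuning loop on one of those datasets would look like. `model` is the converted model from the sketch above, the single (document, summary) pair is a placeholder for real PubMed/BigPatent examples, and the `.loss` output attribute and hyperparameters assume a recent transformers API:

```python
# Hedged sketch of a fine-tuning loop; details are assumptions, not repo code.
import torch
from transformers import BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
tokenizer.model_max_length = 4096  # match the extended positional embeddings

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
model.train()

for document, summary in [("a long PubMed article ...", "its abstract ...")]:
    inputs = tokenizer(document, truncation=True, max_length=4096,
                       return_tensors="pt")
    labels = tokenizer(summary, truncation=True, max_length=256,
                       return_tensors="pt").input_ids

    loss = model(**inputs, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```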

Is there anything I am missing? Replacing BART's encoder attention layers feels like the core implementation change.

virattt · Jun 12 '20 18:06