High-level understanding of code
Hi @patil-suraj, as you make a README/summary of the codebase, I'd like to provide my understanding of the code to perhaps give some insight into how a "new pair of eyes" is understanding your codebase.
From a high level, it seems like longbart is doing a few things architecturally (a rough sketch follows this list):
- Reusing the high-level encoder-decoder architecture of BART via `BartForConditionalGeneration`
- Replacing BART's encoder self-attention layers with `LongformerSelfAttentionForBart`
- Increasing the `attention_window` to 1024
- Increasing `max_pos` (positional embeddings) to 4096
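To make sure I'm reading the conversion right, here is a rough sketch of those steps in code. The class name `LongformerSelfAttentionForBart` comes from this repo, but the import path, its constructor signature, and the positional-embedding details below are my assumptions for illustration, not copied from the actual implementation:

```python
from transformers import BartForConditionalGeneration
from longbart.modeling_longbart import LongformerSelfAttentionForBart  # assumed import path

MAX_POS = 4096           # new encoder sequence length (up from 1024)
ATTENTION_WINDOW = 1024  # local attention window per token

model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")
config = model.config
encoder = model.model.encoder

# 1) Grow the encoder's learned positional embeddings from 1024 to 4096 by
#    tiling the pretrained weights (offset/padding details glossed over here).
old_pos = encoder.embed_positions.weight.detach()        # (old_max, d_model)
new_pos = old_pos.new_zeros(MAX_POS, old_pos.size(1))
k = 0
while k < MAX_POS:
    n = min(old_pos.size(0), MAX_POS - k)
    new_pos[k : k + n] = old_pos[:n]
    k += n
encoder.embed_positions.weight.data = new_pos
config.max_position_embeddings = MAX_POS

# 2) Swap each encoder layer's self-attention for the Longformer variant
#    (presumably the real code also copies the pretrained q/k/v weights over).
config.attention_window = [ATTENTION_WINDOW] * config.encoder_layers
for i, layer in enumerate(encoder.layers):
    layer.self_attn = LongformerSelfAttentionForBart(config, layer_id=i)
```

Please correct me if the actual conversion differs from this picture.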
From there, in order to use longbart for long-form, abstractive text summarization on a new domain, one would need to pre-train longbart on a new dataset (is this accurate?).
Suggested example datasets are PubMed and BigPatent, from here: https://github.com/allenai/longformer/issues/28#issuecomment-638541231
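For reference, here is a minimal sketch of loading those two corpora with the Hugging Face `datasets` library; this is not part of longbart itself, and the dataset ids and field names are my assumptions about the hub versions:

```python
from datasets import load_dataset

# PubMed articles paired with their abstracts (long-document summarization)
pubmed = load_dataset("scientific_papers", "pubmed", split="train")
print(pubmed[0]["article"][:300])   # source document
print(pubmed[0]["abstract"][:300])  # target summary

# BigPatent: patent descriptions paired with human-written abstracts
big_patent = load_dataset("big_patent", "all", split="train")
```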
Is there anything I am missing? The "Replacing BART's encoder self-attention layers" step feels like the core implementation change.