Long documents

Open timsuchanek opened this issue 4 years ago • 1 comments

Would it be possible to summarize documents longer than 758 tokens? Using https://github.com/allenai/longformer could be interesting for that use case.

timsuchanek avatar Jun 03 '20 08:06 timsuchanek

Hi! Thanks for your suggestion. Longformer is a great fit for the long-document scenario. In this project, max_len can actually be changed to any value; see https://github.com/jiacheng-xu/DiscoBERT/blob/a96922a2cfd4b14d48a38d943529f8c035b43d84/data_preparation/data_structure.py#L20 What I do is randomly initialize the extended part and fine-tune on the downstream tasks. You can also change the number 768 to any value you want.

jiacheng-xu avatar Aug 10 '20 04:08 jiacheng-xu
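
For readers wondering what "randomly initializing the extended part" could look like, here is a minimal sketch. It assumes the "extended part" means the position-embedding rows beyond the pretrained length, which the reply does not state explicitly. The function name `extend_position_embeddings`, the standard deviation of 0.02, and the toy sizes are all hypothetical and are not taken from the DiscoBERT code. Plain Python lists are used so the example runs without any dependencies; a real model would apply the same idea to its embedding tensor and then fine-tune.

```python
import random


def extend_position_embeddings(pretrained, new_max_len, std=0.02, seed=0):
    """Return a position-embedding table with new_max_len rows.

    Rows that already exist are copied unchanged; rows beyond the
    pretrained length are drawn from a Gaussian with mean 0 and the
    given std (an assumed value, not one taken from DiscoBERT).
    """
    if not pretrained:
        raise ValueError("pretrained table must contain at least one row")

    old_len = len(pretrained)
    dim = len(pretrained[0])

    # Shrinking (or keeping) the table only needs a truncated copy.
    if new_max_len <= old_len:
        return [row[:] for row in pretrained[:new_max_len]]

    rng = random.Random(seed)  # seeded so the sketch is reproducible
    extended = [row[:] for row in pretrained]  # keep pretrained rows as-is
    for _ in range(new_max_len - old_len):
        # New positions have no pretrained values, so start them randomly;
        # fine-tuning on the downstream task is what makes them useful.
        extended.append([rng.gauss(0.0, std) for _ in range(dim)])
    return extended


# Toy stand-in for a pretrained table: 512 positions, 4 dimensions.
pretrained = [[float(i)] * 4 for i in range(512)]
extended = extend_position_embeddings(pretrained, 768)
print(len(extended), len(extended[0]))  # prints "768 4"
```

The first 512 rows keep their pretrained values, so only the new positions start from scratch. That is why a fine-tuning pass on the downstream task is still needed after the extension.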