DiscoBERT
Long documents
Would it be possible to summarize documents with length > 758 tokens? Using https://github.com/allenai/longformer could be interesting for that use-case.
Hi! Thanks for your suggestion. Longformer is great for the long-document scenario. In this project, I can actually change max_len to any value, see https://github.com/jiacheng-xu/DiscoBERT/blob/a96922a2cfd4b14d48a38d943529f8c035b43d84/data_preparation/data_structure.py#L20 What I do is randomly initialize the extended part and fine-tune on the downstream tasks. You can also change the number 768 to any value you want.
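For readers wondering what "randomly initializing the extended part" could look like, here is a minimal, framework-free sketch. It assumes the "extended part" means the position-embedding rows beyond the pretrained length, which the reply does not spell out. The function name `extend_position_table`, the toy table, and the 512-row starting size are hypothetical and are not taken from the DiscoBERT code; the standard deviation of 0.02 is likewise only an illustrative choice.

```python
import random


def extend_position_table(pretrained, new_max_len, std=0.02, seed=0):
    """Return a position table with new_max_len rows.

    Rows that exist in `pretrained` are copied unchanged; any extra rows
    are drawn from a normal distribution (assumed init, not DiscoBERT's).
    """
    old_len = len(pretrained)
    if new_max_len <= old_len:
        # Shrinking: keep only the first new_max_len pretrained rows.
        return [row[:] for row in pretrained[:new_max_len]]

    dim = len(pretrained[0])
    rng = random.Random(seed)
    extended = [row[:] for row in pretrained]  # keep the learned positions
    for _ in range(new_max_len - old_len):
        # New positions start from random values and are learned in fine-tuning.
        extended.append([rng.gauss(0.0, std) for _ in range(dim)])
    return extended


# Toy stand-in for a pretrained table: 512 positions, 4 dimensions each.
pretrained = [[float(i)] * 4 for i in range(512)]
extended = extend_position_table(pretrained, 768)
```

The copied rows keep whatever was learned during pretraining, while the new rows only become meaningful after fine-tuning on the downstream task, so longer inputs may need more fine-tuning data to work well.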