long-summarization
Non-tokenized and cased dataset
Where can I get the non-tokenized, cased arXiv and PubMed datasets?
The datasets include paper IDs, so if you'd like the raw data, it should be easy to fetch it from PubMed Open Access and arXiv. If you end up doing that, a PR adding a link to the collected raw data is very welcome.
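For the arXiv side, a rough starting point might look like the sketch below. It is not part of this repo and rests on a few assumptions. It assumes each split file is JSON lines with the paper ID stored under a field called `article_id` (check your copy of the data). It uses the public arXiv API (`http://export.arxiv.org/api/query` with its `id_list` parameter), which returns raw, cased titles and abstracts. It does not return full article text, which would have to come from the paper source or PDF. The helper names are hypothetical, and the PubMed Open Access side is not covered here.

```python
import json
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # namespace used by the arXiv Atom feed


def article_ids(lines, id_field="article_id"):
    """Yield paper IDs from JSON-lines text (an open file works).

    ASSUMPTION: the ID lives under `id_field`; adjust to match the data.
    """
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)[id_field]


def arxiv_query_url(ids):
    """Build an arXiv API URL that looks up a batch of IDs via `id_list`."""
    params = {"id_list": ",".join(ids), "max_results": len(ids)}
    return ARXIV_API + "?" + urllib.parse.urlencode(params, safe=",")


def parse_arxiv_feed(xml_text):
    """Map each entry's ID URL to (title, raw cased abstract).

    Keys are versioned abs URLs (e.g. .../abs/<id>v2), not the bare input IDs.
    """
    root = ET.fromstring(xml_text)
    out = {}
    for entry in root.findall(ATOM + "entry"):
        entry_id = entry.findtext(ATOM + "id", default="").strip()
        # Titles in the feed can contain line breaks; collapse whitespace.
        title = " ".join(entry.findtext(ATOM + "title", default="").split())
        summary = entry.findtext(ATOM + "summary", default="").strip()
        out[entry_id] = (title, summary)
    return out


def fetch_arxiv(ids, batch_size=100, delay=3.0):
    """Query the API in batches, pausing between requests to be polite."""
    results = {}
    for start in range(0, len(ids), batch_size):
        if start:
            time.sleep(delay)
        url = arxiv_query_url(ids[start:start + batch_size])
        with urllib.request.urlopen(url, timeout=30) as resp:
            results.update(parse_arxiv_feed(resp.read().decode("utf-8")))
    return results
```

Usage would be along the lines of `ids = list(article_ids(open(path, encoding="utf-8")))` followed by `fetch_arxiv(ids)`, where `path` points at one of the split files. The pause between batches follows the arXiv API's request for a few seconds between calls.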