data_tooling
Create a license-compliant version of The Pile: subsets
Subsets of The Pile:
- pubmed
- ubuntu_irc
- europarl
- hacker_news
- nih_exporter
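
As a rough illustration of how the subsets listed above might be selected, here is a minimal sketch that keeps only records tagged with one of those names. It assumes the data is stored as JSON Lines and that each record carries a `subset` field holding one of the identifiers from the list; the `subset` field name and the `filter_subsets` helper are hypothetical and not taken from the repository.

```python
import json
from typing import Iterable, Iterator

# Subset identifiers copied from the list above.
ALLOWED_SUBSETS = {"pubmed", "ubuntu_irc", "europarl", "hacker_news", "nih_exporter"}


def filter_subsets(lines: Iterable[str], allowed=ALLOWED_SUBSETS) -> Iterator[dict]:
    """Yield parsed records whose (hypothetical) `subset` field is allowed."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumption: each record names its source subset in a `subset` field.
        if record.get("subset") in allowed:
            yield record


if __name__ == "__main__":
    sample = [
        '{"text": "example abstract", "subset": "pubmed"}',
        '{"text": "example from elsewhere", "subset": "other"}',
    ]
    for rec in filter_subsets(sample):
        print(rec["subset"], rec["text"])
```

The actual field names and subset labels in a given copy of the data may differ, so the set of allowed names would need to be adapted to match.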