the-pile
"Github" code data download only
The Pile is too big for me. I just want to download the "Github" code data, but there are 30 Pile train files. I would like to know exactly which files contain the "Github" code data.
The data is already processed by that stage and may not be what you want. You probably want to take github.tar from the preliminary components (https://the-eye.eu/public/AI/pile_preliminary_components/github.tar) and process it yourself.
The link is no longer working. Is there another link to obtain the data?
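
For anyone who only has the main train files, one possible fallback is to filter records by their source label. This is a minimal sketch, not an official tool. It assumes that each line of a decompressed train file is a JSON object with a `text` field and a `meta.pile_set_name` field, and that the code documents are labelled "Github". The function name `iter_github_records` and the sample records are hypothetical. As noted in the reply above, this data is already processed and may not be what you want.

```python
import json
from typing import Iterable, Iterator


def iter_github_records(lines: Iterable[str], set_name: str = "Github") -> Iterator[dict]:
    """Yield parsed records whose meta.pile_set_name equals set_name."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumption: the source label lives at record["meta"]["pile_set_name"].
        if record.get("meta", {}).get("pile_set_name") == set_name:
            yield record


# Hypothetical sample lines standing in for a decompressed train file.
sample = [
    '{"text": "def f(): pass", "meta": {"pile_set_name": "Github"}}',
    '{"text": "Some article", "meta": {"pile_set_name": "Pile-CC"}}',
]
github_texts = [r["text"] for r in iter_github_records(sample)]
```

On a real train file the lines would come from a decompressed stream. The train files are zstd-compressed, so a decompression step is needed first; it is left out here to keep the sketch dependent on the standard library only.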