the-pile
Question regarding Shuffling
Hi, thank you very much for releasing this great dataset. I am wondering whether the original Pile dataset (with 30 chunks) has already been shuffled, or whether we still need to globally shuffle the Pile before using it for pretraining. Thank you.
Hi @LeoXinhaoLee, I am also curious about this. Did you reach any conclusions?