composer
Remove C4 dataset
What does this PR do?
Removes the C4 dataset. It is currently broken by the `datasets` upgrade. We recommend using streaming datasets anyway, so we're just going to get rid of the `datasets` dependency and the `C4Dataset` in Composer.
What issue(s) does this change relate to?
@abhi-mosaic can you please verify the dataloader changes to the GPT YAMLs?
Unit tests verify that it parses correctly. I'm skipping the tests that check it actually runs, since the cluster is full and these example YAMLs are all for a deprecated codepath that we will stop supporting after the next release.
Ah, never mind, I just tested it because I felt bad. It works. Woohoo.