
Remove C4 dataset

Open · mvpatel2000 opened this issue 2 years ago · 1 comment

What does this PR do?

Removes the C4 dataset, which is currently broken by the upgrade of the `datasets` dependency. Since we recommend using streaming datasets anyway, this PR drops the `datasets` dependency and removes `C4Dataset` from Composer.

What issue(s) does this change relate to?

CO-1271

mvpatel2000 · Oct 16 '22 06:10

@abhi-mosaic, can you please verify the dataloader changes to the GPT YAMLs?

mvpatel2000 · Oct 16 '22 19:10

Unit tests verify that the YAMLs parse correctly. I'm skipping tests that check they actually run, since the cluster is full and these example YAMLs all target a deprecated codepath whose support we will drop after the next release.

mvpatel2000 · Oct 20 '22 00:10

Ah, never mind, I just tested it because I felt bad. It works. Woohoo.

mvpatel2000 · Oct 20 '22 04:10