cogdata

A light-weight data management system for large-scale pretraining

Results: 2 cogdata issues

When I run `cogdata process --task_id test_task --nproc 2 --dataloader_num_workers 1 --ratio 1 test_ds` in the terminal, I get a different result from the one shown in the guide. ![image](https://github.com/Sleepychord/cogdata/assets/77435739/fea8ceec-3a04-4ad1-9ae4-add46a5df1ba) And when I try...

[background] I want to use my own text-image datasets to generate a binary-format dataset for CogView training in 'https://github.com/THUDM/CogView'. That repo mentions that the author uses this...