cogdata
How to use this data toolkit
[background] I want to use my own text-image dataset to generate a binary-format dataset for CogView training ('https://github.com/THUDM/CogView'). That repo mentions that the authors use this cogdata toolkit to preprocess data.
[question] Could you please tell me how to organize my raw text-image dataset, and then how to use the cogdata toolkit to generate the target bin file? For example, should the two files of a text-image pair share the same name, such as 'a dog sits on the ground.txt' and 'a dog sits on the ground.png', or should I use some other layout?
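To make the same-name layout concrete, here is a minimal Python sketch of how such a folder could be read back into (image, caption) pairs. It only illustrates pairing files by their shared stem. It does not use cogdata, and I do not know whether cogdata expects this layout. The function name `pair_by_stem` and the flat single-folder structure are my own assumptions.

```python
from pathlib import Path


def pair_by_stem(root):
    """Pair each .png image with the .txt caption that shares its file stem.

    Hypothetical helper for illustration only; not part of cogdata.
    Assumes a flat folder where 'x.png' and 'x.txt' belong together.
    """
    root = Path(root)
    pairs = []
    for image_path in sorted(root.glob("*.png")):
        # Same name, different extension: 'x.png' -> 'x.txt'
        caption_path = image_path.with_suffix(".txt")
        if caption_path.exists():
            caption = caption_path.read_text(encoding="utf-8").strip()
            pairs.append((image_path.name, caption))
        # Images without a matching caption file are skipped.
    return pairs
```

Calling `pair_by_stem("my_dataset")` on a folder containing 'a dog sits on the ground.png' and 'a dog sits on the ground.txt' would return one pair holding the image file name and the text read from the caption file.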