How to use this data toolkit

Open · xiaocaijizzz opened this issue 2 years ago · 2 comments

[background] I want to use my own text-image datasets to generate a binary-format dataset for CogView training (https://github.com/THUDM/CogView). That repo mentions that the authors used this cogdata toolkit to preprocess their data.

[question] Could you please tell me how to organize my raw text-image dataset, and then how to use the cogdata toolkit to generate the target bin file? For example, should I give each text-image pair the same base name, such as 'a dog sits on the ground.txt' and 'a dog sits on the ground.png', or should I use some other layout?
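To illustrate the same-base-name layout the question describes, here is a minimal Python sketch that pairs each caption file with the image sharing its base name and writes a JSON-lines manifest. This is not cogdata's documented input format (the issue does not specify one); the function names `pair_text_image_files` and `write_manifest`, the accepted image extensions, and the manifest fields `image`/`caption` are all hypothetical.

```python
import json
from pathlib import Path

# Hypothetical set of image extensions to accept; adjust for your data.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def pair_text_image_files(root):
    """Return (caption, image_path) pairs for files that share a base name.

    Assumed (hypothetical) layout: 'a dog sits on the ground.txt' sits next to
    'a dog sits on the ground.png'. Files without a partner are skipped.
    """
    root = Path(root)
    # Map base name (stem) -> caption file.
    texts = {
        p.stem: p
        for p in root.iterdir()
        if p.is_file() and p.suffix.lower() == ".txt"
    }
    pairs = []
    for img in sorted(root.iterdir()):
        if img.is_file() and img.suffix.lower() in IMAGE_SUFFIXES and img.stem in texts:
            caption = texts[img.stem].read_text(encoding="utf-8").strip()
            pairs.append((caption, img))
    return pairs


def write_manifest(pairs, out_path):
    """Write one JSON object per line: {"image": <file name>, "caption": <text>}.

    The manifest format is hypothetical, meant only as an intermediate file
    you could adapt to whatever the preprocessing tool actually expects.
    """
    with open(out_path, "w", encoding="utf-8") as f:
        for caption, img in pairs:
            record = {"image": img.name, "caption": caption}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

A single manifest like this also avoids relying on captions as file names, which breaks down for long captions or captions containing characters such as `/` that are not allowed in file names.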

xiaocaijizzz, Dec 09 '21 03:12