train-CLIP Dataset structure

Hi, I'm having a little trouble understanding the dataset structure I should follow to train with this package. Is it one parent folder containing one folder of images and one folder of their text files? If yes, what should these subfolders be named?

Sep 07 '21 11:09 tarunn2799

See https://github.com/Zasder3/train-CLIP#training-with-our-datamodule- for details. Any folder name should work; an image and its text file should have the same file name.

Sep 07 '21 12:09 rom1504

Hey, so all images and text files should be in one single folder?

Sep 09 '21 06:09 tarunn2799

No, they can be in any subfolder.

Sep 09 '21 08:09 rom1504

Does this work: data/images/p1.jpg and data/text/p1.txt?

Sep 09 '21 16:09 tarunn2799

Yes

Sep 09 '21 16:09 rom1504
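
The convention described so far (any subfolder names, matching file names) can be checked before training with a small script. This is only a sketch of that convention, not the package's own loader: the function name `pair_by_stem` and the set of image extensions are assumptions made for illustration.

```python
from pathlib import Path

# Assumption: extensions chosen for illustration, not taken from train-CLIP.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def pair_by_stem(root):
    """Return {stem: (image_path, text_path)} for files under root that share a stem."""
    root = Path(root)
    images, texts = {}, {}
    # Walk every subfolder; folder names are ignored, only file stems matter.
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        suffix = path.suffix.lower()
        if suffix in IMAGE_EXTS:
            images[path.stem] = path
        elif suffix == ".txt":
            texts[path.stem] = path
    # Keep only stems that have both an image and a text file.
    shared = sorted(images.keys() & texts.keys())
    return {stem: (images[stem], texts[stem]) for stem in shared}
```

Calling `pair_by_stem("data")` on a layout such as data/images/p1.jpg and data/text/p1.txt would return a single entry keyed by `p1`; images without a matching text file are left out, which makes missing captions easy to spot.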

Hi, I prepared my dataset in that structure and ran the command below:

python train.py --model_name RN50 --folder /data/depop/data_org/clip/data/ --batch_size 512 --gpus 1

I'm getting an AssertionError from the cosine_annealing_warmup package at the line assert warmup_steps < first_cycle_steps.

What's happening here? Please help me out.

Sep 13 '21 07:09 tarunn2799

Okay, so is warmup_step hardcoded to 2000 in models/wrapper.py? My dataset is currently too small for num_training_steps to exceed 2000.

Sep 13 '21 08:09 tarunn2799
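
The assertion warmup_steps < first_cycle_steps can be checked by hand before launching a run. The sketch below assumes that the first cycle spans the whole run and that the total step count is batches per epoch times epochs; neither is confirmed in this thread. The helper names are hypothetical, and the default of 2000 is simply the value asked about above.

```python
import math


def total_training_steps(num_pairs, batch_size, epochs):
    """Assumed step count: batches per epoch (rounded up) times number of epochs."""
    return math.ceil(num_pairs / batch_size) * epochs


def warmup_fits(num_pairs, batch_size, epochs, warmup_steps=2000):
    """Mirror the check `warmup_steps < first_cycle_steps` under the assumptions above."""
    return warmup_steps < total_training_steps(num_pairs, batch_size, epochs)
```

For example, 1000 image-text pairs with a batch size of 512 give 2 batches per epoch, so 10 epochs amount to 20 steps and `warmup_fits(1000, 512, 10)` is false. Under these assumptions, a small dataset needs either a smaller warmup value or more total steps.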

Hi, does the .txt file here contain a text caption? Let's say I have to create my own pairs of images and text captions. Could you please tell me if the assumption below is correct?

If I have to fine-tune the CLIP model on pairs of images and captions, would this work?

data/images/1_german_sheperd.jpg
data/label/1_german_sheperd.txt
data/images/2_german_sheperd.jpg
data/label/2_german_sheperd.txt

where,

1_german_sheperd.txt contains a caption like "A sleeping German shepherd Dog"
2_german_sheperd.txt contains a caption like "An angry barking German shepherd Dog"

Oct 05 '21 12:10 singularity014

Yes. I'm surprised how much this is confusing people.

Oct 05 '21 22:10 rom1504

Actually, creating one file per caption (or label) didn't make much sense to me, hence the question.

Oct 06 '21 03:10 singularity014
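
If the captions already live in a single table or dictionary, the one-file-per-caption layout can be generated rather than written by hand. This is a minimal sketch under that assumption; `write_caption_files` is a hypothetical helper, not part of train-CLIP.

```python
from pathlib import Path


def write_caption_files(captions, label_dir):
    """Write one .txt file per image stem; `captions` maps stem -> caption text."""
    label_dir = Path(label_dir)
    label_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for stem, caption in captions.items():
        # The text file reuses the image's file name so the two can be matched.
        path = label_dir / f"{stem}.txt"
        path.write_text(caption.strip() + "\n", encoding="utf-8")
        written.append(path)
    return written
```

For instance, `write_caption_files({"1_german_sheperd": "A sleeping German shepherd Dog"}, "data/label")` would create data/label/1_german_sheperd.txt next to the image at data/images/1_german_sheperd.jpg.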

@tarunn2799 Hi, I would like to know whether this problem has been solved:

Okay, so is warmup_step hardcoded to 2000 in models/wrapper.py? My dataset is currently too small for num_training_steps to exceed 2000.

Thanks for your time.

Nov 02 '21 02:11 bk-201jk

Hi @bk-201jk, I faced the same issue and solved it thanks to @ymzhu19eee in issue #20.

Nov 23 '21 10:11 iremonur

@iremonur Thank you very much! I would also like to know how many photos are in your dataset and how you set up your directory structure. What is in the .txt files, or are their contents in the title? I would appreciate it if I could see a sample of the data in your dataset!

Nov 23 '21 10:11 bk-201jk

I'm planning to prepare a 100k dataset (image-text pairs) for fine-tuning, but first I wanted to see if the code would work by running it with only 3 image-text pairs. The folder structure is as follows:

train-CLIP/data/img/1.png
train-CLIP/data/caption/1.txt

And one of the texts: There is a car on the road.

Nov 24 '21 08:11 iremonur

@iremonur Thank you very much. If you manage to run the code with only 3 image-text pairs, please let me know. Thanks again!

Nov 24 '21 08:11 bk-201jk
