Dataset structure

Open tarunn2799 opened this issue 3 years ago • 15 comments

Hi, I'm having a little trouble understanding the dataset structure I should follow in order to train with this package. Is it one parent folder containing one folder of images and one folder of their text files? If so, what should these subfolders be named?

tarunn2799 avatar Sep 07 '21 11:09 tarunn2799

https://github.com/Zasder3/train-CLIP#training-with-our-datamodule- Any folder name should work; an image and its text file should have the same file name.

rom1504 avatar Sep 07 '21 12:09 rom1504

Hey, so all images and text files should be in one single folder?

tarunn2799 avatar Sep 09 '21 06:09 tarunn2799

No, any subfolder

rom1504 avatar Sep 09 '21 08:09 rom1504

Does this work: data/images/p1.jpg and data/text/p1.txt?

tarunn2799 avatar Sep 09 '21 16:09 tarunn2799

Yes
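
The layout confirmed here can be illustrated with a small sketch. It assumes, as stated earlier in this thread, that an image and its caption are matched by file name (taken here to mean the stem, so p1.jpg pairs with p1.txt) wherever they sit under the root folder. `pair_by_stem` and the extension set are hypothetical and are not taken from the package's own loader.

```python
from pathlib import Path

# Assumed set of image extensions; the package may accept a different set.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def pair_by_stem(root):
    """Return {stem: (image_path, text_path)} for every stem that has both files."""
    root = Path(root)
    # Search all subfolders, so the folder names themselves do not matter.
    images = {p.stem: p for p in root.rglob("*") if p.suffix.lower() in IMAGE_EXTS}
    texts = {p.stem: p for p in root.rglob("*.txt")}
    # Keep only stems present on both sides, in a stable order.
    return {s: (images[s], texts[s]) for s in sorted(images.keys() & texts.keys())}


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        # Recreate the layout from the question: data/images/p1.jpg, data/text/p1.txt
        root = Path(tmp) / "data"
        (root / "images").mkdir(parents=True)
        (root / "text").mkdir()
        (root / "images" / "p1.jpg").touch()
        (root / "text" / "p1.txt").write_text("A caption for p1.")
        for stem, (img, txt) in pair_by_stem(root).items():
            print(stem, img.relative_to(root), txt.relative_to(root))
```

An image without a matching .txt file (or the reverse) is simply left out of the result, which makes this a quick way to spot unpaired files before training.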

rom1504 avatar Sep 09 '21 16:09 rom1504

Hi, I prepared my dataset in that structure and ran the command below:

python train.py --model_name RN50 --folder /data/depop/data_org/clip/data/ --batch_size 512 --gpus 1

I'm getting an AssertionError from the cosine_annealing_warmup package at the line assert warmup_steps < first_cycle_steps.

What's happening here? Please help me out.

tarunn2799 avatar Sep 13 '21 07:09 tarunn2799

Okay, so is warmup_step hardcoded to 2000 in models/wrapper.py? My dataset is currently far too small for num_training_steps to exceed 2000.
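
The arithmetic behind this assertion can be sketched as follows. The sketch assumes the scheduler's first cycle spans the whole run, roughly ceil(num_samples / batch_size) * epochs optimizer steps, and that warmup is 2000 steps as suspected here. `total_training_steps` and `safe_warmup_steps` are hypothetical helpers, not part of train-CLIP or cosine_annealing_warmup, and the sample and epoch counts are illustrative values only.

```python
import math


def total_training_steps(num_samples, batch_size, epochs):
    """Optimizer steps in a run, assuming one step per (possibly partial) batch."""
    return math.ceil(num_samples / batch_size) * epochs


def safe_warmup_steps(requested, first_cycle_steps):
    """Clamp warmup so that warmup_steps < first_cycle_steps holds."""
    if first_cycle_steps < 1:
        raise ValueError("first_cycle_steps must be at least 1")
    return min(requested, first_cycle_steps - 1)


if __name__ == "__main__":
    # Illustrative numbers: a small dataset with the batch size from the command.
    steps = total_training_steps(num_samples=10_000, batch_size=512, epochs=32)
    print("first_cycle_steps:", steps)
    print("2000 < first_cycle_steps:", 2000 < steps)
    print("clamped warmup:", safe_warmup_steps(2000, steps))
```

This only shows why the check can fail on a small dataset: with few samples and a large batch size, the run has fewer total steps than the warmup alone.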

tarunn2799 avatar Sep 13 '21 08:09 tarunn2799

Hi, does the .txt file here contain a text caption? Say I have to create my own pairs of images and text captions; could you please tell me whether the assumption below is correct?

If I want to fine-tune the CLIP model on pairs of images and captions, would this work?

  • data/images/1_german_sheperd.jpg
  • data/label/1_german_sheperd.txt
  • data/images/2_german_sheperd.jpg
  • data/label/2_german_sheperd.txt

where,

  • 1_german_sheperd.txt contains a caption like "A sleeping German shepherd Dog"
  • 2_german_sheperd.txt contains a caption like "An angry barking German shepherd Dog"
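
If the captions already live in a table or dictionary, the one-file-per-caption layout can be generated rather than written by hand. This is a minimal sketch, assuming the captions are keyed by image file name. `write_caption_files` is a hypothetical helper, and the `label` folder name is only illustrative.

```python
from pathlib import Path


def write_caption_files(captions, out_dir):
    """Write one UTF-8 .txt file per image, named after the image's stem."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image_name, caption in captions.items():
        # 1_german_sheperd.jpg -> 1_german_sheperd.txt
        txt_path = out_dir / (Path(image_name).stem + ".txt")
        txt_path.write_text(caption.strip(), encoding="utf-8")
        written.append(txt_path)
    return written


if __name__ == "__main__":
    import tempfile

    captions = {
        "1_german_sheperd.jpg": "A sleeping German shepherd Dog",
        "2_german_sheperd.jpg": "An angry barking German shepherd Dog",
    }
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_caption_files(captions, Path(tmp) / "data" / "label"):
            print(path.name, "->", path.read_text(encoding="utf-8"))
```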

singularity014 avatar Oct 05 '21 12:10 singularity014

yes I'm surprised how much this is confusing people

rom1504 avatar Oct 05 '21 22:10 rom1504

yes I'm surprised how much this is confusing people

Actually, creating a file per caption (or label) didn't make much sense to me, hence the question.

singularity014 avatar Oct 06 '21 03:10 singularity014

@tarunn2799 Hi, I would like to know whether this problem has been solved.

Okay so in models/wrapper.py is the warmup_step hardcoded to 2000? My dataset currently is much smaller for the num_training_steps to be bigger than 2000.

Thanks for your time.

bk-201jk avatar Nov 02 '21 02:11 bk-201jk

Hi @bk-201jk, I faced the same issue and solved it thanks to @ymzhu19eee in issue #20.

iremonur avatar Nov 23 '21 10:11 iremonur

@iremonur Thank you very much! I would also like to know how many photos are in your dataset and how you set up your directory structure. What is in the .txt files, or are their contents in the title? I would appreciate it if I could see a sample from your dataset!

bk-201jk avatar Nov 23 '21 10:11 bk-201jk

I'm planning to prepare a 100k dataset (image-text pairs) for fine-tuning, but first I wanted to see if the code would work by running it with only 3 image-text pairs. The folder structure is as follows:

  • train-CLIP/data/img/1.png
  • train-CLIP/data/caption/1.txt

And one of the texts: There is a car on the road.

iremonur avatar Nov 24 '21 08:11 iremonur

@iremonur Thank you very much. If you can run the code with only 3 image-text pairs, please let me know. Thanks again!

bk-201jk avatar Nov 24 '21 08:11 bk-201jk