
The model keeps generating the same few words when I train catr on my own dataset

Open · gitraffica opened this issue 3 years ago · 4 comments

```
prediction: [CLS] i was was was the the was was was i was i [SEP]
dataset:    [CLS] you do he coming for you mother he alive not well i [SEP]

prediction: [CLS] i was was i was i of and the i the i [SEP]
dataset:    [CLS] you do if you receive a letter from yourself with information only [SEP]

prediction: [CLS] i was was i i was i was was the the i [SEP]
dataset:    [CLS] you do mean that she again you do even know what you [SEP]
...
```

This is strange, because the pre-trained catr model works fine. I converted my dataset to the COCO format and made sure every data pair was fed into training correctly (I printed the input images and captions during training). I also built a mini dataset (n < 40) to check convergence; the loss bottomed out around 0.214 and the phenomenon did not disappear, although on such a tiny set I expected the loss to converge to roughly 0.001. What could be going wrong in my procedure?
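(For context, a tiny-dataset overfit check is usually written as a teacher-forced training loop like the minimal sketch below. `model` and `tiny_loader` are hypothetical placeholders, not the actual catr training code, and the token layout is an assumption; on a handful of samples the loss should drop close to zero if the data pipeline is sound.)

```python
import torch
import torch.nn as nn

# Overfit sanity check on a handful of samples: with teacher forcing,
# the cross-entropy loss should approach zero if the pipeline is sound.
# `model` and `tiny_loader` are hypothetical placeholders.
criterion = nn.CrossEntropyLoss(ignore_index=0)  # assumes 0 is the [PAD] id
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

for epoch in range(200):
    for images, caps, cap_masks in tiny_loader:
        # Predict token t+1 from the caption shifted right by one.
        outputs = model(images, caps[:, :-1], cap_masks[:, :-1])
        loss = criterion(outputs.permute(0, 2, 1), caps[:, 1:])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# A loss that plateaus well above zero here usually points to a data or
# labeling bug (or transforms mangling the inputs), not model capacity.
```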

gitraffica · Aug 17 '21 14:08

Hello,

First of all, this is an attention-based architecture, so it needs a lot of data samples to train well. 40 samples are nowhere near enough to produce reasonable results.

Kindly increase the dataset to a minimum of 250-300 samples and fine-tune from the provided pre-trained weights.
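(The pre-trained weights can be loaded through torch.hub, as the repo's README shows; a minimal fine-tuning setup might look like the sketch below. The learning rate and weight decay are common fine-tuning choices, not values the repo prescribes.)

```python
import torch

# Pull the pre-trained CATR checkpoint (version v3) via torch.hub and
# fine-tune it on the custom dataset instead of training from scratch.
model = torch.hub.load('saahiluppal/catr', 'v3', pretrained=True)
model.train()

# A small learning rate is a typical fine-tuning choice; adjust as needed.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=1e-4)
```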

saahiluppal · Aug 21 '21 16:08

Thank you. I actually solved this by removing the training and validation transforms; with that change the model converged even on my tiny dataset, although I don't know why. I have since fit the model on a larger set (13,000+ files) and it works great!
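(Presumably "removing the transforms" amounts to swapping the augmenting train transform for a plain resize-and-normalize pipeline, roughly like the sketch below. On a tiny dataset, heavy augmentation such as rotation or color jitter can keep the model from ever fitting the captions. The 299-pixel size and ImageNet statistics are assumptions based on the repo's defaults.)

```python
import torchvision.transforms as T

# Plain preprocessing with no augmentation: resize (shorter side to 299 px;
# the repo uses its own resize helper), convert to tensor, and normalize
# with ImageNet statistics. Exact values are assumptions, not guarantees.
plain_transform = T.Compose([
    T.Resize(299),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])
```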

gitraffica · Aug 21 '21 16:08

Can you tell me how to use it on a small dataset like Flickr8k?

karimAimene · Sep 19 '21 02:09

You need to convert your caption dataset to the COCO format, and split the image dataset into train and val sets, as in the sketch below.
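(As a rough illustration, a COCO-style captions file is just JSON with `images` and `annotations` lists; a hypothetical conversion plus a 90/10 train/val split could look like this. The input mapping, file names, and split ratio are assumptions.)

```python
import json
import random

# Convert a simple {filename: caption} mapping into minimal COCO caption
# format and split it 90/10 into train and val files. The input file,
# its layout, and the output names are hypothetical; adapt to your data.
with open('my_captions.json') as f:
    pairs = json.load(f)  # e.g. {"img1.jpg": "a person on a horse", ...}

items = list(pairs.items())
random.shuffle(items)
cut = int(0.9 * len(items))

for split, split_items in [('train', items[:cut]), ('val', items[cut:])]:
    coco = {'images': [], 'annotations': []}
    for img_id, (fname, caption) in enumerate(split_items):
        coco['images'].append({'id': img_id, 'file_name': fname})
        coco['annotations'].append(
            {'id': img_id, 'image_id': img_id, 'caption': caption})
    with open(f'captions_{split}2017.json', 'w') as out:
        json.dump(coco, out)
```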

iamshant · Oct 31 '21 14:10