
Hyperparameters for the VizWiz dataset

Open runzeer opened this issue 4 years ago • 10 comments

Dear Prof.: I read the VizWiz leaderboard for ECCV 2018. The reported result is 55.40 without model ensembling, but when I trained on the VizWiz dataset I only reached 51.96, so I would like to know why the results differ. My answer vocabulary for VizWiz is the 3000 most common answers, the initial learning rate is 5e-5, the number of epochs is 4, and the batch size is 32. The pretrained model I used is Epoch20_LXRT.pth. If convenient, could you share your hyperparameters for the VizWiz dataset?

runzeer avatar May 12 '20 09:05 runzeer

Could you try the configuration I used for the leaderboard submission?

BatchSize 64,
LR 1e-4,
Epochs 20 (VizWiz is super small:
        one epoch takes around 10 minutes while VQA takes 1.5 hours,
        so we increased the number of epochs)

airsplay avatar May 12 '20 15:05 airsplay

OK! I will try it soon! Thanks a lot! But I still have two questions about training. Looking forward to your reply.

  1. How do you deal with the answer labels? Every question has 10 answers, but there is no per-answer score like in VQA. So how do you assign the answer labels?
  2. The loss function. I chose the soft loss used in https://github.com/DenisDsh/VizWiz-VQA-PyTorch/blob/master/train.py , but I do not know which loss you chose. Still cross-entropy?
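For reference, the "soft loss" setup mentioned above is usually built from soft answer targets. Here is a minimal sketch, assuming the common VQA accuracy rule score = min(count / 3, 1) over the 10 annotator answers; the function name and the tiny vocabulary are hypothetical, not taken from this repo:

```python
from collections import Counter

def soft_target(answers, ans2label, num_labels):
    """Build a soft target vector from the 10 annotator answers using
    the common VQA rule score = min(count / 3, 1). Answers missing
    from the vocabulary are skipped. (Illustrative sketch only.)"""
    target = [0.0] * num_labels
    for ans, count in Counter(answers).items():
        if ans in ans2label:
            target[ans2label[ans]] = min(count / 3.0, 1.0)
    return target

# Hypothetical example with a tiny 3-answer vocabulary.
ans2label = {"yes": 0, "no": 1, "unanswerable": 2}
answers = ["yes"] * 6 + ["no"] * 3 + ["unanswerable"]
target = soft_target(answers, ans2label, num_labels=3)
# target == [1.0, 1.0, 0.333...]
```

With soft targets like this, a binary cross-entropy loss over logits is the usual pairing; with single hard labels, plain cross-entropy is used instead.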

runzeer avatar May 13 '20 00:05 runzeer

Thanks. I have uploaded the materials here: http://nlp.cs.unc.edu/data/lxmert_data/vizwiz/vizwiz.zip. Please take a look.

For the loss function, I just used cross-entropy, as in VQA/GQA.

airsplay avatar May 13 '20 01:05 airsplay

Sorry to trouble you again. When I use the materials above, I get a KeyError at `target[self.raw_dataset.ans2label[ans]] = score`: `KeyError: '1 package stouffer signature classics fettuccini alfredo'`. I cannot find the cause, because the key appears to be in the dict. Could you help me with this?

runzeer avatar May 13 '20 02:05 runzeer

I think I just remove the answer if it is not in the dict.
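That fix can be sketched as follows. This is a hypothetical reconstruction based on the traceback above (the names `ans2label`, `target`, and `score` come from it; `build_target` is an invented helper), not the repo's actual code:

```python
def build_target(labels, ans2label, num_answers):
    """Fill the target vector, silently dropping any answer that is
    missing from the answer vocabulary -- this avoids the KeyError
    on rare free-form VizWiz answers. `labels` maps answer -> score."""
    target = [0.0] * num_answers
    for ans, score in labels.items():
        if ans not in ans2label:   # out-of-vocabulary answer: skip it
            continue
        target[ans2label[ans]] = score
    return target

ans2label = {"yes": 0, "no": 1}    # hypothetical tiny vocabulary
labels = {"yes": 1.0, "1 package stouffer signature classics fettuccini alfredo": 0.3}
target = build_target(labels, ans2label, 2)
# target == [1.0, 0.0]
```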

airsplay avatar May 13 '20 02:05 airsplay

OK! I found it! Thanks a lot!!

runzeer avatar May 13 '20 02:05 runzeer

I checked the test file and found that the test files have changed. I wanted to use your Docker image, but the pretrained model link below is out of date: https://www.dropbox.com/s/nu6jwhc88ujbw1v/resnet101_faster_rcnn_final_iter_320000.caffemodel?dl=1

So could you use your model to generate the new test data? Thanks a lot!

runzeer avatar May 13 '20 08:05 runzeer

The new Dropbox link for the model has been updated on the bottom-up-attention repo (listed there as an alternative pretrained model).

airsplay avatar May 13 '20 14:05 airsplay

OK! Thanks a lot!! I am wondering how you convert the answers to labels, especially how you add the label confidence.

runzeer avatar May 14 '20 02:05 runzeer

This part is almost the same as the standard VQA pre-processing. You can read this repo for details.
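One common VQA pre-processing convention maps the number of annotators who gave an answer to a confidence via the table {0: 0.0, 1: 0.3, 2: 0.6, 3: 0.9, ≥4: 1.0}. The sketch below assumes that convention; the function names are invented and this may not be the exact rule used here:

```python
from collections import Counter

def occurrence_to_confidence(n):
    """Map how many of the 10 annotators gave an answer to a label
    confidence, following a common VQA soft-score table (assumption)."""
    return {0: 0.0, 1: 0.3, 2: 0.6, 3: 0.9}.get(n, 1.0)

def answers_to_labels(answers, ans2label):
    """Turn the raw 10-answer list into {label_id: confidence},
    skipping answers outside the vocabulary."""
    labels = {}
    for ans, count in Counter(answers).items():
        if ans in ans2label:
            labels[ans2label[ans]] = occurrence_to_confidence(count)
    return labels

ans2label = {"yes": 0, "no": 1}      # hypothetical tiny vocabulary
print(answers_to_labels(["yes"] * 7 + ["no"] * 3, ans2label))
# {0: 1.0, 1: 0.9}
```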

airsplay avatar May 14 '20 03:05 airsplay