GSM
Exact Configuration for reproducing results on Diving48
Hi,
I would like to reproduce the performance you reported in the paper (~40%). I am training the model with the following configuration, which after 15 epochs gave me 18.65% accuracy:
python3 main.py diving48 RGB --split 1 --arch InceptionV3 --num_segments 16 --consensus_type avg \
--batch-size 8 --iter_size 2 --dropout 0.5 --lr 0.01 --warmup 10 --epochs 20 \
--eval-freq 5 --gd 20 --run_iter 1 -j 16 --npb --gsm
Can you please provide the exact configuration used?
python3 main.py diving48 RGB --split 1 --arch InceptionV3 --num_segments 16 --consensus_type avg \
--batch-size 8 --iter_size 1 --dropout 0.7 --lr 0.01 --warmup 10 --epochs 20 \
--eval-freq 5 --gd 20 --run_iter 1 -j 16 --npb --gsm
I am getting the following values after training:
Class Accuracy 19.91%
Overall Prec@1 30.34% Prec@5 62.36%
Any suggestions?
Can you run a validation with the shared model to see if you obtain the same score reported in the paper? If not, I would suggest extracting the frames without lossy compression.
Yes, I did that. Here is the result:
Class Accuracy 27.00%
Overall Prec@1 38.98% Prec@5 67.41%
Can you extract the frames using the given script and try again?
Ok, I'll try. But with the pre-trained model, I can achieve performance close to the reported one. Do you think it is a problem with the frame extraction?
I would assume so. I also had the same problem of getting very low scores when the frames were compressed.
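If it helps, here is the idea in a minimal sketch (an illustration with placeholder paths, not the exact extraction script from the repo). PNG output avoids lossy JPEG re-compression entirely; if you must write JPEGs, a high quality setting such as -qscale:v 2 at least limits the loss.

import subprocess
from pathlib import Path

def extract_frames(video_path, out_dir):
    """Dump every frame as PNG so no lossy compression is applied."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-i", video_path, str(Path(out_dir) / "img_%05d.png")],
        check=True,
    )

extract_frames("videos/example.mp4", "frames/example")  # placeholder paths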
This is the value I am getting after extracting frames with your code:
Class Accuracy 21.83%
Overall Prec@1 32.97% Prec@5 63.36%
Btw, I am a little confused by the metrics. What is the difference between class accuracy and overall prec@1? The state-of-the-art methods report accuracy, don't they?
Also, how much does the result vary across runs? I ran it again and got around 35%.
I got between 36 and 39 in 4 or 5 runs.
Class accuracy is the mean per-class accuracy. Overall prec@1 is the plain accuracy over all samples, which is what the state-of-the-art methods report.
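To make the distinction concrete, here is a small sketch (not the actual evaluation code from the repo) showing how the two numbers diverge on an imbalanced test set, which is why class accuracy runs lower than overall prec@1 in the results above:

import numpy as np

def accuracy_metrics(preds, labels, num_classes):
    """Overall prec@1: fraction of all samples predicted correctly.
    Mean class accuracy: accuracy computed per class, then averaged."""
    preds, labels = np.asarray(preds), np.asarray(labels)
    overall = float((preds == labels).mean())
    per_class = [float((preds[labels == c] == c).mean())
                 for c in range(num_classes) if (labels == c).any()]
    return overall, float(np.mean(per_class))

# A model that always predicts the majority class scores 80% overall
# but only 50% mean class accuracy on this toy split.
labels = [0] * 8 + [1] * 2
preds = [0] * 10
print(accuracy_metrics(preds, labels, num_classes=2))  # (0.8, 0.5)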
Oh! I could get 33 at most after running the model 4-5 times.
But are you getting 40% accuracy with the shared model using 2 clips?
Yes, though not exactly 40%; it is closer to 39, to be precise.
@swathikirans can you please suggest any idea to reproduce the result of the paper?
I am not sure what exactly the problem is. I would assume it is an issue with the extracted frames. Can you also try a few runs after changing "args.epochs" to "args.epochs+1" in line 151?
I extracted the frames using your code. :(
Okay, I'll try. Thanks for helping :)
Hi, I tried running the code again with the change you suggested. Now the performance is even worse (~31%).
Also, which layer did you use to visualize the model's attention? Do you have code for that (feature extraction from an intermediate layer)?
Hi, I am not sure what is wrong. I had two more runs and got >37%.
For visualization, we followed the approach from this repo https://github.com/alexandrosstergiou/Saliency-Tubes-Visual-Explanations-for-Spatio-Temporal-Convolutions
You can remove the higher layers in this file to obtain the intermediate features. https://github.com/swathikirans/GSM/blob/master/model_zoo/bninception/bn_inception_gsm.yaml
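Alternatively, a forward hook gives you the intermediate features without editing the yaml. Below is a generic PyTorch sketch, using torchvision's GoogLeNet as a stand-in network; swap in the loaded GSM model and pick the layer name from model.named_modules(), since the module names in our model will differ.

import torch
import torchvision

# Stand-in network for illustration; replace with the loaded GSM model.
model = torchvision.models.googlenet(weights=None).eval()

features = {}
def save_output(name):
    def hook(module, inputs, output):
        features[name] = output.detach()
    return hook

# "inception5b" exists in torchvision's GoogLeNet; inspect
# model.named_modules() to choose the block you want to visualize.
handle = dict(model.named_modules())["inception5b"].register_forward_hook(
    save_output("inception5b"))

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))   # dummy input in place of a real clip
handle.remove()
print(features["inception5b"].shape)     # torch.Size([1, 1024, 7, 7])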
What is your ffmpeg version?
Also, is it possible to share the log file for your run?
I used ffmpeg 4.0 for extracting the frames
You can find the log file here: https://drive.google.com/file/d/1oyDajU3EFHYdR_PpjzqlM6J7BDCVS-OJ/view?usp=sharing
Is this log file with args.epochs + 1? I ask because I can see some difference in learning rate between your log and mine after epoch 1.
I am evaluating the model after every epoch, whereas you do it after every 5 epochs. Should that make any difference?
This is with the setting from the paper. Evaluating after every epoch should not make any difference.
No, I was asking whether this is the log file you obtained when you set args.epochs in line 151 to args.epochs+1 (as you suggested above).
That was my answer. This log file corresponds to the setting from the paper (from the published code).
I think there is a problem with the learning rate scheduler. According to your log file, the learning rate goes as follows:
Epoch 0: 0.00100
Epoch 1: 0.00190
Epoch 2: 0.00280
...
But if I run your code, I get the following:
Epoch 0: 0.00100
Epoch 1: 0.00100
Epoch 2: 0.00190
Epoch 3: 0.00280
...
I also tried to calculate the learning rate by hand from the CosineAnnealingLR.py file. According to the formula, the values should match those in my log file. Is there any chance you forgot to commit some changes?
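For reference, the warmup values in both logs are consistent with a linear ramp from 0.1*lr to lr over the warmup epochs. This is my reconstruction from the logged numbers, not the code from CosineAnnealingLR.py, and the only disagreement between our logs is a one-epoch shift:

def warmup_lr(epoch, base_lr=0.01, warmup=10, start_factor=0.1):
    """Linear warmup from start_factor*base_lr to base_lr over `warmup` epochs."""
    if epoch >= warmup:
        return base_lr      # the cosine schedule would take over from here
    return base_lr * (start_factor + (1 - start_factor) * epoch / warmup)

print([round(warmup_lr(e), 5) for e in range(4)])
# [0.001, 0.0019, 0.0028, 0.0037]  -> the trend in your log
print([round(warmup_lr(max(e - 1, 0)), 5) for e in range(4)])
# [0.001, 0.001, 0.0019, 0.0028]   -> the trend in my log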
Thanks.
I checked the scheduler, and the learning rate should follow the trend in the log file I shared. Which PyTorch version are you using?
I was using PyTorch 0.4. Yesterday, I upgraded PyTorch to v1.2. Now, over 2-3 runs, the maximum value I am getting is 34.208.
Here is the log file: https://drive.google.com/file/d/1KdY6M62N6s8I2n-IldJnTXiBjXsk7QkP/view?usp=sharing
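For what it's worth, a one-epoch shift like the one above is characteristic of when the learning rate is read relative to the scheduler's step bookkeeping, and the expected calling order of schedulers changed in PyTorch 1.1. A toy example with the stock LambdaLR (not your CosineAnnealingLR.py) reproduces the warmup trend from your log; my log shows the same schedule one step() behind, so I suspect the custom scheduler interacts with the PyTorch version:

import torch

model = torch.nn.Linear(1, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
warmup = lambda e: 0.1 + 0.9 * min(e, 10) / 10   # linear warmup over 10 epochs
sched = torch.optim.lr_scheduler.LambdaLR(opt, warmup)

for epoch in range(3):
    # Read the LR at the start of the epoch, step the scheduler at the end.
    print(epoch, round(opt.param_groups[0]["lr"], 5))
    opt.step()
    sched.step()
# 0 0.001
# 1 0.0019
# 2 0.0028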