GSM
Exact Configuration for reproducing results on Diving48
Hi,
I would like to reproduce the performance you reported in the paper (~40%). I am training the model with the following configuration, which after 15 epochs gave me 18.65% accuracy:
python3 main.py diving48 RGB --split 1 --arch InceptionV3 --num_segments 16 --consensus_type avg \
--batch-size 8 --iter_size 2 --dropout 0.5 --lr 0.01 --warmup 10 --epochs 20 \
--eval-freq 5 --gd 20 --run_iter 1 -j 16 --npb --gsm
Can you please provide the exact configuration used?
python3 main.py diving48 RGB --split 1 --arch InceptionV3 --num_segments 16 --consensus_type avg \
--batch-size 8 --iter_size 1 --dropout 0.7 --lr 0.01 --warmup 10 --epochs 20 \
--eval-freq 5 --gd 20 --run_iter 1 -j 16 --npb --gsm
I am getting the following values after training:
Class Accuracy 19.91%
Overall Prec@1 30.34% Prec@5 62.36%
Any suggestions?
Can you run a validation with the shared model to see if you obtain the same score reported in the paper? If not, I would suggest extracting the frames without lossy compression.
Yes, I did that. Here is the result:
Class Accuracy 27.00%
Overall Prec@1 38.98% Prec@5 67.41%
Can you extract the frames using the given script and try again?
Ok, I'll try. But with the pre-trained model, I can achieve performance close to the reported one. Do you think it is a problem with the frame extraction?
I would assume so. I also had the same problem of getting very low scores when the frames were compressed.
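If it helps, here is the idea in a minimal sketch (an illustration with placeholder paths, not the exact extraction script from the repo). PNG output avoids lossy JPEG re-compression entirely; if you must write JPEGs, a high quality setting such as -qscale:v 2 at least limits the loss.

import subprocess
from pathlib import Path

def extract_frames(video_path, out_dir):
    """Dump every frame as PNG so no lossy compression is applied."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-i", video_path, str(Path(out_dir) / "img_%05d.png")],
        check=True,
    )

extract_frames("videos/example.mp4", "frames/example")  # placeholder paths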
This is the value I am getting after extracting frames with your code:
Class Accuracy 21.83%
Overall Prec@1 32.97% Prec@5 63.36%
Btw, I am a little confused by the metrics. What is the difference between class accuracy and overall prec@1? The state-of-the-art methods report accuracy, don't they?
Also, how much does the result vary across runs? I ran it again and got around 35%.
I got between 36 and 39 in 4 or 5 runs.
Class accuracy is the mean per-class accuracy. Overall prec@1 is the plain accuracy over all samples, which is what the state-of-the-art methods report.
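To make the distinction concrete, here is a small sketch (not the actual evaluation code from the repo) showing how the two numbers diverge on an imbalanced test set, which is why class accuracy runs lower than overall prec@1 in the results above:

import numpy as np

def accuracy_metrics(preds, labels, num_classes):
    """Overall prec@1: fraction of all samples predicted correctly.
    Mean class accuracy: accuracy computed per class, then averaged."""
    preds, labels = np.asarray(preds), np.asarray(labels)
    overall = float((preds == labels).mean())
    per_class = [float((preds[labels == c] == c).mean())
                 for c in range(num_classes) if (labels == c).any()]
    return overall, float(np.mean(per_class))

# A model that always predicts the majority class scores 80% overall
# but only 50% mean class accuracy on this toy split.
labels = [0] * 8 + [1] * 2
preds = [0] * 10
print(accuracy_metrics(preds, labels, num_classes=2))  # (0.8, 0.5)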
Oh! I could get 33 at most after running the model 4-5 times.
But are you getting 40% accuracy with the shared model using 2 clips?
Yes, though not exactly 40%; it is closer to 39, to be precise.
@swathikirans can you please suggest any idea to reproduce the result of the paper?
I am not sure what exactly the problem is. I would assume it is an issue with the extracted frames. Can you also try a few runs after changing "args.epochs" to "args.epochs+1" in line 151?
I extracted the frames using your code. :(
Okay, I'll try. Thanks for helping :)
Hi, I tried running the code again with the change you suggested. Now the performance is even worse (~31%).
Also, which layer did you use to visualize the model's attention? Do you have code for that (feature extraction from an intermediate layer)?
Hi, I am not sure what is wrong. I had two more runs and got >37%.
For visualization, we followed the approach from this repo https://github.com/alexandrosstergiou/Saliency-Tubes-Visual-Explanations-for-Spatio-Temporal-Convolutions
You can remove the higher layers in this file to obtain the intermediate features. https://github.com/swathikirans/GSM/blob/master/model_zoo/bninception/bn_inception_gsm.yaml
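Alternatively, a forward hook gives you the intermediate features without editing the yaml. Below is a generic PyTorch sketch, using torchvision's GoogLeNet as a stand-in network; swap in the loaded GSM model and pick the layer name from model.named_modules(), since the module names in our model will differ.

import torch
import torchvision

# Stand-in network for illustration; replace with the loaded GSM model.
model = torchvision.models.googlenet(weights=None).eval()

features = {}
def save_output(name):
    def hook(module, inputs, output):
        features[name] = output.detach()
    return hook

# "inception5b" exists in torchvision's GoogLeNet; inspect
# model.named_modules() to choose the block you want to visualize.
handle = dict(model.named_modules())["inception5b"].register_forward_hook(
    save_output("inception5b"))

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))   # dummy input in place of a real clip
handle.remove()
print(features["inception5b"].shape)     # torch.Size([1, 1024, 7, 7])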
What is your ffmpeg version?
Also, is it possible to share the log file for your run?
I used ffmpeg 4.0 for extracting the frames
You can find the log file here: https://drive.google.com/file/d/1oyDajU3EFHYdR_PpjzqlM6J7BDCVS-OJ/view?usp=sharing
Is this log file with args.epochs + 1? I ask because I can see some difference in learning rate between your log and mine after epoch 1.
I am evaluating the model after every epoch, whereas you do it after every 5 epochs. Should that make any difference?
This is with the setting from the paper. Evaluating after every epoch should not make any difference.
No, I was asking whether this is the log file you obtained when you set args.epochs in line 151 to args.epochs+1 (as you suggested above).
That was my answer. This log file corresponds to the setting from the paper (from the published code).
I think there is a problem with the learning rate scheduler. According to your log file, the learning rate goes as follows:
Epoch 0: 0.00100
Epoch 1: 0.00190
Epoch 2: 0.00280
...
But if I run your code, I get the following:
Epoch 0: 0.00100
Epoch 1: 0.00100
Epoch 2: 0.00190
Epoch 3: 0.00280
...
I also tried to calculate the learning rate by hand from the CosineAnnealingLR.py file. According to the formula, the values should match those in my log file. Is there any chance you forgot to commit some changes?
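For reference, the warmup values in both logs are consistent with a linear ramp from 0.1*lr to lr over the warmup epochs. This is my reconstruction from the logged numbers, not the code from CosineAnnealingLR.py, and the only disagreement between our logs is a one-epoch shift:

def warmup_lr(epoch, base_lr=0.01, warmup=10, start_factor=0.1):
    """Linear warmup from start_factor*base_lr to base_lr over `warmup` epochs."""
    if epoch >= warmup:
        return base_lr      # the cosine schedule would take over from here
    return base_lr * (start_factor + (1 - start_factor) * epoch / warmup)

print([round(warmup_lr(e), 5) for e in range(4)])
# [0.001, 0.0019, 0.0028, 0.0037]  -> the trend in your log
print([round(warmup_lr(max(e - 1, 0)), 5) for e in range(4)])
# [0.001, 0.001, 0.0019, 0.0028]   -> the trend in my log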
Thanks.
I checked the scheduler, and the learning rate should follow the trend in the log file I shared. Which PyTorch version are you using?
I was using PyTorch 0.4. Yesterday, I upgraded PyTorch to v1.2. Now, over 2-3 runs, the maximum value I am getting is 34.208.
Here is the log file: https://drive.google.com/file/d/1KdY6M62N6s8I2n-IldJnTXiBjXsk7QkP/view?usp=sharing
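For what it's worth, a one-epoch shift like the one above is characteristic of when the learning rate is read relative to the scheduler's step bookkeeping, and the expected calling order of schedulers changed in PyTorch 1.1. A toy example with the stock LambdaLR (not your CosineAnnealingLR.py) reproduces the warmup trend from your log; my log shows the same schedule one step() behind, so I suspect the custom scheduler interacts with the PyTorch version:

import torch

model = torch.nn.Linear(1, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
warmup = lambda e: 0.1 + 0.9 * min(e, 10) / 10   # linear warmup over 10 epochs
sched = torch.optim.lr_scheduler.LambdaLR(opt, warmup)

for epoch in range(3):
    # Read the LR at the start of the epoch, step the scheduler at the end.
    print(epoch, round(opt.param_groups[0]["lr"], 5))
    opt.step()
    sched.step()
# 0 0.001
# 1 0.0019
# 2 0.0028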