
Cannot replicate Kinetics-400 Results

Open ilkarman opened this issue 4 years ago • 4 comments

Thank you very much for posting the codebase! However, I'm having difficulty replicating the 71.0% accuracy reported in the paper for 'bLVNet-TAM-8×2': I only get around 55% when validating each epoch, with a training accuracy of 58%. If I do 10 crops per video for validation, this rises to around 57%.

Would it be possible to share your training log file, or your final training accuracy, so I can check whether something is wrong with my validation/eval script?

I wanted to check whether any of the settings below are wrong:

  • TAM Backbone is initialised from 'ImageNet-bLResNet-50-a2-b4.pth.tar'
  • For bLVNet-model:
{'depth': 50, 'alpha': 2, 'beta': 4, 'groups': 16, 'num_classes': 400, 'dropout': 0.5, 'blending_frames': 3, 'input_channels': 3, 'pretrained': None, 'dataset': 'kinetics400', 'imagenet_blnet_pretrained': True}
  • For the training: LR of 0.01 with a total batch size of 64 (8 per GPU across 8 GPUs); cosine-annealing LR schedule, trained for 50 epochs

I thought perhaps this should be trained for 100 epochs, similar to TSM, rather than 50?
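For reference, the cosine-annealing schedule described above follows the usual closed form; a minimal sketch in plain Python (the `cosine_lr` helper is my own name, not from the repo):

```python
import math

def cosine_lr(base_lr, epoch, total_epochs, eta_min=0.0):
    # Cosine-annealed LR at the start of `epoch`, decaying from
    # base_lr down to eta_min over total_epochs.
    return eta_min + 0.5 * (base_lr - eta_min) * (
        1 + math.cos(math.pi * epoch / total_epochs))

# Base LR 0.01 over 50 epochs, as in the settings above:
# 0.01 at epoch 0, 0.005 at the midpoint, approaching 0 at the end.
for epoch in (0, 25, 49):
    print(epoch, cosine_lr(0.01, epoch, 50))
```

Doubling `total_epochs` to 100 keeps the LR higher for longer early on, which may matter for the final accuracy gap discussed below.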

ilkarman avatar Jun 07 '20 11:06 ilkarman

As you pointed out in #4, the pretrained ImageNet model might not be loaded correctly. Do you still get these results after fixing that issue?

chunfuchen avatar Jun 28 '20 02:06 chunfuchen

Thanks very much for your reply! Unfortunately this is still the case for me despite loading the pretrained weights. Could you share your training/validation accuracy for a few epochs (not the multi-crop, multi-clip validation) to help debug?

ilkarman avatar Jun 29 '20 11:06 ilkarman

It may be more helpful to leave this log below (r50_a2_b4_f8x2). Initializing from pre-trained weights and then training for 50 epochs with cosine annealing from an LR of 0.01 and a batch size of 64:

Epoch 0 Validation Acc: 0.09
Epoch 10 Validation Acc: 0.29
Epoch 20 Validation Acc: 0.35
Epoch 30 Validation Acc: 0.41
Epoch 40 Validation Acc: 0.47
Epoch 49 Validation Acc: 0.53

Using LR=0.001 I get:

Epoch 0 Validation Acc: 0.04
Epoch 10 Validation Acc: 0.36
Epoch 20 Validation Acc: 0.43
Epoch 30 Validation Acc: 0.46
Epoch 40 Validation Acc: 0.52
Epoch 49 Validation Acc: 0.54

Using LR=0.005 I get:

Epoch 0 Validation Acc: 0.11
Epoch 10 Validation Acc: 0.29
Epoch 20 Validation Acc: 0.35
Epoch 30 Validation Acc: 0.41
Epoch 40 Validation Acc: 0.47
Epoch 49 Validation Acc: 0.53

Training: GroupMultiScaleCrop(224), RandomHorizontalFlip(0.5). Validation: ResizeShortestSide(256), CentreCrop(224).
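For what it's worth, my validation preprocessing can be sketched with plain PIL as below (the helper names are mine; the repo's group transforms differ in that they operate on lists of frames):

```python
from PIL import Image

def resize_shortest_side(img, size=256):
    # Scale so the shorter edge becomes `size`, keeping aspect ratio.
    w, h = img.size
    scale = size / min(w, h)
    return img.resize((round(w * scale), round(h * scale)))

def centre_crop(img, size=224):
    # Take a centred size x size crop.
    w, h = img.size
    left, top = (w - size) // 2, (h - size) // 2
    return img.crop((left, top, left + size, top + size))

# e.g. a 320x240 frame becomes 341x256, then 224x224 after the crop
frame = Image.new('RGB', (320, 240))
out = centre_crop(resize_shortest_side(frame))
```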

So validation accuracy is always around 53% and training accuracy around 52%. I guess that to reach 71.0% with multi-clip, multi-crop evaluation, the single-crop validation accuracy would need to be 65%+, so I'm definitely a long way behind.

ilkarman avatar Jul 16 '20 17:07 ilkarman

Hi, sorry for the late reply.

  1. We trained on Kinetics400 for 100 epochs instead of 50 epochs; sorry for the confusion.

  2. You might want to check the number of videos in your Kinetics400 training and validation sets (https://github.com/facebookresearch/video-nonlocal-net/issues/67).

  3. Here is the log from retraining the model (using the code in this repo) with a total batch size of 72 on 6 GPUs:

Namespace(alpha=2, batch_size=72, beta=4, blending_frames=3, dataset='kinetics400', dense_sampling=False, depth=50, disable_scaleup=True, dropout=0.5, epochs=100, evaluate=False, frames_per_group=1, gpu=None, groups=16, imagenet_blnet_pretrained=True, input_channels=3, input_shape=224, logdir='./', lr=0.01, lr_scheduler='cosine', lr_steps=[15, 30, 45], modality='rgb', momentum=0.9, num_classes=400, num_clips=1, num_crops=1, pretrained=False, print_freq=500, random_sampling=False, resume=None, show_model=False, start_epoch=0, weight_decay=0.0005, workers=64)

100 epochs, single-clip, single-crop accuracy:

Val  : [001/100]	Loss: 3.7331	Top@1: 19.4875	Top@5: 45.4624	Speed: 967.55 ms/batch
Val  : [010/100]	Loss: 2.3200	Top@1: 45.5692	Top@5: 73.5218	Speed: 1037.19 ms/batch
Val  : [020/100]	Loss: 2.1634	Top@1: 49.6924	Top@5: 75.4639	Speed: 951.47 ms/batch
Val  : [030/100]	Loss: 2.0265	Top@1: 51.9752	Top@5: 78.5500	Speed: 950.88 ms/batch
Val  : [040/100]	Loss: 1.8245	Top@1: 55.8544	Top@5: 81.2802	Speed: 953.56 ms/batch
Val  : [050/100]	Loss: 1.8113	Top@1: 57.8118	Top@5: 81.9360	Speed: 1041.54 ms/batch
Val  : [060/100]	Loss: 1.7302	Top@1: 59.8759	Top@5: 82.9783	Speed: 1095.85 ms/batch
Val  : [070/100]	Loss: 1.5482	Top@1: 63.2874	Top@5: 85.2509	Speed: 1160.07 ms/batch
Val  : [080/100]	Loss: 1.4294	Top@1: 66.3582	Top@5: 87.0304	Speed: 993.56 ms/batch
Val  : [090/100]	Loss: 1.3322	Top@1: 69.1952	Top@5: 88.1946	Speed: 893.33 ms/batch
Val  : [100/100]	Loss: 1.3105	Top@1: 69.8968	Top@5: 88.4641	Speed: 968.87 ms/batch

The same model tested with 3 crops and 3 clips:

Val@224(224) (# crops = 3, # clips = 3): 	Top@1: 71.2543	Top@5: 89.3386
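In case it helps anyone comparing single-view and multi-view numbers: multi-crop/multi-clip testing typically averages the per-view class scores before taking the argmax. A hedged sketch in plain Python (toy data, not the repo's evaluation code):

```python
def aggregate(view_scores):
    # Average class scores over all views (crops x clips) of one video.
    n_views = len(view_scores)
    n_classes = len(view_scores[0])
    return [sum(v[c] for v in view_scores) / n_views
            for c in range(n_classes)]

# 3 crops x 3 clips = 9 views; toy scores over 5 classes
views = [[0.1, 0.2, 0.9, 0.0, 0.3]] * 9
video_scores = aggregate(views)
pred = max(range(5), key=lambda c: video_scores[c])  # class 2 wins here
```

Averaging over views smooths out bad crops/clips, which is why the 9-view number (71.25%) sits a point or two above the single-view one (69.90%).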

I also retrained a model for 50 epochs; it ends up about 2% lower than the one trained for 100 epochs.

chunfuchen avatar Jul 31 '20 18:07 chunfuchen