DTPP
Reproducing the inference
I'm currently trying to reproduce the inference in arbitrary videos of the HMDB51 dataset by using the pretrained weights.
So far, I have:
- Compiled OpenCV with the contrib module and cuda support
- Successfully compiled caffe (caffe-tpp-net) with CUDA and python support
- Set up the PYTHONPATH environment variable so python2.7 may find the compiled modules
- Downloaded the pretrained weights with `get_init_models.sh` and `get_kinetics_pretraining_models.sh`
Since README.md doesn't mention how to perform inference, I read README_old.md and found that there is a script `tools/eval_net.py` for that purpose. So I ran:

```
python tools/eval_net.py hmdb51 1 rgb /var/datasets/hmdb51/ models/hmdb51/flow_tpp_delete_dropout_deploy.prototxt ./init_models/hmdb51_split_1_tsn_flow_reference_bn_inception.caffemodel
```
But, after a ton of messages, that gave me the following error:

```
Traceback (most recent call last):
  File "tools/eval_net.py", line 125, in <module>
    video_scores = map(eval_video, eval_video_list)
  File "tools/eval_net.py", line 69, in eval_video
    video_frame_path = f_info[0][vid]
KeyError: '20060723sfjffbumblebeesuitman_run_f_cm_np2_ri_med_1'
```
The path to the dataset is correct, and I have unpacked all rar files from the HMDB dataset. README_old.md mentions a script `scripts/extract_optical_flow.sh` for preprocessing the video files, but this script doesn't exist in the master branch.
So, are there any further steps necessary for reproducing inference?
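For reference, here is a minimal sketch of the kind of lookup I believe is failing (the function name and layout are my own illustration, not the repo's code): the evaluator appears to map each video name to a directory of extracted frames, and the `KeyError` suggests no such directory exists for that video.

```python
import os

def build_frame_index(frame_root):
    """Map each extracted-frame directory name to its full path,
    mimicking what eval_net.py presumably builds as f_info[0]."""
    return {d: os.path.join(frame_root, d)
            for d in sorted(os.listdir(frame_root))
            if os.path.isdir(os.path.join(frame_root, d))}

# A video whose frames were never extracted triggers the same KeyError:
# build_frame_index('/var/datasets/hmdb51/')['some_missing_video']
```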
Sorry for the trouble you've had running the code due to my carelessness. I've updated the README.md, and you can refer to `eval_net_tpp_hmdb.py` for more details. Our trained models on UCF101 and HMDB51 have not been released.
Hi @zhujiagang, thank you so much for your swift replies!
I'm currently working on reproducing the training with the `hmdb_scripts_split_1/train_rgb_tpp_delete_dropout_split_1.sh` script on a Titan X. I had a few issues and had to modify some files to avoid them, namely:

- Reduced the batch size to 1 in `models/hmdb51/rgb_tpp_delete_dropout_split_1_train_val.prototxt` (for some reason I was getting an out-of-memory error on my 12GB Titan X and this seemed to fix it; can you tell me which GPU you used for your training? If your GPU also had 12GB of memory or less, then there's something wrong in my setup)
- Added `new_width: 224` and `new_height: 224` to all video layers in `models/hmdb51/rgb_tpp_delete_dropout_split_1_train_val.prototxt` (for some reason, I was getting the error `data_transformer.cpp:491] Check failed: width <= datum_width` when I tried to train... did you manually resize the image frames when you extracted them from the videos? For each video in HMDB, I performed frame extraction with the following command: `mkdir "<PATH>"; ffmpeg -i "<PATH>.avi" "<PATH>/img_%05d.jpg"`)
- Changed the device id from 3 to 0 in `models/hmdb51/rgb_tpp_delete_dropout_split_1_solver.prototxt`
- Created the folder `snapshot` in the repo root, since otherwise caffe crashed when trying to save snapshots
I've been running the training for several hours now and this is the latest iteration output:
```
I1031 15:06:09.097961 4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 877.991 > 40) by scale factor 0.0455585
I1031 15:06:21.079704 4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 700.793 > 40) by scale factor 0.0570782
I1031 15:06:33.580734 4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 1143.5 > 40) by scale factor 0.0349803
I1031 15:06:46.480859 4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 1017.31 > 40) by scale factor 0.0393192
I1031 15:06:58.553812 4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 1014.26 > 40) by scale factor 0.0394377
I1031 15:07:10.659497 4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 1247.51 > 40) by scale factor 0.0320639
I1031 15:07:23.238704 4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 968.414 > 40) by scale factor 0.0413046
I1031 15:07:35.426038 4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 1048.87 > 40) by scale factor 0.0381362
I1031 15:07:48.065246 4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 1141.9 > 40) by scale factor 0.0350292
I1031 15:08:00.421540 4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 734.556 > 40) by scale factor 0.0544547
I1031 15:08:12.666364 4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 748.326 > 40) by scale factor 0.0534526
I1031 15:08:25.080652 4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 1113.1 > 40) by scale factor 0.0359356
I1031 15:08:37.743530 4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 878.843 > 40) by scale factor 0.0455144
I1031 15:08:49.748545 4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 1046.37 > 40) by scale factor 0.0382275
I1031 15:09:01.976547 4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 1059.37 > 40) by scale factor 0.0377582
I1031 15:09:14.561251 4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 1073.01 > 40) by scale factor 0.0372784
I1031 15:09:27.069676 4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 1176.19 > 40) by scale factor 0.0340082
I1031 15:09:39.748011 4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 1173.07 > 40) by scale factor 0.0340987
I1031 15:09:51.674002 4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 1058.21 > 40) by scale factor 0.0377996
I1031 15:10:04.277062 4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 1022.46 > 40) by scale factor 0.0391213
I1031 15:10:16.263995 4431 solver.cpp:240] Iteration 4740, loss = 0.184195
I1031 15:10:16.264025 4431 solver.cpp:255] Train net output #0: accuracy = 1
I1031 15:10:16.264046 4431 solver.cpp:255] Train net output #1: loss = 0.0173458 (* 1 = 0.0173458 loss)
I1031 15:10:16.264051 4431 solver.cpp:640] Iteration 4740, lr = 0.001
```
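As a sanity check on the log, each scale factor appears to be just the clipping threshold divided by the gradient's L2 norm (this assumes standard Caffe-style clipping; the function below is my own illustration):

```python
def clip_scale(l2_norm, clip_gradients=40.0):
    """Caffe-style gradient clipping: if the global L2 norm of the
    gradients exceeds clip_gradients, return the factor that scales
    the norm down to the threshold; otherwise leave gradients alone."""
    return clip_gradients / l2_norm if l2_norm > clip_gradients else 1.0

# First log line: L2 norm 877.991 gives scale factor ~0.0455585
scale = clip_scale(877.991)
```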
The gradient scaling and the accuracy of 1 worry me a bit; is that expected?
If I understand the paper correctly, I should also generate a temporal model by training with the script `hmdb_scripts_split_1/train_flow_tpp_delete_dropout_split_1.sh`, is that correct? Do I need to preprocess the input images somehow? Is that what `lib/dense-flow` is for? Is there a script for automating this process?
Regarding the out-of-memory error with that batch_size: how did you compile caffe-tpp-net? Did you install openmpi before compiling it? Please refer to the commands in the TSN repo for more details on compiling caffe with openmpi; I've found that compiling their caffe with openmpi saves a lot of memory.
Which caffe do you use? I do not remember needing to set `new_width` and `new_height` when using caffe-tpp-net; `crop_size: 224` is enough.
I used the scripts in the TSN repo to extract frames and optical flow. Please refer to the TSN repo.
I think your training process is fine with `batch_size: 1`. Once you can run with `batch_size: 4`, the training curve may change. I also want to remind you that I decreased the learning rate by hand when validation accuracy no longer increased, as the paper says:

> Instead of decreasing the learning rate according to a fixed schedule, the learning rate is lowered by a factor of 10 after validation error saturates.

A better way would be to find a fixed schedule, which is more appropriate for ablation studies. You should also follow the paper for the base learning rates of the RGB and flow streams.
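The hand-tuning rule quoted above can be sketched as a simple plateau scheduler; the parameter names, patience, and threshold below are my own illustration, not values from the repo or paper:

```python
def step_lr_on_plateau(lr, val_acc_history, factor=0.1, patience=3, min_delta=1e-4):
    """Lower the learning rate by `factor` once validation accuracy
    has not improved over the last `patience` evaluations."""
    if len(val_acc_history) <= patience:
        return lr                      # not enough history yet
    best_earlier = max(val_acc_history[:-patience])
    recent_best = max(val_acc_history[-patience:])
    if recent_best <= best_earlier + min_delta:
        return lr * factor             # saturated: divide lr by 10
    return lr                          # still improving: keep lr
```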
Hello @zhujiagang, thanks again for your clear answer!
I probably needed to set `new_width` and `new_height` because I extracted the dataset frames myself using ffmpeg. When I proceeded to extract the optical flow using the TSN repo, I noticed that their script also extracts the frames, so I suppose it resizes the images to a minimum dimension.
Now I have another question. I have finished training both the RGB and the Flow models, and I'd like to combine them to perform predictions as in the paper. Please correct me if I'm wrong, but the script `eval_tpp_net_hmdb.py` only seems to evaluate the RGB or the Flow model separately, unlike the full model described in your paper. Is there another script that combines both trained models to perform inference?
Thanks again!
@csantosbh You can refer to `eval_scores_rgb_flow.py`.
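At its core, late fusion of the two streams is a weighted average of their class scores; a minimal sketch (the 1:1.5 RGB:flow weighting is a common two-stream heuristic, not necessarily what eval_scores_rgb_flow.py uses):

```python
import numpy as np

def fuse_scores(rgb_scores, flow_scores, weights=(1.0, 1.5)):
    """Weighted late fusion of per-video class scores from the two
    streams; returns the predicted class index for each video."""
    fused = (weights[0] * np.asarray(rgb_scores, dtype=float)
             + weights[1] * np.asarray(flow_scores, dtype=float))
    return fused.argmax(axis=-1)
```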
I see! `eval_scores_rgb_flow.py` complained about some missing `.npz` files, so I guess I should have changed the `save_scores` variable in the `eval_tpp_net_hmdb.py` script, right? So the whole process should be like this:
1. Open `eval_tpp_net_hmdb.py` and edit both the `net_weights` and `save_scores` variables accordingly to run the RGB evaluation
2. Run `eval_tpp_net_hmdb.py`
3. Open `eval_tpp_net_hmdb.py` and edit both the `net_weights` and `save_scores` variables accordingly to run the Flow evaluation
4. Run `eval_tpp_net_hmdb.py`
5. Open `eval_scores_rgb_flow.py` and edit the variable `score_files` to the names of the `.npz` files generated by steps 2 and 4
6. Run `eval_scores_rgb_flow.py`
Is that correct? I'm running inference again, since my `.npz` file was overwritten when I ran the flow evaluation (I had only changed the `net_weights` variable).
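To avoid the overwrite next time, I'll save each stream's scores under a stream-specific filename, something like this (the field names are my guess at what the scripts expect, not their actual format):

```python
import numpy as np

def save_stream_scores(path, scores, labels):
    """Save one stream's per-video class scores and labels.
    Using distinct paths per stream (e.g. *_rgb.npz / *_flow.npz)
    keeps the RGB results from being clobbered by the flow run."""
    np.savez(path, scores=np.asarray(scores), labels=np.asarray(labels))

def load_stream_scores(path):
    data = np.load(path)
    return data['scores'], data['labels']
```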
Yes!
Thank you @zhujiagang! I was able to reproduce the results of the paper, with ~73% accuracy for the fusion model.
Now I'm trying to use the Kinetics pretrained model to improve the accuracy as in the paper. What I did was edit the script `hmdb_scripts_split_1/train_rgb_tpp_delete_dropout_split_1.sh` and change the `--weights` flag to the file `kinetics_pretraining_models/bn_inception_kinetics_rgb_pretrained/bn_inception_kinetics_rgb_pretrained.caffemodel` (I would do something similar for the Flow trainer). However, caffe crashes when I try to run training like this. Is there anything else that needs to be changed?
@csantosbh You should use `kinetics_hmdb_split_1/train_kinetics_rgb_tpp_p124_split_1.sh`, because the authors of TSN provide the Kinetics pretraining models and prototxts at http://yjxiong.me/others/kinetics_action/ with different layer names from the ImageNet pretraining ones.