TubeTK
Paper's experimental details
I would like to draw your attention to the following points:
- The MOT17 dataset in the training code has a length of 632040.
- I am using this code to train on MOT17 on a cluster of 8 V100 GPUs with NVIDIA Apex, and I cannot fit a batch size larger than 2 on a single GPU. Thus, effectively 16 data points are processed by the node per iteration.
- As implied by the first point, it takes 632040/16 iterations to finish one epoch, i.e. ~39,500 steps per epoch.
- With this node configuration I measure about 4.44 seconds per step, so one epoch would take roughly 2 days to finish (see the back-of-envelope sketch after this list).
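
For reference, here is the back-of-envelope calculation behind the numbers above as a small Python sketch; the dataset length, GPU count, per-GPU batch size, and step time are my own measurements on this setup, not figures from the paper:

```python
# Rough epoch-time estimate for my TubeTK MOT17 setup (my numbers, not the authors').
dataset_len = 632_040      # length reported by the MOT17 training dataset
gpus = 8                   # one node of V100s
batch_per_gpu = 2          # largest batch that fits per GPU with Apex mixed precision
sec_per_step = 4.44        # measured wall-clock time per training step

effective_batch = gpus * batch_per_gpu               # 16 samples per iteration
steps_per_epoch = dataset_len / effective_batch      # ~39,500 steps
epoch_hours = steps_per_epoch * sec_per_step / 3600  # ~48.7 h, i.e. roughly 2 days

print(f"steps/epoch: {steps_per_epoch:,.0f}, epoch time: {epoch_hours:.1f} h")
```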
My questions are:
a) The authors mention using a batch size of 32. How many GPUs or nodes were used to accommodate this batch size?
b) How long did it take to train the model on MOT17 and JTA?
c) Are my estimates close to what the authors experienced?