CenterTrack
change the training image resolution
Hi,
I want to train on my own dataset. I have converted my data to the MOT data format and trained it successfully. Now I want to change the input image resolution with the command python main.py tracking --exp_id mot17_half --dataset mot --dataset_version 17halftrain --pre_hm --ltrb_amodal --same_aug --hm_disturb 0.05 --lost_disturb 0.4 --fp_disturb 0.1 --gpus 0,1 --load_model ../models/crowdhuman.pth --input_h 144 --input_w 960, but the following error occurs:
loading annotations into memory...
Done (t=0.72s)
creating index...
index created!
Creating video index!
Loaded MOT 17halftrain train 13995 samples
Starting training...
tracking/mot17_half
Traceback (most recent call last):
File "main.py", line 101, in
Can you share some advice about this? Best wishes!
I have the same question. Could you also tell me the image resolution you used in your training command? Must it be 544 and 960?
Take a look at https://github.com/xingyizhou/CenterTrack/blob/d3d52145b71cb9797da2bfb78f0f1e88b286c871/src/lib/model/networks/dla.py#L305-L316
Here, in line 313, self.level{i} is called. For me, it broke when i was equal to 4.
On the next line, we can see that the previous outputs are appended to y. Let's take a look at their shapes:
ipdb> p list(map(lambda x: x.shape, y))
[torch.Size([4, 16, 1920, 1080]), torch.Size([4, 32, 960, 540]), torch.Size([4, 64, 480, 270]), torch.Size([4, 128, 240, 135])]
I can see here that I started with full-HD input, and the dimensions were reduced by a factor of 2 at every level. The last shape includes 135, which is not divisible by 2.
My error says RuntimeError: The size of tensor a (68) must match the size of tensor b (67) at non-singleton dimension 3. We can see that it arises because 135 is odd.
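To see where the off-by-one comes from, here is a minimal sketch of the arithmetic (my own illustration, not CenterTrack code): a stride-2 convolution with kernel 3 and padding 1 rounds an odd size up, while a plain stride-2 downsample floors it, so two branches can disagree once an odd dimension appears.

```python
def conv_out(n, k=3, s=2, p=1):
    # output length of a stride-2 conv (kernel 3, padding 1): rounds odd sizes up
    return (n + 2 * p - k) // s + 1

def floor_half(n):
    # a plain stride-2 downsample simply floors
    return n // 2

for n in [1080, 540, 270, 135]:
    print(n, "->", conv_out(n), "vs", floor_half(n))
```

Starting from a height of 1080, the two paths agree (540, 270, 135) until 135, where one side produces 68 and the other 67 -- exactly the pair in the error message above.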
If we look here:
https://github.com/xingyizhou/CenterTrack/blob/d3d52145b71cb9797da2bfb78f0f1e88b286c871/src/lib/model/networks/dla.py#L243-L255
We can see that there are 5 levels, and it seems that the image resolution should be divisible by 2 at least 5 times, which means it should be divisible by 2^5 = 32.
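A simple way to pick a legal resolution, assuming you just want the nearest size that satisfies the constraint, is to round each dimension up to a multiple of 32 (round_up_32 is my own hypothetical helper, not part of CenterTrack):

```python
def round_up_32(x):
    # round a dimension up to the nearest multiple of 32
    return ((x + 31) // 32) * 32

print(round_up_32(1080))  # -> 1088
print(round_up_32(1920))  # -> 1920
```

So a full-HD frame would be padded or resized to 1088x1920 before being fed to the network.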
Now the question is whether this is the desired behavior, and whether there is a workaround. I've tried to run main with another arch but stumbled upon #45. It might be simpler to just change the size of the images a little.
Sorry for the delayed reply. Yes, the input resolution should be divisible by 32 for DLA34.
I had one question regarding this issue: if we pass the input_h and input_w args during training, does it automatically resize the images?
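For what it's worth, whenever images are resized for training, the annotations have to be scaled by the same factors. A minimal sketch of that idea (resize_boxes is a hypothetical helper; CenterTrack's own loader warps images with an affine transform, but the scaling is analogous):

```python
def resize_boxes(boxes, src_hw, dst_hw):
    # Scale ltrb boxes when an image is resized from src_hw to dst_hw
    # (height, width). Aspect ratio is not preserved here.
    sy = dst_hw[0] / src_hw[0]
    sx = dst_hw[1] / src_hw[1]
    return [[l * sx, t * sy, r * sx, b * sy] for l, t, r, b in boxes]

# a box on a 1080x1920 frame, mapped onto a 544x960 training input
print(resize_boxes([[100, 50, 300, 250]], (1080, 1920), (544, 960)))
```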