faster_rcnn_pytorch Wrong format for bounding boxes

It seems that the network uses x1,y1,x2,y2 format for bounding boxes instead of x,y,w,h used in the paper. I think this is a pretty major difference that can affect training accuracy.

In x,y,w,h format two coordinates are used for centering and two for size, which presents clear separation and can be debugged easily. In the current format, all four coordinates are used for both centering and size, which makes it more difficult to debug.
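For reference, converting between the two formats is cheap. Below is a minimal sketch, assuming x,y denotes the box center and coordinates are continuous (some codebases add 1 to the width and height to treat pixel coordinates as inclusive). The function names are hypothetical and are not taken from this repo.

```python
def corners_to_center(box):
    """Convert (x1, y1, x2, y2) corners to (cx, cy, w, h)."""
    x1, y1, x2, y2 = box
    w = x2 - x1  # assumption: no +1 pixel offset
    h = y2 - y1
    return (x1 + w / 2.0, y1 + h / 2.0, w, h)


def center_to_corners(box):
    """Convert (cx, cy, w, h) back to (x1, y1, x2, y2) corners."""
    cx, cy, w, h = box
    return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)


box = (10.0, 20.0, 50.0, 100.0)
center = corners_to_center(box)      # center and size are now separate values
restored = center_to_corners(center)  # round trip back to corners
```

With helpers like these, the center/size view could be used for debugging or for the regression targets even if the rest of the code keeps the corner format.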

Jul 28 '17 09:07 Rizhiy

Did you try it?

Jul 29 '17 16:07 Cadene

I haven't, since I don't quite understand the whole codebase and it appears that quite a bit would have to be changed. In particular, the Cython code seems to expect the current format, and I don't have access to the Cython source to change it.

It appears that this format was chosen in the fast-rcnn pytorch implementation or maybe even before, so it would probably be difficult to change now. I don't think training accuracy will be affected that much, but it may matter if you are trying to win a competition.

Jul 30 '17 15:07 Rizhiy

Yep, unfortunately this code is difficult to understand and modify. While looking for localization models in pytorch, I found this repo: https://github.com/amdegroot/ssd.pytorch The model works nicely and the codebase is way easier to understand.

It seems that the ssd.pytorch models use the x1,y1,x2,y2 format as well: https://github.com/amdegroot/ssd.pytorch/blob/master/data/voc0712.py#L81

Jul 31 '17 00:07 Cadene

I found the Cython source, so I might try to change it later.

Aug 02 '17 10:08 Rizhiy