PoseCNN
Projection Scaling issue
Hi @yuxng,
I tried to test PoseCNN on my own images and I get strange results:
The poses seem to be well estimated, but there appears to be a scaling issue. If I understood your paper correctly, this probably comes from the bounding boxes (shown with the labels) being mispredicted, but I don't really get why. To record my images, I also used an Asus Xtion Pro Live RGB-D camera with the following intrinsic matrix:
"
[[528, 0, 320]
[0, 528, 240],
[0, 0, 1]]
"
I then tried to test it with another intrinsic matrix (as if I were working with a 1280x960 image):
The projections look better, but the estimated translation vector looks off (judging just from the bleach and mustard bottles). The intrinsic matrix used for this is [[978.9, 0, 320], [0, 978.9, 240], [0, 0, 1]].
Do you have an idea of what could cause this? Is my first intrinsic matrix right and I am just missing a scaling factor? Or is my second matrix right and the estimated bleach pose is just bad?
Also, these estimates were made without ICP refinement.
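A quick back-of-the-envelope check on the scaling-factor idea: under the pinhole model an object's pixel footprint is roughly f * size / depth, so for the same footprint the implied depth scales linearly with the focal length. A sketch with my two focal lengths (nothing PoseCNN-specific):

```python
# Pixel footprint ~ f * object_size / depth, so for an unchanged
# footprint the implied depth scales with the focal length f.
f_first = 528.0    # focal length of the first intrinsic matrix
f_second = 978.9   # focal length of the second intrinsic matrix
print(f_second / f_first)  # ~1.85x difference in implied depth
```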
The network is trained with the specific intrinsic matrix of the YCB-Video dataset, so the object distance the network predicts is tied to that intrinsic matrix. You can try visualizing the results using that intrinsic matrix.
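For anyone wanting a stopgap before retraining, here is a rough first-order correction I sketched (my own approximation, not part of PoseCNN; fill in `K_train` with the actual YCB-Video camera parameters from the dataset): keep the predicted object-center pixel fixed, rescale the depth by the focal-length ratio, and back-project with your own camera's intrinsics.

```python
import numpy as np

def rescale_translation(t, K_train, K_cam):
    """Hypothetical helper: first-order correction of a translation t
    predicted under the training intrinsics K_train for an image that
    was actually captured with K_cam."""
    # Pixel at which the predicted translation projects under K_train.
    uvw = K_train @ t
    u, v = uvw[:2] / uvw[2]
    # For a fixed pixel footprint, depth scales with the focal length.
    z = t[2] * K_cam[0, 0] / K_train[0, 0]
    # Back-project that pixel at the corrected depth using K_cam.
    x = (u - K_cam[0, 2]) * z / K_cam[0, 0]
    y = (v - K_cam[1, 2]) * z / K_cam[1, 1]
    return np.array([x, y, z])
```

This is only a pinhole-model approximation; ICP refinement or retraining with your own intrinsics would still be the proper fix.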
Using the same intrinsic matrix as in demo.py, I get a result comparable to my second case. I guess that makes sense, since the intrinsic matrix of my camera is different. Thanks for your reply!
I am going to retrain the network using my own intrinsic matrix, but I have another question regarding that: since the intrinsic matrix of my camera and the one of the camera used to capture the dataset are different, how will that affect the results?
@TotoLulu94 Just to make sure I understand: are you asking whether it is possible to retrain with one's own intrinsic matrix when the training dataset was recorded with a different camera? If so, I would also be interested in the answer.
Yep, that's right:
- Since the intrinsic matrix of my camera and the intrinsic matrix of the camera used to capture the dataset are different, how will that affect the results?
@TotoLulu94 I am seeing a similar scaling issue to the one you describe, using a Kinect with a different intrinsic matrix. The predicted translation vector seems to be off by 10-20 cm. Did retraining on the YCB-Video dataset with your own intrinsic matrix solve your problem?