Meshroom
[question] Camera intrinsics matrix from cameras.sfm
Describe the problem
I am trying to obtain the projection coordinates on an image of a 3D point of the mesh generated with Meshroom. However, when constructing the camera intrinsic matrix as described in https://en.wikipedia.org/wiki/Camera_resectioning I get wrong results. This is what I get from cameras.sfm after structure from motion:
{ "intrinsicId": "1128448763",
"width": "4000",
"height": "3000",
"sensorWidth": "6.1699999999999999",
"sensorHeight": "4.6275000000000004",
"serialNumber": "6a3bff7cff7dffff206affff13ff3028",
"type": "radial3",
"initializationMode": "estimated",
"initialFocalLength": "3.6099999999999994",
"focalLength": "3.5988558298758133",
"pixelRatio": "1",
"pixelRatioLocked": "true",
"principalPoint": [
"13.498251531673015",
"-26.248889380993322"
],
"distortionInitializationMode": "none",
"distortionParams": [
"-0.0063465089616403896",
"0.0030250668407204571",
"-0.0017936039159354772"
],
"undistortionOffset": [
"0",
"0"
],
"undistortionParams": "",
"locked": "true"
}
],
"poses": [
{
"poseId": "15822413",
"pose": {
"transform": {
"rotation": [
"-0.35775470843537033",
"0.72431172821648682",
"-0.58939298346720159",
"-0.00060470419753410275",
"0.6309865263412392",
"0.77579355366530955",
"0.93381540088239956",
"0.27790020500867246",
"-0.2253004064154816"
],
"center": [
"29.711263680710861",
"-35.714602995244938",
"11.560545223684104"
]
},
"locked": "1"
}
},
...
From here I did the following:
import numpy as np

f = 3.5988558298758133
mx = 6.1699999999999999
my = 4.6275000000000004
px = 13.498251531673015
py = -26.248889380993322
p_w = np.array([12.168, 20.072,15.644,1]) # point coordinates in world
t = np.array([29.711263680710861, -35.714602995244938, 11.560545223684104]) # center of camera in world
K = np.array([
[f/mx , 0, px ],
[0, f/my, py],
[0, 0, 1]
])
R = np.array([[-0.35775470843537033, -0.00060470419753410275, 0.93381540088239956],
[0.72431172821648682, 0.6309865263412392, 0.27790020500867246],
[-0.58939298346720159, 0.77579355366530955, -0.2253004064154816]], dtype = "double"
)
I = np.identity(3)
q=np.zeros((3,4))
q[0:3, 0:3]= I
q[0:3,3] = -t
M = K@R@q
point_image = M @ p_w / (M @ p_w)[2]
obtaining point_image = array([ 13.60954989, -25.90018649, 1. ])
which is unfortunately incorrect. Therefore my questions are the following:
- Is K correct?
- Is R correct?
- What are the units of the principal point? Is it in mm or in pixel coordinates?
- Ultimately, what am I doing wrong here?
Additional info: both the coordinates of the point and the camera center are in meters (obtained from Blender), but I guess rescaling everything to mm would not make a difference. Is this wrong?
Desktop information:
- OS: Linux
px and py are offsets with respect to the center of the image. I know it's not the standard convention that everybody uses, but here you have to add px and py to width/2 and height/2, respectively, to get the real principal point.
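A minimal sketch of that offset-to-pixel conversion, assuming the offsets are simply added to the image center (numbers taken from the JSON above):

width, height = 4000, 3000
px, py = 13.498251531673015, -26.248889380993322   # principal point offsets from the JSON

cx = width / 2 + px    # absolute principal point x, in pixels
cy = height / 2 + py   # absolute principal point y, in pixels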
Also, I never remember in which format, row-major or column-major, the matrices are saved. If after fixing the px and py problem you still have a large projection error, try to read R and transpose it before using it.
Also, for the focal length, again it is not in a standard format as it is expressed in mm. To get it back in pixels you can use the formula:
pxFocalLength = (focalLength / sensorWidth) * std::max(image().Width(), image().Height());
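A rough Python equivalent of that C++ line, under the assumption that the same max-dimension convention applies (the per-axis variant comes up later in the thread):

focalLength = 3.5988558298758133   # mm, from the JSON
sensorWidth = 6.1699999999999999   # mm, from the JSON
width, height = 4000, 3000

pxFocalLength = (focalLength / sensorWidth) * max(width, height)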
As for the matrix, it should be stored in column-major order, as is the default in Eigen (https://eigen.tuxfamily.org/dox/group__TopicStorageOrders.html), so your R should be OK.
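For reference, here is one way to read the nine rotation values into a 3x3 matrix; order="F" treats the list as column-major, as Eigen does by default, and R.T is the fallback suggested above if the projection is still off. This is a sketch of the advice, not an authoritative statement about the file format:

import numpy as np

# the nine "rotation" values from the JSON, in the order they are listed
rotation = [
    -0.35775470843537033, 0.72431172821648682, -0.58939298346720159,
    -0.00060470419753410275, 0.6309865263412392, 0.77579355366530955,
    0.93381540088239956, 0.27790020500867246, -0.2253004064154816,
]

# column-major interpretation (Eigen default); try R.T if the result looks wrong
R = np.array(rotation).reshape(3, 3, order="F")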
Hey @simogasp, thanks a lot!
The projection is still way off, unfortunately.
pxFocalLength = (f / mx) * 4000
pyFocalLength = (f / my) * 3000
K = np.array([
[pxFocalLength , 0, 2000 + px ],
[0, pyFocalLength, 1500+py],
[0, 0, 1]
])
R = np.array([[-0.35775470843537033, -0.00060470419753410275, 0.93381540088239956],
[0.72431172821648682, 0.6309865263412392, 0.27790020500867246],
[-0.58939298346720159, 0.77579355366530955, -0.2253004064154816]], dtype = "double"
)
RT=R.T
I = np.identity(3)
q=np.zeros((3,4))
q[0:3, 0:3]= I
q[0:3,3] = -t
M = K@R@q
print(M @ p_w / (M @ p_w)[2])
gives array([2.45869170e+03, 2.51985979e+03, 1.00000000e+00]).
This should be the result (see the green box in the attached screenshot), but what I get is still off.
Also, why is it
pxFocalLength = (focalLength / sensorWidth) * std::max(image().Width(), image().Height());
and not times the width for pxFocalLength and times the height for pyFocalLength?
It's coming from here when reading the EXIF: https://github.com/alicevision/AliceVision/blob/57cc8a02f653ce1f754cda2dcf8a3cf517405bf0/src/aliceVision/sfmDataIO/viewIO.cpp#L193
and from here when reading from JSON: https://github.com/alicevision/AliceVision/blob/57cc8a02f653ce1f754cda2dcf8a3cf517405bf0/src/aliceVision/sfmDataIO/jsonIO.cpp#L287 where it is just
fx = (fmm / sensorWidth) * double(width);
with fmm the focal length in mm from the JSON and fx the focal length along x in pixels.
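A small sketch of the per-axis form; the fy line using the sensor height is my assumption here, but it matches the working snippet at the end of this thread:

fmm = 3.5988558298758133                            # focal length in mm, from the JSON
sensorWidth, sensorHeight = 6.1699999999999999, 4.6275000000000004
width, height = 4000, 3000

fx = (fmm / sensorWidth) * width                    # focal length along x, in pixels
fy = (fmm / sensorHeight) * height                  # focal length along y, in pixels (assumption)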
It's confusing because the focal length in pixels is always used for all computations, but it's exported in mm for compatibility with the ABC format and software like Maya, Blender, and so on.
@fabiencastan @servantftechnicolor can you check if it is the right conversion?
I see that you transpose R in the snippet of code. I imagine that without transposing it does not work either, does it?
Yeah, indeed it doesn't work without transposing R either. Is perhaps the way I compute the coordinates wrong?
Just to be sure, because I don't speak numpy, I was assuming that
M = K@R@q
is the matrix product of the matrices, right? So that we correctly have K * [R | -R*t]. Am I right?
(You'd better not call it t, because it could confuse people: that is the center c of the camera in world coordinates, and the actual t of the rototranslation matrix is t = -R*c.)
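A small sketch of that naming, reusing R and the camera center from the JSON excerpt at the top of the thread:

import numpy as np

R = np.array([[-0.35775470843537033, -0.00060470419753410275, 0.93381540088239956],
              [0.72431172821648682,   0.6309865263412392,     0.27790020500867246],
              [-0.58939298346720159,  0.77579355366530955,   -0.2253004064154816]])
c = np.array([29.711263680710861, -35.714602995244938, 11.560545223684104])  # camera center in world

t = -R @ c                        # translation of the rototranslation, t = -R*c
Rt = np.hstack([R, t[:, None]])   # 3x4 extrinsic matrix [R | t]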
Thanks a lot! I got it just now. All the things you suggested were correct:
- translating the principal point by adding width/2 and height/2
- FX = (f/sensor_width)*width
- FY = (f/sensor_height)*height
The reason it was not working for me is that I was reading the coordinates of the point in 3D space from Blender, which flips the axes. I was therefore expecting something that would never happen 😂
But now it works perfectly, even without distortion parameters! Thanks again!!
That was my next question. I was smelling the usual problem with the different conventions used for expressing the camera frame between computer graphics and computer vision... It's always the usual suspect! ;-)
Just for future reference would you mind posting your working snippet of code, like the one above? Thanks!
Yes, of course! From the JSON I posted at the beginning of the question I read the data, then:
import numpy as np

# info from json
f = 3.5988558298758133
mx = 6.1699999999999999
my = 4.6275000000000004
px = 13.498251531673015
py = -26.248889380993322
width = 4000
height = 3000
# points
p_w = np.array([12.187, -20.025, -16.133, 1]) # point coordinates in world
t = np.array([29.711263680710861,-35.714602995244938, 11.560545223684104]) # center of camera in world
pxFocalLength = (f / mx) * width
pyFocalLength = (f / my) * height
K = np.array([
[pxFocalLength , 0, px+width/2 ],
[0, pyFocalLength, py+height/2],
[0, 0, 1]
])
R = np.array([[-0.35775470843537033, -0.00060470419753410275, 0.93381540088239956],
[0.72431172821648682, 0.6309865263412392, 0.27790020500867246],
[-0.58939298346720159, 0.77579355366530955, -0.2253004064154816]], dtype = "double"
)
q=np.zeros((3,4))
q[0:3, 0:3]= R
q[0:3,3] = -np.dot(R,t)
M = np.dot(K,q)
pixel_coordinates = np.dot(M, p_w) / np.dot(M , p_w)[2]
and that works perfectly :)
Thanks again!