VLC Camera's Extrinsics Matrix Coordinate Frame Convention
Hello,
I want to use the camera extrinsics and the world-frame pose that this package provides for each of the VLC cameras to solve for the transform between the HoloLens rignode and the HoloLens headset pose (which I access in a Unity application). I am publishing the HoloLens headset pose, expressed in my Unity app's world frame, to the same machine that is running an hl2ss client. I want to solve the transform chain to obtain the Rignode->Headset pose ($T_{HR}$) from the known Headset->Unity World pose ($T_{WH}$), the Camera->World pose ($T_{WC}$) obtained from the data.pose values in this package, and the Camera->Rignode transform ($T_{RC}$), which I take to be the inverse of the extrinsics matrix ($T_{CR}^{-1}$) provided by this package for each of the four VLC sensors. I am having issues though, because I am not sure which coordinate frame convention the extrinsics matrices from this package use.
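For clarity, in conventional column-vector form the chain I am trying to solve would be $T_{HR} = T_{WH}^{-1}\, T_{WC}\, T_{RC}^{-1}$ (rignode -> camera -> world -> headset), if I have the directions right.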
Here is an example of the extrinsics matrix I obtained using this package for the RightFront VLC camera:
```python
import numpy as np

rf_vlc_extrinsics = np.array([[-0.998577,    0.05308131, -0.00513458, 0.],
                              [-0.05296633, -0.9983878,  -0.0204043,  0.],
                              [-0.00620939, -0.02010331,  0.9997786,  0.],
                              [-0.00279547, -0.09953252, -0.00105577, 1.]])
```
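As a quick sanity check (just generic numpy, nothing specific to hl2ss), the upper-left 3x3 block is orthonormal and the translation sits in the bottom row, which already points to a row-vector (p.T @ M) layout:

```python
R = rf_vlc_extrinsics[:3, :3]  # rotation block
t = rf_vlc_extrinsics[3, :3]   # translation stored in the bottom row

print(np.allclose(R @ R.T, np.eye(3), atol=1e-4))  # True -> orthonormal rotation
print(t)  # roughly a 10 cm offset along y, plausible for a sensor-to-rignode translation
```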
From my observation it certainly looks to be in row-major (row-vector) layout, but beyond that I am not sure what the convention is. The coordinate frame representations I know of are:
- Unity: LH, x=right, y=up, z=forward
- $T_{WC}$ (Camera->World) obtained from this hl2ss package: RH, x=right, y=up, -z=forward
It does not look like the extrinsics matrices obtained from this package are in the same coordinate frame representation as the Camera->World ($T_{WC}$) pose obtained from this package.
On page 7 of the Research Mode API documentation, the rignode appears to be RH with x=forward, y=left, and z=up, but the frames are then shown rotated for the RightFront VLC camera, where it looks to be RH with -x=forward, y=right, and z=up.
Any information you can provide to help is very much appreciated! This package is awesome, thank you for all the hard work!
Hi,
For the Research Mode sensors (including VLC), data.pose gives the pose of the rignode w.r.t. the world.
For a given 3D point p = [x; y; z; 1] in rignode coordinates, p.T @ data.pose converts it to world coordinates.
For a 3D point in VLC coordinates, p.T @ inv(calibration_vlc.extrinsics) converts it to rignode coordinates, and the VLC-to-world conversion is p.T @ inv(calibration_vlc.extrinsics) @ data.pose.
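As a minimal numpy sketch of those conversions (data_pose and vlc_extrinsics stand in for the 4x4 row-vector matrices coming from data.pose and calibration_vlc.extrinsics; identity placeholders here just to keep it runnable):

```python
import numpy as np

data_pose      = np.eye(4)  # rignode -> world, row-vector convention (from data.pose)
vlc_extrinsics = np.eye(4)  # rignode -> VLC, row-vector convention (from calibration_vlc.extrinsics)

p_rignode = np.array([0.0, 0.0, 1.0, 1.0])  # homogeneous point in rignode coordinates
p_vlc     = np.array([0.0, 0.0, 1.0, 1.0])  # homogeneous point in VLC camera coordinates

# Rignode -> world (for a 1-D array, p.T and p are the same thing).
p_world_from_rignode = p_rignode @ data_pose

# VLC -> rignode -> world.
p_world_from_vlc = p_vlc @ np.linalg.inv(vlc_extrinsics) @ data_pose

print(p_world_from_rignode, p_world_from_vlc)
```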
The HoloLens2SensorStreaming.cs script links the coordinate system of the Unity scene to the hl2ss plugin, so the Unity scene (world) origin and the hl2ss origin are the same. However, a right-handed to left-handed coordinate transform must be performed, as described in https://learn.microsoft.com/en-us/windows/mixed-reality/develop/unity/unity-xrdevice-advanced?tabs=mrtk.
Then, if your $T_{HW}$ is left-handed and in conventional form, I think the transform should be p.T @ data.pose @ diag(1,1,-1,1) @ T_{HW}.T given p in rignode coordinates, but I'm not sure.
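Roughly, and with the same caveat that this is untested, that chain would look like the following sketch (T_HW assumed to be a conventional 4x4 column-vector matrix taking Unity world coordinates to headset coordinates):

```python
import numpy as np

data_pose = np.eye(4)                       # rignode -> world (RH), row-vector convention, from hl2ss
T_HW      = np.eye(4)                       # Unity world -> headset (LH), conventional column-vector form
flip_z    = np.diag([1.0, 1.0, -1.0, 1.0])  # right-handed -> left-handed (negate z)

p_rignode = np.array([0.0, 0.0, 1.0, 1.0])  # homogeneous point in rignode coordinates

# rignode -> world (RH) -> Unity world (LH) -> headset
p_headset = p_rignode @ data_pose @ flip_z @ T_HW.T
print(p_headset)
```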
We used to have a demo to test these things in https://github.com/jdibenes/hl2ss/blob/objload/viewer/unity_sample_twin_mesh.py to scan a room, upload the mesh to Unity on the HoloLens, and evaluate how the scan aligns with the real room. It might be useful.
@jdibenes
Thank you for the details. This has helped me solve my problem. Out of curiosity, where are the extrinsics and intrinsics parameters obtained from? Is the hl2ss package accessing these from some sort of calibration file that is stored on the HoloLens from the factory?
For the Research Mode sensors:
The extrinsics are obtained directly from the Research Mode API: https://github.com/jdibenes/hl2ss/blob/8595431c24960ab5ff54e810c9997d4befc7f9b4/hl2ss/hl2ss/research_mode.cpp#L299-L319
For the cameras (VLC, Long Throw, AHAT), the images have some lens distortion, so the Research Mode API does not give the intrinsic parameters ($K$) directly. Instead, it provides a method to convert image points to normalized coordinates (the inverse of $K$, if the images had no distortion) and another method to convert camera points to image points. hl2ss uses these two methods to generate an image undistort map (preserving resolution and all valid pixels) and the corresponding intrinsic parameters: https://github.com/jdibenes/hl2ss/blob/8595431c24960ab5ff54e810c9997d4befc7f9b4/hl2ss/hl2ss/research_mode.cpp#L189-L296
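To illustrate the idea (this is only a toy sketch, not the hl2ss code: map_image_point_to_unit_plane below is a stand-in for the Research Mode API's image-point-to-unit-plane method, and distortion is ignored), the pinhole parameters can be recovered by sampling the mapping over the image and fitting $K$ by least squares:

```python
import numpy as np

# Ground-truth intrinsics, used only so the stand-in mapping below is runnable.
K_true = np.array([[366.0,   0.0, 320.0],
                   [  0.0, 366.0, 240.0],
                   [  0.0,   0.0,   1.0]])

def map_image_point_to_unit_plane(uv):
    # Stand-in for the Research Mode API method that maps an image point to
    # normalized (unit-plane) coordinates; here it is just the inverse pinhole model.
    xy1 = np.linalg.inv(K_true) @ np.array([uv[0], uv[1], 1.0])
    return xy1[:2]

# Sample a grid of image points and collect their normalized coordinates.
us, vs = np.meshgrid(np.linspace(0, 639, 16), np.linspace(0, 479, 16))
uv = np.stack([us.ravel(), vs.ravel()], axis=1)
xy = np.array([map_image_point_to_unit_plane(p) for p in uv])

# With no skew, u = fx * xn + cx and v = fy * yn + cy: two small least-squares fits.
fx, cx = np.linalg.lstsq(np.stack([xy[:, 0], np.ones(len(xy))], axis=1), uv[:, 0], rcond=None)[0]
fy, cy = np.linalg.lstsq(np.stack([xy[:, 1], np.ones(len(xy))], axis=1), uv[:, 1], rcond=None)[0]

K_estimated = np.array([[fx, 0.0, cx],
                        [0.0, fy, cy],
                        [0.0, 0.0, 1.0]])
print(K_estimated)  # recovers K_true in this distortion-free toy case
```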