[documentation] The JSON File camera extrinsics/intrinsics
My HMD, the Vive Pro 2, defines both cameras in The JSON File, but The JSON File documentation doesn't specify how to interpret this data. Additionally, the Vive Pro 2 doesn't seem to fill out some of the fields, such as head_from_camera or extrinsics. It does have dual_extrinsics, but that appears to be the Right->Left camera transform, so I don't know how I'm expected to acquire the camera->head transform, or which set of intrinsics to use, since there are two sets per camera. Any guidance would be very helpful. I intend to use this information for hand tracking, and I need to parse the raw JSON file directly; I can't use the OpenVR camera API.
Blurb from my device's JSON config
"tracked_cameras": [
{
"name": "right",
"intrinsics": {
"center_x": 305.839755,
"center_y": 224.239914,
"distort": {
"center_x": 0.000000,
"center_y": 0.000000,
"coeffs": [
0.002501,
0.011861,
-0.015764,
0.007814,
0.000000,
0.000000,
0.000000,
0.000000
],
"type": "DISTORT_FTHETA"
},
"focal_x": 273.262634,
"focal_y": 273.139187,
"width": 612,
"height": 460
},
"extrinsics": [
0,
0,
0,
1,
0,
0,
0
]
},
{
"name": "left",
"intrinsics": {
"center_x": 306.560820,
"center_y": 227.154711,
"distort": {
"center_x": 0.000000,
"center_y": 0.000000,
"coeffs": [
0.011316,
-0.015755,
0.016185,
-0.004577,
0.000000,
0.000000,
0.000000,
0.000000
],
"type": "DISTORT_FTHETA"
},
"focal_x": 273.987947,
"focal_y": 273.800299,
"width": 612,
"height": 460
},
"extrinsics": [
0,
0,
0,
1,
0,
0,
0
]
}
],
"tracked_camera": {
"version": "2.1.9.2",
"head_from_camera": [
0,
0,
0,
1,
0,
0,
0
],
"pitch": 9.4560720026493073e-03,
"yaw": 5.0874026492238045e-03,
"roll": -3.1643894035369158e-03,
"intrinsics": {
"width": 612,
"height": 460,
"center_x": 3.0656081961763812e+02,
"center_y": 2.2715471133258939e+02,
"focal_x": 2.7398794667272523e+02,
"focal_y": 2.7380029944529866e+02,
"distort": {
"center_x": 0.0000000000000000e+00,
"center_y": 0.0000000000000000e+00,
"coeffs": [
1.1315747343058145e-02,
-1.5754841210760588e-02,
1.6185229777046253e-02,
-4.5765631260569284e-03,
0.0000000000000000e+00,
0.0000000000000000e+00,
0.0000000000000000e+00,
0.0000000000000000e+00
],
"type": "DISTORT_FTHETA"
}
}
},
"second_tracked_camera": {
"version": "2.1.9.2",
"pitch": 9.4496700912714005e-03,
"yaw": 2.6638885028660297e-03,
"roll": -3.7033655680716038e-03,
"intrinsics": {
"width": 612,
"height": 460,
"center_x": 3.0583975506166263e+02,
"center_y": 2.2423991420760822e+02,
"focal_x": 2.7326263382506403e+02,
"focal_y": 2.7313918653975162e+02,
"distort": {
"center_x": 0.0000000000000000e+00,
"center_y": 0.0000000000000000e+00,
"coeffs": [
2.5006483855282164e-03,
1.1861152220993638e-02,
-1.5764009743348257e-02,
7.8143189548838306e-03,
0.0000000000000000e+00,
0.0000000000000000e+00,
0.0000000000000000e+00,
0.0000000000000000e+00
],
"type": "DISTORT_FTHETA"
}
}
},
"dual_extrinsics": {
"rotation": [
9.9996641344077786e-01,
2.4367993108056988e-03,
-7.8252156204252580e-03,
-2.4052079519902353e-03,
9.9998893046053627e-01,
4.0439993942251529e-03,
7.8349834138288724e-03,
-4.0250422993637390e-03,
9.9996120528218169e-01
],
"translation": [
-6.4603780520057867e+01,
3.0915121309049726e-01,
-3.2321865621179963e-01
],
"type": "DUAL_CAMERA"
},
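For reference, here's how I'm tentatively parsing the pose fields. The 7-element arrays (extrinsics, head_from_camera) look like a position plus a quaternion: if the layout is [x, y, z, qw, qx, qy, qz], then (0, 0, 0, 1, 0, 0, 0) is just an identity transform, consistent with those fields not actually being filled out on this device. That layout is my guess; I haven't found it documented anywhere. A minimal sketch:

```python
import json
import numpy as np

def pose7_to_matrix(v):
    """Guess: [x, y, z, qw, qx, qy, qz]. On my device every such array is
    (0,0,0, 1,0,0,0), i.e. identity, so the layout can't be verified from
    the data alone."""
    x, y, z, qw, qx, qy, qz = v
    m = np.eye(4)
    # standard unit-quaternion -> rotation-matrix conversion
    m[:3, :3] = [
        [1 - 2*(qy*qy + qz*qz), 2*(qx*qy - qz*qw),     2*(qx*qz + qy*qw)],
        [2*(qx*qy + qz*qw),     1 - 2*(qx*qx + qz*qz), 2*(qy*qz - qx*qw)],
        [2*(qx*qz - qy*qw),     2*(qy*qz + qx*qw),     1 - 2*(qx*qx + qy*qy)],
    ]
    m[:3, 3] = (x, y, z)
    return m

cfg = json.load(open("config.json"))  # the raw device config; path hypothetical
head_from_left = pose7_to_matrix(cfg["tracked_camera"]["head_from_camera"])
```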
The Vive Pro 2 seems to have a really weird setup. I'm guessing some of the camera parameters are handled by its HTC-made custom driver. For comparison, the only one of these headings the Valve Index provides is the first, "tracked_cameras".
There doesn't seem to be enough information there to derive the headset-to-camera transform, yet the OpenVR camera API outputs one anyway. It's possible that the driver gets it from somewhere else, or that it's hardcoded. There are some screenshots from my users of what the API outputs here: https://forums.unrealengine.com/t/htc-vive-passthrough-camera-ar-vr-development/1186330/30
The intrinsics are also inconsistent between the different headings; some of them just seem to be arbitrarily divided or multiplied by 100. The plausible ones are those under the tracked_cameras heading, with the focal length and center measured in pixels, and the distortion coefficients likely being the first four radial terms of a lens distortion model. I'd have guessed Brown-Conrady, though the "DISTORT_FTHETA" type suggests an f-theta fisheye model (e.g. Kannala-Brandt) instead.
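As an illustration, here's how those pixel-unit values would assemble into a standard 3x3 camera matrix, and how you'd undistort a point if DISTORT_FTHETA really does mean the Kannala-Brandt-style model that OpenCV's fisheye module implements; that mapping is a guess on my part, not something either vendor documents:

```python
import numpy as np
import cv2

# Right camera values from the "tracked_cameras" heading (pixel units).
K = np.array([[273.262634, 0.0,        305.839755],
              [0.0,        273.139187, 224.239914],
              [0.0,        0.0,        1.0]])
# First four coefficients; treating them as Kannala-Brandt k1..k4 is an
# assumption based on the "DISTORT_FTHETA" label.
D = np.array([0.002501, 0.011861, -0.015764, 0.007814])

# Undistort one pixel into normalized image coordinates under that model.
px = np.array([[[400.0, 300.0]]])               # shape (1, 1, 2)
norm = cv2.fisheye.undistortPoints(px, K, D)
print(norm)  # (x, y) at unit depth; append 1 to get a 3D ray direction
```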
I'm guessing Valve's lighthouse driver could be handling all the image processing and undistortion for the camera API output, using only the tracked_cameras data, while the HTC driver uses the rest of the data (and likely other sources) and just fills in the camera-to-head transform properties.
It's interesting that the JSON file has the camera distortion parameters, because they don't get passed to the application-side API at all. This has made it impossible for me to get my app working correctly with the Vive Pro 2.
Hopefully you can get some proper answers. I've been shouting at this brick wall for years now and haven't heard a squeak from either Valve or HTC on anything regarding cameras.
I did some reverse engineering of lighthouse_console last night, and it does seem to parse out dual_extrinsics, though I didn't look further into the binary to see how it actually constructs a pose. So this is definitely something handled on Valve's end too, not just HTC's. I'm hoping someone here can clarify how to extract a Camera->Head pose and which intrinsics to use, and that I don't have to fully reverse engineer the binary in question to get that clarity myself, which would be a colossal waste of my time.
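For what it's worth, here's the composition I'd expect if dual_extrinsics really is a Right->Left transform. The ~64.6 unit x translation looks like a stereo baseline in millimeters, which supports that reading, but the row-major rotation layout, the millimeter units, and head_from_camera referring to the left camera are all assumptions on my part:

```python
import json
import numpy as np

cfg = json.load(open("config.json"))  # same hypothetical path as my sketch above

# Assumed: rotation is a row-major 3x3, translation is in millimeters.
de = cfg["dual_extrinsics"]
left_from_right = np.eye(4)
left_from_right[:3, :3] = np.reshape(de["rotation"], (3, 3))
left_from_right[:3, 3] = np.asarray(de["translation"]) / 1000.0  # mm -> m, assumed

# head_from_camera is identity in my file; assumed to be the left camera's pose.
head_from_left = np.eye(4)
head_from_right = head_from_left @ left_from_right  # candidate Right camera -> Head
```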