
How to transform a point from depth camera space to world space?

Open Tolerm opened this issue 1 year ago • 8 comments

Thanks for this project! It is great and I got it working on HoloLens 2 (through Unity 2021.3.8f1c1) successfully!

Now I want to implement a function that receives a (u,v) point in the image coordinate system and transforms it to an (X,Y,Z) point in the Unity/HoloLens coordinate system using the long-throw depth information; to put it simply, it's a coordinate transformation. I'm new to C++, so I tried to imitate petergu684's code. The code I added to HL2ResearchMode.cpp is:

// "OpenStream" and "GetNextBuffer" calls are blocking, each sensor frame loop should be run on its own thread,
// This allows sensors to be processed on their own frame rate
com_array<float> HL2ResearchMode::StartuvToWorld(array_view<float const> const& UV)
{
     float xyz[3] = { 0,0,0 };
     float uv[2] = { UV[0], UV[1] };
     
     if (m_refFrame == nullptr)
     {
         m_refFrame = m_locator.GetDefault().CreateStationaryFrameOfReferenceAtCurrentLocation().CoordinateSystem();
     }

     std::thread m_puvToWorldThread(HL2ResearchMode::uvToWorld, this, std::ref(uv), std::ref(xyz));
     m_puvToWorldThread.join();

     com_array<float> XYZ = { xyz[0], xyz[1], xyz[2] };
     return XYZ;
}

//input: pHL2ResearchMode and an image point uv[2]; output: a world point XYZ[3]
void HL2ResearchMode::uvToWorld(HL2ResearchMode* pHL2ResearchMode, float(&uv)[2], float(&XYZ)[3])
{
    //open
    pHL2ResearchMode->m_longDepthSensor->OpenStream();

    //access to the interface "IResearchModeSensorFrame"
    IResearchModeSensorFrame* pDepthSensorFrame = nullptr;
    pHL2ResearchMode->m_longDepthSensor->GetNextBuffer(&pDepthSensorFrame);

    //get resolution
    ResearchModeSensorResolution resolution;
    pDepthSensorFrame->GetResolution(&resolution);
    pHL2ResearchMode->m_longDepthResolution = resolution;

    //access to the depth camera interface "IResearchModeSensorDepthFrame"
    IResearchModeSensorDepthFrame* pDepthFrame = nullptr;
    winrt::check_hresult(pDepthSensorFrame->QueryInterface(IID_PPV_ARGS(&pDepthFrame)));

    //map (u,v) to the depth resolution; I assumed the source image is 1920x1080
    float uv_d[2] = { 0, 0 };
    if (!(resolution.Width == 1920 && resolution.Height == 1080))
    {
        // cast to float first, otherwise the integer division truncates to zero
        uv_d[0] = uv[0] * ((float)resolution.Width / 1920.0f);
        uv_d[1] = uv[1] * ((float)resolution.Height / 1080.0f);
    }
    else
    {
        std::copy(std::begin(uv), std::end(uv), std::begin(uv_d));
    }

    //get buffer
    size_t outBufferCount = 0;
    const UINT16* pDepth = nullptr;
    const BYTE* pSigma = nullptr;
    pDepthFrame->GetSigmaBuffer(&pSigma, &outBufferCount);
    pDepthFrame->GetBuffer(&pDepth, &outBufferCount);
    pHL2ResearchMode->m_longDepthBufferSize = outBufferCount;

    //timestamp
    UINT64 lastTs = 0;
    ResearchModeSensorTimestamp timestamp;
    pDepthSensorFrame->GetTimeStamp(&timestamp);
    lastTs = timestamp.HostTicks;

    //coordinate transformation: depth camera space -> world space at this frame's timestamp
    Windows::Perception::Spatial::SpatialLocation transToWorld = nullptr;
    auto ts = PerceptionTimestampHelper::FromSystemRelativeTargetTime(HundredsOfNanoseconds(checkAndConvertUnsigned(timestamp.HostTicks)));
    transToWorld = pHL2ResearchMode->m_locator.TryLocateAtTimestamp(ts, pHL2ResearchMode->m_refFrame);
    XMMATRIX depthToWorld = XMMatrixIdentity();
    depthToWorld = pHL2ResearchMode->m_longDepthCameraPoseInvMatrix * SpatialLocationToDxMatrix(transToWorld);

    auto idx = (UINT16)uv_d[0] + (UINT16)uv_d[1] * resolution.Width;          // linear index of the pixel in the buffer
    UINT16 depth = pDepth[idx];
    depth = (pSigma[idx] & 0x80) ? 0 : depth - pHL2ResearchMode->m_depthOffset;   // the 0x80 sigma flag marks an invalid pixel

    // back-project: map the pixel onto the camera unit plane, then scale by the depth (mm -> m)
    float xy[2] = { 0, 0 };
    pHL2ResearchMode->m_pLongDepthCameraSensor->MapImagePointToCameraUnitPlane(uv_d, xy);
    auto pointOnUnitPlane = XMFLOAT3(xy[0], xy[1], 1);
    auto tempPoint = (float)depth / 1000 * XMVector3Normalize(XMLoadFloat3(&pointOnUnitPlane));

    //get the target point coordinate
    auto pointInWorld = XMVector3Transform(tempPoint, depthToWorld);
    XYZ[0] = XMVectorGetX(pointInWorld);
    XYZ[1] = XMVectorGetY(pointInWorld);
    XYZ[2] = -XMVectorGetZ(pointInWorld);   // negate Z for Unity's left-handed coordinate system

    //Release and Close
    if (pDepthFrame)
    {
        pDepthFrame->Release();
    }
    if (pDepthSensorFrame)
    {
        pDepthSensorFrame->Release();
    }
    pHL2ResearchMode->m_longDepthSensor->CloseStream();
    pHL2ResearchMode->m_longDepthSensor->Release();
    pHL2ResearchMode->m_longDepthSensor = nullptr;

}

The StartuvToWorld function above worked for the first call, but on the second call I got an error indicator on pHL2ResearchMode->m_longDepthSensor->OpenStream() and the HoloLens 2 app crashed. The Visual Studio output was:

Exception thrown at 0x00007FFE74686888 (HL2UnityPlugin.dll) in My_Project.exe: 0xC0000005: Access violation reading location 0x0000000000000000

Can anyone help solve this? Thanks a lot!

Tolerm avatar Mar 14 '23 07:03 Tolerm

I don't have a comprehensive answer for you, but the short answer is: the sensor takes a while to open and close (say 1-2 seconds, though I never measured it), so if you call your uvToWorld every frame, there is obviously a problem there. You may want to let the sensor keep running in the background and find some way to query the uv through another function, for example via a shared variable that tells the thread which uv you want to back-project, saving the result somewhere the main thread can access; a sketch of that pattern follows. Your uv is basically the i and j in LongDepthSensorLoop. Hope that helps.
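A minimal sketch of that pattern (the member names m_uvMutex, m_uvQueryPending, m_queryUV, m_resultXYZ and RequestUvToWorld are made up for illustration, not part of this repo; requires <mutex> and <atomic>):

// members added to HL2ResearchMode (hypothetical names):
std::mutex m_uvMutex;
std::atomic_bool m_uvQueryPending = false;
float m_queryUV[2] = { 0, 0 };
float m_resultXYZ[3] = { 0, 0, 0 };

// called from the main thread to post a query; the sensor loop picks it up
void HL2ResearchMode::RequestUvToWorld(float u, float v)
{
    std::lock_guard<std::mutex> lock(m_uvMutex);
    m_queryUV[0] = u;
    m_queryUV[1] = v;
    m_uvQueryPending = true;
}

// inside the existing LongDepthSensorLoop, after each frame is processed:
if (pHL2ResearchMode->m_uvQueryPending)
{
    std::lock_guard<std::mutex> lock(pHL2ResearchMode->m_uvMutex);
    // back-project m_queryUV using the current frame's depth buffer and
    // depthToWorld matrix, then store the result in m_resultXYZ
    pHL2ResearchMode->m_uvQueryPending = false;
}

The stream stays open for the whole session, so there is no per-call OpenStream()/CloseStream() and no 1-2 second stall.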

petergu684 avatar Mar 24 '23 21:03 petergu684

@petergu684 Thanks for your answer! I set a boolean so that OpenStream() runs only once and the buffer is fetched on every call, and it works! The result of the coordinate transformation still has obvious errors, which I will try to solve next (maybe because the input point comes from the MixedRealityCapture video stream via the Windows Device Portal, which is 1280x720, while the long-throw mode's resolution is 320x288, and my resolution transformation between them is wrong). Thanks again for this repo and your reply!

Tolerm avatar Mar 27 '23 03:03 Tolerm

@Tolerm how did you get this working? When I try it, it tells me I overloaded 'std::invoke'. Was there something special you did with your header file?

EliasMosco avatar Nov 02 '23 22:11 EliasMosco

@EliasMosco Are you talking about OpenStream()? I just used a boolean. In HL2ResearchMode.h:

struct HL2ResearchMode : HL2ResearchModeT<HL2ResearchMode>
{
    ......
private:
    ...
    std::atomic_bool m_isDepthSensorStreamOpen = true;   // this boolean
    static void MyFunc(HL2ResearchMode* pHL2ResearchMode);
    ...
};

and in HL2ResearchMode.cpp it looks like this:

void HL2ResearchMode::MyFunc(HL2ResearchMode* pHL2ResearchMode)
{
    // open the stream only on the first call
    if (pHL2ResearchMode->m_isDepthSensorStreamOpen)
    {
        pHL2ResearchMode->m_longDepthSensor->OpenStream();
        pHL2ResearchMode->m_isDepthSensorStreamOpen = false;
    }
    ...
}

It's very simple, and you can change it according to your needs. If you are looking for a way to control the streams of multiple sensors, the official Microsoft doc ResearchMode-ApiDoc.pdf and the HoloLens2ForCV repo may help; both can be found at https://github.com/microsoft/HoloLens2ForCV

Tolerm avatar Nov 03 '23 01:11 Tolerm

Hello @Tolerm, did you manage to get it to work after all? If so, do you happen to have a repository? I'm trying to achieve the same thing!

LydiaYounsi avatar Feb 09 '24 16:02 LydiaYounsi

> Hello @Tolerm, did you manage to get it to work after all? If so, do you happen to have a repository? I'm trying to achieve the same thing!

Well, I'm working on the transformation between the PV camera and the depth camera, but I'm not sure the method I'm using now is correct. In principle they are the same cameras and the same modes according to the ResearchMode API doc, so the transformation on HoloLens 2 should be the same as on HoloLens 1, and work done for HoloLens 1 should still be valuable. I mainly referred to the following:

  1. petergu684's repo (this one);
  2. HoloLens2ForCV's StreamRecorder sample, https://github.com/microsoft/HoloLens2ForCV/tree/main/Samples/StreamRecorder#stream-recorder-sample
  3. LisaVelten's code, https://github.com/microsoft/HoloLensForCV/issues/119#issuecomment-553098740

If your goal is the same as mine, namely to take a (u,v) point in the PV camera and transform it to a 3D point in Unity's coordinate system using the depth at that image point, then my current method looks like this (a rough code sketch follows the list):

  1. get a (u,v) point in the PV camera coordinate system;
  2. multiply (u,v) by the inverse intrinsics of the PV camera (the intrinsics come from camera calibration) to get an (x,y,z) coordinate;
  3. set a vector (x,y,-1,1) and multiply it by the PVToWorld matrix to get a point (X,Y,Z,W) in the world coordinate system; the PVToWorld matrix can be found, and understood, in the StreamRecorder project;
  4. transform (X,Y,Z,W) to (u',v') in the depth camera following petergu684's method, i.e. a rigid transformation and a perspective projection from the camera model;
  5. get the depth at (u',v') following petergu684's method, and then get the world point in Unity's coordinate system (or any other coordinate system).
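A rough sketch of steps 1-3 with DirectXMath (the fx/fy/cx/cy values, the example pixel, and the pvToWorld matrix are placeholders; fill them in from your calibration and from the frame's pose data):

using namespace DirectX;

// placeholders: take fx/fy/cx/cy from your PV camera calibration
float fx = 992.4f, fy = 993.5f;
float cx = 628.3f, cy = 378.2f;
float u = 640.0f, v = 360.0f;               // step 1: the PV pixel

// step 2: multiply by the inverse intrinsics (pixel -> normalized image plane)
float x = (u - cx) / fx;
float y = (v - cy) / fy;

// step 3: form (x, y, -1, 1) and multiply by PVToWorld to get a point
// on the pixel's ray in world coordinates
XMMATRIX pvToWorld = XMMatrixIdentity();    // placeholder: use the real pose matrix
XMVECTOR rayPoint = XMVector4Transform(XMVectorSet(x, y, -1.0f, 1.0f), pvToWorld);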

If instead your goal is to transform a (u,v) point in the depth camera image coordinate system to an (X,Y,Z,W) point in a given world coordinate system, you may do it like this (again, a sketch follows the list):

  1. use the method MapImagePointToCameraUnitPlane(uv, xy);
  2. choose a world coordinate system (a Windows::Perception::Spatial::SpatialCoordinateSystem; please refer to petergu684's repo) and get the extrinsic matrix;
  3. multiply (x,y,1) by the extrinsic matrix; this is the reverse of the process in petergu684's code.
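A compact sketch of that depth-camera path, mirroring the uvToWorld code earlier in this thread (depthInMm, depthToWorld and pCameraSensor are placeholders here; they come from the depth frame and sensor as shown above):

// pCameraSensor is the long-throw IResearchModeCameraSensor
// (m_pLongDepthCameraSensor in this repo)
float uv[2] = { 160.0f, 144.0f };            // a pixel in the 320x288 long-throw frame
float xy[2] = { 0, 0 };
pCameraSensor->MapImagePointToCameraUnitPlane(uv, xy);   // step 1: pixel -> unit plane

// scale the unit-plane direction by the measured depth (mm -> m)
UINT16 depthInMm = 0;                        // placeholder: read from the depth buffer at the mapped pixel
XMMATRIX depthToWorld = XMMatrixIdentity();  // placeholder: build it as in the code above
XMFLOAT3 onUnitPlane(xy[0], xy[1], 1.0f);
XMVECTOR pointInCamera = (float)depthInMm / 1000.0f * XMVector3Normalize(XMLoadFloat3(&onUnitPlane));

// steps 2-3: apply the camera-to-world (extrinsic) matrix obtained from the
// chosen SpatialCoordinateSystem
XMVECTOR pointInWorld = XMVector3Transform(pointInCamera, depthToWorld);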

Tolerm avatar Feb 10 '24 06:02 Tolerm

@Tolerm Thanks a lot for your answer! Could you provide more details about how you computed the intrinsics of the PV camera?

LydiaYounsi avatar Feb 12 '24 09:02 LydiaYounsi

> @Tolerm Thanks a lot for your answer! Could you provide more details about how you computed the intrinsics of the PV camera?

I first tried the UndistortedProjectionTransform here, but the matrix I got looked like this: [ 992.376 0 0 0; 0 -993.519 0 0; 0 0 1 0; 628.272 378.181 0 1 ]. I don't understand the meaning of the minus sign before 993.519. My PV camera always runs under one setting (1280x720, 30fps), so I think its intrinsics won't change; in the end I used MATLAB's Camera Calibrator app with a calibration checkerboard to get the intrinsic parameters. BTW, running the StreamRecorder app produces some files from the PV camera and the depth camera, including the extrinsics of the PV camera and the extrinsics of the long-throw depth camera (that extrinsic matrix may be relative to the rig coordinate system).
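One plausible reading of that matrix (an assumption, not something the API doc confirms) is a row-major projection in DirectX's row-vector convention: focal lengths on the diagonal, the principal point in the bottom row, and the negative fy reflecting a flipped image y-axis. Read that way:

// assumed interpretation of the UndistortedProjectionTransform values (unconfirmed):
float fx = 992.376f;
float fy = 993.519f;                 // take the absolute value; the sign encodes a y-axis flip
float cx = 628.272f;
float cy = 378.181f;

// back-projecting a pixel with the flip folded in:
float u = 640.0f, v = 360.0f;        // example pixel
float x = (u - cx) / fx;
float y = -(v - cy) / fy;            // the minus sign accounts for the flipped axis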

Tolerm avatar Feb 12 '24 12:02 Tolerm