FloorNet
Question about metadata
Where can we find the definitions of 'topDownTransformation', 'topDownViewAngle', 'videoOrientation' included in "metadata.t7", so that we could attempt to run FloorNet on data that we acquire on our own?
- Do we correctly understand from issue #4 that 'topDownViewAngle' is a rotation angle about the Z axis?
- Do we correctly infer that 'videoOrientation' takes the value 1 or 2 depending on whether the video is in landscape or portrait orientation, respectively?
- What about topDownTransformation? What information does it contain?
Thank you for your interest in our work.
- Yes, topDownViewAngle is the rotation angle about the Z axis, which rotates the point cloud to align it with the Manhattan world.
- I guess you don't need to worry about videoOrientation (always set it to 1). I don't know why, but the videos in the Tango data are often rotated, so we manually rotated each video to the right orientation. You don't need to worry about landscape or portrait as long as the video is not upside down.
- TopDownTransformation is the product of a scaling matrix [[scale, 0, 0], [0, scale, 0], [0, 0, 1]], which scales the point cloud so that X and Y lie between 0 and 1 (aspect ratio preserved), and the rotation matrix given by topDownViewAngle.
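The composition described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the FloorNet code: the helper names (`rotation_z`, `scaling_xy`, `top_down_linear`) are hypothetical, the angle is assumed to be in radians, and the counterclockwise sign convention is an assumption.

```python
import math


def rotation_z(angle_rad):
    """3x3 rotation about the Z axis (counterclockwise convention assumed)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def scaling_xy(scale):
    """3x3 matrix that scales X and Y uniformly and leaves Z untouched."""
    return [[scale, 0.0, 0.0], [0.0, scale, 0.0], [0.0, 0.0, 1.0]]


def matmul3(a, b):
    """Plain 3x3 matrix product, kept dependency-free for illustration."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


def top_down_linear(angle_rad, scale):
    """Scaling matrix times rotation matrix (hypothetical helper)."""
    return matmul3(scaling_xy(scale), rotation_z(angle_rad))


# Illustrative values only; they are not taken from any metadata file.
T = top_down_linear(math.pi / 2, 2.0)
```

Because the scale is uniform in X and Y, the order of the two factors does not change the result here; the sketch covers only the linear part and leaves out any translation.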
Thank you very much. One more question with respect to TopDownTransformation.
For the provided example the corresponding matrix is 3x4:
47.41009728, 11.82066487, 0., 182.75689221
11.82066487, -47.41009728, 0., 341.44276352
0., 0., 0., 1.
According to your previous answer, shouldn't the 3rd element of the 3rd row be 1 instead of 0? Moreover, does the 4th column correspond to some kind of translation, or to something else?
Oh, yes. The fourth column is the translation, which I forgot to mention. Basically the equation is sR(x - t), where s is the scaling factor, R the rotation, and t the translation. Indeed, the 3rd element of the 3rd row should be the scaling factor, but the way we dealt with the z-axis here is a bit messy. Since we didn't save the transformation matrix properly (the scaling factor for the z-axis is missing) and later wanted to apply the same scaling factor to all axes, we decompose the transformation, use only the rotation and translation parts, and recompute the scaling factor. Please see the code here for details. Thanks for bringing up this issue!
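As a rough illustration of such a decomposition, the sketch below splits the X/Y block of the example matrix quoted earlier in this thread into a uniform scale and a unit-norm part. It is not the repository's code: the function name `decompose_xy` is hypothetical, and taking the scale as the length of the first column is an assumption.

```python
import math

# Example 3x4 topDownTransformation quoted earlier in this thread.
EXAMPLE = [
    [47.41009728, 11.82066487, 0.0, 182.75689221],
    [11.82066487, -47.41009728, 0.0, 341.44276352],
    [0.0, 0.0, 0.0, 1.0],
]


def decompose_xy(matrix):
    """Split the upper-left 2x2 block into a scale and a unit-norm block.

    Hypothetical helper: the scale is taken as the length of the first
    column, which assumes the same factor was applied to X and Y.
    """
    a, b = matrix[0][0], matrix[0][1]
    c, d = matrix[1][0], matrix[1][1]
    scale = math.hypot(a, c)
    unit = [[a / scale, b / scale], [c / scale, d / scale]]
    # +1 means a pure rotation, -1 means the block also contains a reflection.
    det = unit[0][0] * unit[1][1] - unit[0][1] * unit[1][0]
    # Fourth column, returned as stored (no sign or order convention assumed).
    offset = (matrix[0][3], matrix[1][3])
    return scale, unit, det, offset


scale, unit, det, offset = decompose_xy(EXAMPLE)
print(scale, det, offset)
```

For this matrix the determinant of the normalized block is negative, which suggests that the stored X/Y block includes an axis flip in addition to a rotation.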
Hello again. After a more thorough investigation of the code, I realized that the scaling of the data to 0-1 actually happens within the following code snippet. Is this correct?
https://github.com/art-programmer/FloorNet/blob/e7bd879df7d7825b1badb09440c85b4f1107a6d6/RecordWriterTango.py#L310-L320
Then what is the purpose of the previous scaling, encoded within the globalTransformation matrix? For your example the scaling factor is 48.86149243. Is there a reason why you have to scale the data from meters (the unit of the mesh in the Tango obj)? I don't understand it because, as far as I can tell, the data at this scale are never used: they are immediately re-scaled to 0-1. Nevertheless, if I omit this scale (by decomposing the matrix and setting scale=1 for all dimensions), the prediction of the network does not contain any walls, which results in a key error in a dictionary in your code. The same happens for data that I have acquired on my own.
Thanks in advance.