Add Orthographic Rendering Support

Open LeaFendd opened this issue 1 year ago • 10 comments

Is your feature request related to a problem? Please describe. At present, nerfstudio does not support orthographic rendering. For a NeRF, however, generating an orthographic image only requires casting a set of parallel rays. This capability is essential for various applications but is currently missing from nerfstudio. I plan to submit a PR for this feature.
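To make the "parallel rays" point concrete, here is a minimal sketch that contrasts the two ray layouts in camera space. It is not taken from nerfstudio or from any PR: the function name, the `pixel_size` parameter, and the choice of NumPy are assumptions made for illustration. The axis convention (+x right, +y up, camera looking along -z) follows the OpenGL-style convention.

```python
import numpy as np


def camera_space_rays(width, height, focal, pixel_size, orthographic):
    """Return (origins, directions) in camera space, each of shape (H, W, 3).

    Hypothetical helper: `pixel_size` (world units per pixel) only matters
    for the orthographic case, `focal` (in pixels) only for perspective.
    """
    # Pixel-centre offsets from the principal point (assumed to be the image centre).
    u = np.arange(width) + 0.5 - width / 2.0
    v = np.arange(height) + 0.5 - height / 2.0
    uu, vv = np.meshgrid(u, v, indexing="xy")  # both (H, W)

    if orthographic:
        # Origins are spread over the image plane; every ray shares one direction.
        origins = np.stack([uu * pixel_size, -vv * pixel_size, np.zeros_like(uu)], axis=-1)
        directions = np.broadcast_to(np.array([0.0, 0.0, -1.0]), origins.shape).copy()
    else:
        # Every ray shares one origin; directions fan out through the pixels.
        origins = np.zeros((height, width, 3))
        directions = np.stack([uu / focal, -vv / focal, -np.ones_like(uu)], axis=-1)
        directions /= np.linalg.norm(directions, axis=-1, keepdims=True)
    return origins, directions
```

In the orthographic branch the only per-pixel quantity is the origin, which is why the same camera parameters can in principle drive both modes.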

Describe the solution you'd like I propose to extend the existing Cameras class by adding an orthographic camera model to CameraType. This model would accept the same camera parameters as the perspective camera model already present in nerfstudio, allowing users to easily switch between perspective and orthographic modes.

LeaFendd avatar Nov 13 '23 07:11 LeaFendd

This sounds awesome, I think it would be a great addition-- if you plan on submitting a PR for this, I'd be happy to help.

Would we also want another export method to generate an ortho photo of the scene?

akristoffersen avatar Dec 03 '23 01:12 akristoffersen

Thank you for the reply! I've submitted PR #2648. You can generate an ortho photo with the ORTHOPHOTO CameraType, just as with any other CameraType. Here is a render example with instant-ngp: lego-ortho

But I don't know how to integrate it into ns-viewer yet; I'll try that later.

LeaFendd avatar Dec 05 '23 10:12 LeaFendd

@LeaFendd I am trying to use your ortho camera with SDFStudio, but I get a matrix addition error due to a broadcasting problem at this line: coord_x = (coord_x + 0.5 - self.cx) / scale * self.fx

The shape of coord_x is [1000,1000], the size of one of my images, while the shape of self.cx is [91,1], where 91 is the number of my images. I cannot see how this operation can make sense. To my understanding, the result should be a 2D matrix whose first dimension is 91 (number of cameras) or 2048 (number of rays in my batch), which is then stacked into 3D and finally multiplied by c2w.

Could you please explain to me the reasoning behind this operation and how I could make it work?
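The shape mismatch described above can be reproduced without the PR code. The sketch below uses NumPy, whose broadcasting rules match torch's for this case; the variable names and the two workarounds are illustrative assumptions, not the fix that was eventually committed.

```python
import numpy as np

image_shape = (1000, 1000)  # per-pixel grid for one image
cx_shape = (91, 1)          # one principal-point value per camera

# Broadcasting aligns trailing dimensions: 1000 vs 1 is fine, 1000 vs 91 is not.
try:
    np.broadcast_shapes(image_shape, cx_shape)
    compatible = True
except ValueError:
    compatible = False

# Option 1: select a single camera first, so cx is a scalar.
single_camera_shape = np.broadcast_shapes(image_shape, ())

# Option 2: add a trailing axis so cx is (91, 1, 1); the result is one grid per camera.
batched_shape = np.broadcast_shapes(image_shape, cx_shape + (1,))

# Small numeric check of option 2 with 2 cameras and a 3x4 pixel grid.
coord_x = np.tile(np.arange(4.0), (3, 1))  # (3, 4)
cx = np.array([[1.5], [2.0]])              # (2, 1)
shifted = coord_x + 0.5 - cx[..., None]    # (2, 3, 4)
```

Option 2 yields a tensor of shape (91, 1000, 1000), which is large; which option is appropriate depends on whether rays are generated per camera or for the whole batch.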

ArpegorPSGH avatar Dec 13 '23 16:12 ArpegorPSGH

Sorry, I didn't see your message in time. This step generates ray_origin. First, a meshgrid is created on the xOy plane in homogeneous coordinates (x, y, z, h). Then the c2w matrix is applied to move the meshgrid to the actual camera position. Since I intended the orthographic camera only for rendering, not for training, I only handled the case where cx and cy each hold a single value. In my usage, cx and cy are each a single float rather than a vector, which is what causes the broadcasting bug in your case. Thank you for your feedback; I will fix it as soon as I can.
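The steps described here (meshgrid on the camera's xy plane, homogeneous coordinates, then c2w) can be sketched as follows for a single camera. This is an illustration under stated assumptions, not the code from PR #2648: the function name and `pixel_size` are hypothetical, NumPy stands in for torch, and the camera is assumed to look along its local -z axis.

```python
import numpy as np


def orthographic_ray_origins(c2w, width, height, cx, cy, pixel_size):
    """World-space ray origins (H, W, 3) and the shared direction (3,) for one camera.

    `c2w` is a 3x4 or 4x4 camera-to-world matrix; `cx` and `cy` are scalars.
    """
    x = (np.arange(width) + 0.5 - cx) * pixel_size
    y = -(np.arange(height) + 0.5 - cy) * pixel_size  # image rows grow downward
    xx, yy = np.meshgrid(x, y, indexing="xy")         # both (H, W)

    # Homogeneous points (x, y, 0, 1) on the camera's xy plane.
    grid = np.stack([xx, yy, np.zeros_like(xx), np.ones_like(xx)], axis=-1)  # (H, W, 4)

    # Apply the camera-to-world transform to every grid point.
    origins = grid @ c2w[:3, :].T  # (H, W, 3)

    # All rays share the camera's viewing direction (-z axis) in world space.
    direction = -c2w[:3, 2]
    return origins, direction
```

With per-camera intrinsics stored as shape (N, 1), one way to avoid the broadcasting error is to pass `float(cx[i, 0])` and `float(cy[i, 0])` for the selected camera `i`.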

LeaFendd avatar Dec 15 '23 08:12 LeaFendd

You're welcome. I definitely think all cameras should be usable for both training and final render, otherwise it would become a bit confusing and go against the "plug'n play" interchangeability allowed by the modularity of the framework.

ArpegorPSGH avatar Dec 15 '23 10:12 ArpegorPSGH

Yeah, I'm refactoring it into a more general version; give me some time.

LeaFendd avatar Dec 15 '23 13:12 LeaFendd

How is the refactoring going? I'm working on other parts of my model, but I'll probably be done with them by the end of next week. Do you think your new version will be available by then?

ArpegorPSGH avatar Jan 04 '24 16:01 ArpegorPSGH

I just committed the new version to my PR #2648.

LeaFendd avatar Jan 05 '24 02:01 LeaFendd

@LeaFendd May I ask whether there are any plans to support orthographic rendering for Gaussian splatting? Or could you give some suggestions on how to implement it? I can help implement it.

hot-dog avatar Feb 08 '24 03:02 hot-dog

Sorry, I don't have any such plans at the moment. You just need to construct an orthographic projection matrix to replace the perspective projection matrix. I think this is the place to modify: https://github.com/nerfstudio-project/nerfstudio/blob/main/nerfstudio/models/splatfacto.py#L699 To implement an orthographic_projection_matrix() function, you can refer to https://www.songho.ca/opengl/gl_projectionmatrix.html Good luck!!!!!

LeaFendd avatar Feb 08 '24 12:02 LeaFendd

@LeaFendd Thank you for your reply. Following your suggestion, I constructed the orthographic projection matrix as follows:

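import math

import torch


def projection_matrix(znear, zfar, fovX, fovY):
    # Half-extents of the view volume, taken at the far plane from the field of view.
    tanHalfFovY = math.tan(fovY / 2)
    tanHalfFovX = math.tan(fovX / 2)

    top = tanHalfFovY * zfar
    bottom = -top
    right = tanHalfFovX * zfar
    left = -right

    P = torch.zeros(4, 4)

    z_sign = 1.0

    # Orthographic layout: scale on the diagonal, translation in the last column.
    P[0, 0] = 2.0 / (right - left)
    P[0, 3] = -(right + left) / (right - left)
    P[1, 1] = 2.0 / (top - bottom)
    P[1, 3] = -(top + bottom) / (top - bottom)
    P[2, 2] = -2.0 / (zfar - znear)
    P[2, 3] = -(zfar + znear) / (zfar - znear)
    # The bottom row is (0, 0, 0, 1), so w stays constant and there is no perspective divide.
    P[3, 3] = z_sign

    return P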

And the rendered result is as follows: (image) The result is orthographic, since the building's facade is invisible, but it is foggy. I think this is due to the lack of depth information. Am I right, and do you have any suggestions for solving this? Thank you!
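As a sanity check on the matrix layout itself, the sketch below builds the same OpenGL-style orthographic matrix in NumPy (a stand-in for torch) and projects a few camera-space points. The helper name and the example extents are assumptions for illustration. It only verifies the matrix; it says nothing about how a particular rasterizer consumes it.

```python
import numpy as np


def ortho_matrix(left, right, bottom, top, znear, zfar):
    """OpenGL-style orthographic matrix; the camera looks along -z."""
    P = np.zeros((4, 4))
    P[0, 0] = 2.0 / (right - left)
    P[0, 3] = -(right + left) / (right - left)
    P[1, 1] = 2.0 / (top - bottom)
    P[1, 3] = -(top + bottom) / (top - bottom)
    P[2, 2] = -2.0 / (zfar - znear)
    P[2, 3] = -(zfar + znear) / (zfar - znear)
    P[3, 3] = 1.0
    return P


P = ortho_matrix(-2.0, 2.0, -1.0, 1.0, 0.1, 100.0)

# Opposite corners of the view box map to opposite corners of the NDC cube.
ndc_near = P @ np.array([2.0, 1.0, -0.1, 1.0])      # -> (1, 1, -1, 1)
ndc_far = P @ np.array([-2.0, -1.0, -100.0, 1.0])   # -> (-1, -1, 1, 1)

# The same (x, y) at two depths lands on the same screen position:
# there is no foreshortening, which is why side facades disappear.
close_point = P @ np.array([1.0, 0.5, -1.0, 1.0])
distant_point = P @ np.array([1.0, 0.5, -50.0, 1.0])
same_screen_xy = np.allclose(close_point[:2], distant_point[:2])
```

Since w is always 1, depth enters only through the third row of the matrix.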

hot-dog avatar Feb 11 '24 03:02 hot-dog

Setting a larger z_near may help, I think. BTW, Happy Chinese New Year! :)

LeaFendd avatar Feb 13 '24 06:02 LeaFendd

Haha, Happy Chinese New Year! I have tried setting a larger z_near (e.g. z_near=60), but the result is the same; z_near and z_far do not seem to be used in the 3DGS process.
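One part of this observation can be checked at the matrix level: with the construction used in this thread, changing z_near alters only the depth row, so the projected x and y positions are identical. The sketch below is a NumPy rewrite for illustration (the helper name and field-of-view values are assumptions); it does not establish whether the rasterizer reads the depth row at all.

```python
import math

import numpy as np


def ortho_from_fov(znear, zfar, fov_x, fov_y):
    # Mirrors the thread's construction: symmetric extents taken at the far plane.
    top = math.tan(fov_y / 2) * zfar
    right = math.tan(fov_x / 2) * zfar
    P = np.zeros((4, 4))
    P[0, 0] = 1.0 / right  # 2 / (right - left) with left = -right
    P[1, 1] = 1.0 / top    # 2 / (top - bottom) with bottom = -top
    P[2, 2] = -2.0 / (zfar - znear)
    P[2, 3] = -(zfar + znear) / (zfar - znear)
    P[3, 3] = 1.0
    return P


small_near = ortho_from_fov(0.01, 100.0, math.radians(60), math.radians(45))
large_near = ortho_from_fov(60.0, 100.0, math.radians(60), math.radians(45))

rows_xyw = [0, 1, 3]
xy_unchanged = np.allclose(small_near[rows_xyw], large_near[rows_xyw])
depth_changed = not np.allclose(small_near[2], large_near[2])
```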

hot-dog avatar Feb 17 '24 02:02 hot-dog