nalgebra Numeric bug on perspective matrix computation

trafficstars

I think there is a numeric bug in the way nalgebra computes its perspective calculation.

I was trying to render an object through both raytracing and perspective projection and i was getting this:

The perspectives clearly did not lign up. I spent hours trying to figure out why my math was wrong, until i eventually decided to just copy paste and transpile a c++ perspective camera projection code I own.

With it I get this result:

Which matches my expectation.

With my code I get this perspective projeciton matrix:

  ┌                                         ┐
  │ 2.4142134         0         0         0 │
  │         0 2.4142134         0         0 │
  │         0         0  -1.00001 -0.200001 │
  │         0         0        -1         0 │
  └                                         ┘

Nalgebra on the other hand returns this for the exact same parameters:

  ┌                                         ┐
 │ 2.3306494         0         0         0 │
 │         0 1.8304877         0         0 │
 │         0         0  -1.00001 -0.200001 │
 │         0         0        -1         0 │
 └                                         ┘

As you can see they are clearly different.

This is my code:

fn perspective(fov : Float, aspect : Float, z_near : Float, z_far : Float) -> Mat4
{
  debug_assert!(aspect > Float::MIN_POSITIVE, "");

  let tan_half_fovy = Float::tan(fov / 2.0);

  let mut result = Mat4::zeros();
  result[(0, 0)] = 1.0 / (aspect * tan_half_fovy);
  result[(1, 1)] = 1.0 / (tan_half_fovy);
  result[(2, 2)] = -(z_far + z_near) / (z_far - z_near);
  result[(3, 2)] = -1.0;
  result[(2, 3)] = -(2.0 * z_far * z_near) / (z_far - z_near);
  return result;
}

And this is nalgebra's implementation:

pub fn perspective_rh_no<T: RealNumber>(aspect: T, fovy: T, near: T, far: T) -> TMat4<T> {
    assert!(
        !relative_eq!(far - near, T::zero()),
        "The near-plane and far-plane must not be superimposed."
    );
    assert!(
        !relative_eq!(aspect, T::zero()),
        "The aspect ratio must not be zero."
    );

    let negone = -T::one();
    let one = T::one();
    let two: T = crate::convert(2.0);
    let mut mat = TMat4::zeros();

    let tan_half_fovy = (fovy / two).tan();

    mat[(0, 0)] = one / (aspect * tan_half_fovy);
    mat[(1, 1)] = one / tan_half_fovy;
    mat[(2, 2)] = -(far + near) / (far - near);
    mat[(2, 3)] = -(two * far * near) / (far - near);
    mat[(3, 2)] = negone;

    mat
}

Notice that the code is attempting to compute basically the same things. I don't know which operation is the cause of the discrepancy but something in nalgebra's code is not working properly.

You can verify the computations by using these input parameters:

width 800
height 800
z_near 0.1
z_far 20 000
fov 45

Sep 15 '22 02:09 Makogan

Please provide the exact input for both algorithms. They do not have a width and a height input, and the (0,0) and (1,1) values should clearly be identical when aspect is 1.0. I suspect an error on the user side, you must be passing incorrect parameters in.

Note that the order of parameters differs between your method and nalgebra's method. Are you sure you provide arguments in the correct order?

Sep 15 '22 09:09 Finomnis

My suspicion is confirmed by this minimal example:

use std::f32::consts::PI;

use nalgebra_glm::{perspective_rh_no, Mat4};

fn perspective(fov: f32, aspect: f32, z_near: f32, z_far: f32) -> Mat4 {
    debug_assert!(aspect > f32::MIN_POSITIVE, "");

    let tan_half_fovy = f32::tan(fov / 2.0);

    let mut result = Mat4::zeros();
    result[(0, 0)] = 1.0 / (aspect * tan_half_fovy);
    result[(1, 1)] = 1.0 / (tan_half_fovy);
    result[(2, 2)] = -(z_far + z_near) / (z_far - z_near);
    result[(3, 2)] = -1.0;
    result[(2, 3)] = -(2.0 * z_far * z_near) / (z_far - z_near);
    return result;
}

fn main() {
    let width = 800.0;
    let height = 800.0;
    let z_near = 0.1;
    let z_far = 20000.0;
    let fov = 45.0 / 180.0 * PI;

    let aspect = width / height;

    println!(
        "Yours (correct): {}",
        perspective(fov, aspect, z_near, z_far)
    );
    println!(
        "Yours (incorrect): {}",
        perspective(aspect, fov, z_near, z_far)
    );

    println!(
        "nalgebra's (incorrect): {}",
        perspective_rh_no(fov, aspect, z_near, z_far)
    );
    println!(
        "nalgebra's (correct): {}",
        perspective_rh_no(aspect, fov, z_near, z_far)
    );
}

Yours (correct): 
  ┌                                         ┐
  │ 2.4142134         0         0         0 │
  │         0 2.4142134         0         0 │
  │         0         0  -1.00001 -0.200001 │
  │         0         0        -1         0 │
  └                                         ┘


Yours (incorrect): 
  ┌                                         ┐
  │ 2.3306494         0         0         0 │
  │         0 1.8304877         0         0 │
  │         0         0  -1.00001 -0.200001 │
  │         0         0        -1         0 │
  └                                         ┘


nalgebra's (incorrect): 
  ┌                                         ┐
  │ 2.3306494         0         0         0 │
  │         0 1.8304877         0         0 │
  │         0         0  -1.00001 -0.200001 │
  │         0         0        -1         0 │
  └                                         ┘


nalgebra's (correct): 
  ┌                                         ┐
  │ 2.4142134         0         0         0 │
  │         0 2.4142134         0         0 │
  │         0         0  -1.00001 -0.200001 │
  │         0         0        -1         0 │
  └                                         ┘

Sep 15 '22 10:09 Finomnis

What seems to have happened is, glm's order of parameters is this:

perspectiveRH_NO(T fovy, T aspect, T zNear, T zFar) https://github.com/g-truc/glm/blob/b3f87720261d623986f164b2a7f6a0a938430271/glm/ext/matrix_clip_space.inl#L248

Where nalgebra's is:

perspective_rh_no<T: RealNumber>(aspect: T, fovy: T, near: T, far: T)

The aspect and fov parameters are permuted. So a user porting code from C++ to rust or following a C++ tutorial in rust can very easily end up passing the incorrect parameter, since it is reasonable for a user to expect glm::nalgebra to have the same parameters as C++ glm.

Indeed that was exactly the case here, I had ported code from C++ to rust and since things compiled and seemed to work (the images I was generating were very plausible) it did not occur to me that the order of parameters in glm:nalgebra were different.

I do not know why the order of parameters is different or whether it can be justified to have the same order as glm, but I expect this difference to cause confusion in the future.

Sep 15 '22 20:09 Makogan

Very unfortunate situation for something that was preventable by comparing the 2 APIs on the first release but I understand.

Sep 15 '22 21:09 Makogan

All of that is true, and I am not indifferent to the value provided by the nalgebra project. However the absolutely great help that nalgebra is does not unmake the issue. Right now this API is broken at the specification level, it promises to comply with a different API it does not actually comply with in this instance.

Correcting the issue implies a compilation time break for a great number of projects already using nalgebra, an undeniable problem. But I do want to point out, the project already has a breaking change, the API specification is broken atm. I understand the decision to keep the code in its current state.

Since I am in no position to directly change this particular problem, all I can do is point it out. This is an API bug in the code, it will cause issues to many people in the future, I personally would add a breaking change to fix it. Other than expressing my personal and individual opinion on this there is nothing to do, it's just feedback.

I appreciate the work done on nalgebra, it is undoubtedly a greatly useful tool, me and many others are lucky and should be grateful it exists. This particular problem is an unfortunate (seemingly permanent issue), in what is otherwise a great tool.

If the decision is to keep the bug since it will not cause problems for those already using the code I understand. It makes sense, it;s a reasonable decision even if I fundamentally disagree with it.

Have a good day and thank you for contributing to this project.

Sep 15 '22 23:09 Makogan

As a sidenote, they seem to be aware of this issue.

They added an explicit warning to the documentation: https://docs.rs/nalgebra-glm/latest/nalgebra_glm/fn.perspective_rh_no.html#important-note

Sep 16 '22 00:09 Finomnis

It seems this is not the first time this was brought up: https://github.com/dimforge/nalgebra/issues/423

Might be worth reopening or creating a new issue to remind them of fixing it at the next major version.

Sep 16 '22 00:09 Finomnis

I think you misunderstand :)

I'm sure they would acknowledge that this is a mistake and would fix it (I'm not an nalgebra contributor), but they literally can't. The semver compatibility does not allow it. (until the next breaking version, of course)

I'm sorry if I came across too harsh, and I'm sorry if you wasted time on hunting this problem. But please either open an issue about it that suggests the api change, or accept it. Passive aggressively implying that the people who created this library were too stupid to compare APIs is disrespectful.

Again, I'm not an nalgebra dev, so this is mostly based on assumptions. Maybe open an issue that explicitly suggests changing the api to match glm.

I am not implying the devs are stupid. The factual situation is that, atm nalgebra and glm have different parameter ordering for the same function. This is something that in an ideal world, would have been caught on by peer reviewing and would not have been deployed. It was deployed, this suggests that something in the peer review process failed. From the links you shared what failed was making a wrong call the first time the error was caught.

Note that saying that a choice was wrong, or that the peer review failed does not mean the devs are stupid, it means something went wrong. Something goes wrong every day on every single project ever made. Pointing it out is not an insult to the devs, it's just that, pointing out that "Hey the code right here has this property that is kinda bad and undesirable and I would benefit if it was fixed but I understand if you can;t for whatever reason."

Sep 16 '22 00:09 Makogan

@Makogan I deleted my comments as they were not relevant for the improvement of this crate.

Sep 16 '22 11:09 Finomnis

This is something that in an ideal world, would have been caught on by peer reviewing and would not have been deployed. It was deployed, this suggests that something in the peer review process failed. From the links you shared what failed was making a wrong call the first time the error was caught.

This isn’t a peer-reviewing problem. nalgebra existed way before nalgebra-glm and was shipped with its current argument order. Then nalgebra-glm was developped, and, to avoid confusion for people familiar with nalgebra , used the same argument order as nalgebra rather than glm. So it was a deliberate choice, maybe not the best one, but it was a choice, and is documented explicitly.

Please keep in mind that API design that makes every single person happy is hard and often impossible.

Addressing this by swapping the arguments on both nalgebra and nalgebra-glm is definitely possible. It just requires a version bump, which isn’t that big of a deal, especially since these two crates haven’t reached 1.0.0 yet. The main issue is to find a proper way of making the transition without silently breaking user code.

Oct 26 '22 08:10 sebcrozet

nalgebra nalgebra copied to clipboard

Numeric bug on perspective matrix computation

nalgebra
nalgebra copied to clipboard