Bug in Adam for non-real parameters
Hi,
There appears to be an edge case in the current Adam implementation. If the optimizer contains non-real variables, e.g., quaternions or complex numbers, the second-moment (variance) estimate is not computed as intended.
The Adam implementation computes

```python
v_t = self.beta_2 * v_tp + (1 - self.beta_2) * dr.sqr(g_p)
```

where `g_p` is the parameter gradient. If a parameter is a quaternion or complex number, its gradient has the same type, and `dr.sqr` then computes a quaternion or complex *product*. This is different from the element-wise square that the second-moment update expects.
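For concreteness, here is a dependency-free sketch of the discrepancy: squaring a quaternion under the Hamilton product is not the same as squaring its coefficients element-wise, and can even produce negative components, which would corrupt the (necessarily non-negative) second-moment state.

```python
def quat_mul(p, q):
    # Hamilton product of two quaternions stored as (w, x, y, z)
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (
        pw*qw - px*qx - py*qy - pz*qz,
        pw*qx + px*qw + py*qz - pz*qy,
        pw*qy - px*qz + py*qw + pz*qx,
        pw*qz + px*qy - py*qx + pz*qw,
    )

g = (1.0, 2.0, 0.0, 0.0)  # a gradient stored as a quaternion

quat_square = quat_mul(g, g)          # what dr.sqr effectively computes here
elementwise = tuple(c * c for c in g) # what Adam's v_t update actually needs

print(quat_square)  # (-3.0, 4.0, 0.0, 0.0) -- note the negative component
print(elementwise)  # (1.0, 4.0, 0.0, 0.0)  -- always non-negative
```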
While there is some literature on optimizing complex-valued parameters (a PyTorch GitHub issue, the PyTorch docs, https://arxiv.org/pdf/0906.4835.pdf), the easiest practical solution is to simply optimize the coefficients as a `Vector2f` or `Vector4f` instead.
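A minimal sketch of that workaround (using NumPy as a stand-in for Dr.Jit arrays): store the four quaternion coefficients in a plain 4-vector so that every optimizer operation, including the squared-gradient accumulation, is element-wise.

```python
# Hypothetical sketch: keep the optimized parameter as a flat 4-vector
# (Vector4f-style) and only rebuild the quaternion where its algebra is needed.
import numpy as np

param = np.array([1.0, 0.5, -0.2, 0.1])  # quaternion coefficients, flat storage
grad  = np.array([0.1, -0.3, 0.2, 0.0])  # gradient w.r.t. the coefficients

beta_2 = 0.999
v = np.zeros(4)                           # Adam second-moment state
v = beta_2 * v + (1 - beta_2) * grad**2   # element-wise square, as intended

print(v)  # every entry is non-negative
```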
Maybe Mitsuba should raise an error if a quaternion or complex value is assigned to the optimizer?
Hi @dvicini
I agree with you. My suggestion would be to check whether the type is special (`dr.is_special_v()`) and, if so, raise an error as you said.
I'll leave this open for a few more days before pushing a fix, in case anyone else has some ideas to contribute :)
The same issue would also appear with camera matrices and, e.g., complex IOR values in the next version.
The optimizer will need an extra check like

```python
tp = type(g_p)
if dr.is_special_v(tp):
    tp_a = dr.array_t(tp)
    g_p = tp_a(g_p)  # reinterpret as a plain array of coefficients
...
v_t = self.beta_2 * v_tp + (1 - self.beta_2) * dr.sqr(g_p)
```
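Assuming `dr.is_special_v` and `dr.array_t` behave as described (reporting whether a type carries a special algebra, and yielding its underlying plain array type), the fix amounts to reinterpreting the gradient before squaring. A dependency-free mock of that logic:

```python
# Mock of the proposed check. The real code would use dr.is_special_v and
# dr.array_t; Quat and Vec4 here are hypothetical stand-ins for illustration.
class Vec4(tuple):
    pass

class Quat(tuple):
    special = True  # stand-in flag for "has a special product"

def is_special_v(tp):
    return getattr(tp, "special", False)

def array_t(tp):
    return Vec4  # stand-in: the plain array type underlying Quat

def elementwise_square(g_p):
    tp = type(g_p)
    if is_special_v(tp):
        g_p = array_t(tp)(g_p)  # reinterpret coefficients as a plain array
    return type(g_p)(c * c for c in g_p)

print(elementwise_square(Quat((1.0, 2.0, 0.0, 0.0))))  # (1.0, 4.0, 0.0, 0.0)
```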
We could either wait until the next version or backport the dr.array_t type trait.
This has now been fixed in #1423