Implement Normal, a solver applying an inner solver to the normal equations
This updates #159 to be independent of #158, as requested there.
One question about the implementation of allow_dependent_{rows,columns}: Currently I implement them defensively with regard to the possibility that the solver state might be transposed, taking into account the possibility that the user manually constructs the solver state and transposes it (for instance, maybe they want to efficiently apply a linear solve to many systems, some involving A and some involving A^t). However, thinking more closely about the way the library works, I think they only way they can do this is to directly call compute on the solver state object, and in this usage, they do not get the custom jvp (any gradients computed would be directly differentiating the compute call), so the result of allow_dependent_{rows,columns} is irrelevant. Any calls to the allow_dependent_{rows,columns} in the library itself follow from the jvp of a linear_solve call, which seems to mean in these functions the solver state can be assumed freshly constructed from the given operator.
So is it correct to say that the result of allow_dependent_{rows,columns} can assume the eventually used solver state is fresh, ie unmodified by transpose or conj? In this case, this assumption would allow a more precise and simpler implementation, removing the special case for square operators. If you agree I can make the simplification.
Closing now that #158 is in, we'll merge #159 instead.