
Support for sparse jacobians

Open jessegrabowski opened this issue 5 months ago • 6 comments

Description

I really want to be able to call `pt.jacobian` and get back a sparse matrix. For very large systems of equations the Jacobian is essentially never dense, so this would lead to huge performance gains.

I talked with @aseyboldt about it informally once, and he thought it was doable. There is a JAX package for it, so that's nice. I confess I don't fully understand what these packages are doing under the hood. From the `sparsejac` function in the linked package:

    This function uses reverse-mode automatic differentiation to compute the
    Jacobian. The `fn` must accept a rank-1 array and return a rank-1 array, and
    the Jacobian should be sparse with nonzero elements identified by `sparsity`.
    Sparsity is exploited in order to make the Jacobian computation efficient.

    This is done by identifying "structurally independent" groups of output
    elements, which is isomorphic to a graph coloring problem. This allows
    projection to a lower-dimensional output space, so that reverse-mode
    differentiation can be more efficiently applied.

Seems like we should be able to do that too!
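
For concreteness, here is a minimal sketch of that compression trick in plain NumPy/SciPy (no autodiff and no PyTensor API; the `vjp` callable below is a stand-in for a single reverse-mode pass): color the output rows so that rows sharing a color touch disjoint input columns, do one VJP per color, and read the nonzeros back out of the compressed rows.

```python
import numpy as np
import scipy.sparse as sp


def greedy_row_coloring(sparsity: sp.csr_matrix) -> np.ndarray:
    """Color output rows so rows with the same color touch disjoint input columns."""
    colors = np.full(sparsity.shape[0], -1)
    used_cols = []  # columns already claimed by each color
    for i in range(sparsity.shape[0]):
        cols = set(sparsity.indices[sparsity.indptr[i] : sparsity.indptr[i + 1]])
        for c, used in enumerate(used_cols):
            if used.isdisjoint(cols):
                colors[i] = c
                used.update(cols)
                break
        else:
            colors[i] = len(used_cols)
            used_cols.append(cols)
    return colors


# Toy system with a tridiagonal Jacobian.
n = 6
J_true = sp.diags([1.0, 2.0, 3.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
sparsity = sp.csr_matrix(
    (np.ones_like(J_true.data), J_true.indices, J_true.indptr), shape=(n, n)
)
vjp = lambda v: v @ J_true.toarray()  # stand-in for one reverse-mode pass

colors = greedy_row_coloring(sparsity)
n_colors = colors.max() + 1  # 3 passes instead of n = 6 for a tridiagonal system
compressed = np.stack([vjp((colors == c).astype(float)) for c in range(n_colors)])

# Rows sharing a color don't overlap in columns, so each nonzero of row i can
# be read straight out of the compressed row for colors[i].
rows, cols = sparsity.nonzero()
J_recovered = sp.csr_matrix((compressed[colors[rows], cols], (rows, cols)), shape=(n, n))
assert np.allclose(J_recovered.toarray(), J_true.toarray())
```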

jessegrabowski avatar Jun 26 '25 04:06 jessegrabowski

I played around a bit with adapting the code from sparsejac; Gist here. It seems easy enough?
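
Roughly, the idea in PyTensor terms (a sketch of the general approach, not the Gist's actual code; `compressed_jacobian_rows` is a made-up helper, and how the sparsity pattern and coloring get passed in is still open):

```python
import numpy as np
import pytensor
import pytensor.tensor as pt


def compressed_jacobian_rows(expr, wrt, colors):
    """Seed one reverse pass per color: row c is seed_c.T @ J (hypothetical helper)."""
    n_colors = int(colors.max()) + 1
    out_rows = []
    for c in range(n_colors):
        seed = pt.constant((colors == c).astype(expr.dtype))
        # The grad of the seeded sum is the VJP seed.T @ J.
        out_rows.append(pt.grad((expr * seed).sum(), wrt))
    return pt.stack(out_rows)


# Toy use: f_i = x_i**2 depends only on x_i, so all rows are structurally
# independent and a single reverse pass recovers the whole (diagonal) Jacobian.
x = pt.vector("x")
f = x**2
colors = np.zeros(4, dtype=int)  # one color for every output
jac_fn = pytensor.function([x], compressed_jacobian_rows(f, x, colors))
print(jac_fn(np.arange(4.0)))  # [[0. 2. 4. 6.]], i.e. the diagonal of J
```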

jessegrabowski avatar Jun 26 '25 10:06 jessegrabowski

This is something where lazy grad à la #788 could be useful, because we could, for instance, lift the grad operation (or in this case the sparse grad) above operations like join, so the user doesn't need to artificially define the "distinct" equations ahead of time. It also gives PyTensor time to prune away zero branches.
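
As a concrete illustration of the join point, this is roughly what a lifted grad would save the user from spelling out by hand today (plain current API, nothing new here):

```python
import pytensor.tensor as pt

x = pt.vector("x")
y = pt.vector("y")

f1 = pt.sin(x)  # depends only on x
f2 = y**2       # depends only on y
f = pt.join(0, f1, f2)  # what a user would typically hand to pt.jacobian

# Today: pt.jacobian on the joined vector is dense, even though the f2 rows
# are structurally zero with respect to x.
J_x = pt.jacobian(f, x)

# Manually keeping the "distinct" equations separate exposes the block
# structure, which is what lifting the grad above the join would do for us.
J_blocks = [pt.jacobian(f1, x), pt.jacobian(f2, y)]
```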

This approach could also be combined with our subtensor lift rewrites: for example, you could build a dense connectivity mask and lift it through the gradient graph, as demonstrated in #1471.

ricardoV94 avatar Jun 26 '25 11:06 ricardoV94

This is really cool and I think it could be useful in a lot of places!

With the lazy grads I'm still a little worried that we might break gradient graphs in the merge rewrite pass. At the least, I think we need to ensure that we rewrite them away before that. But I can see how it would make some things easier here as well.

About the connectivity: I'd be curious how much it would help if we had info about the sparsity patterns of some ops. For instance, in elemwise or reshape ops there's a lot of sparsity (and of course, as always, dynamic broadcasting makes everything worse). So maybe there could be a new optional Op method that returns a sparse connectivity pattern for every input/output combination?
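
A very rough sketch of what such an optional method could look like, just to make the idea concrete. The name `sparse_connection_pattern`, its signature, and the flattened-index convention are all invented here; the only related thing that exists today is the boolean `Op.connection_pattern`:

```python
import numpy as np
import scipy.sparse as sp


class HasSparseConnectionPattern:
    """Hypothetical optional interface, sketched for an elementwise-style op."""

    def sparse_connection_pattern(self, node, input_shapes, output_shapes):
        # Returns one sparse boolean pattern per (input, output) pair, indexed
        # [input][output] over flattened element indices: pattern[i, j] is True
        # if output element i can depend on input element j.
        # For an elementwise op (ignoring dynamic broadcasting, as noted above),
        # output element k depends only on element k of each input.
        (out_shape,) = output_shapes
        n = int(np.prod(out_shape))
        eye = sp.identity(n, dtype=bool, format="csr")
        return [[eye] for _ in input_shapes]
```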

When we compute the actual Jacobian, I think we should leave the choice of forward or reverse mode to the user? But that's probably also true for the normal full Jacobian...
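
For reference, the two modes in terms of the existing `Lop`/`Rop` helpers (just a sketch): reverse mode assembles the Jacobian from rows (one pass per output, or per output color), forward mode from columns (one pass per input, or per input color), so which one is cheaper depends on the sparsity pattern.

```python
import pytensor.tensor as pt
from pytensor.gradient import Lop, Rop

x = pt.vector("x")
v = pt.vector("v")
f = pt.cumsum(x)  # lower-triangular Jacobian

row_combo = Lop(f, x, v)  # reverse mode: v.T @ J, one pass per row (color)
col_combo = Rop(f, x, v)  # forward mode: J @ v, one pass per column (color)
```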

aseyboldt avatar Jun 30 '25 07:06 aseyboldt

After we looked at it, I don't think anything breaks with the lazy rewrites: you have an op that looks like `grad(cost, wrt=[x0, x1, x2])`, and even if we later find `x1 = x2` you still end up with `grad(cost, wrt=[x0, x1, x1])`. The merging changes the meaning of the outer parameters, not the gradient graph.

The Op creates a sort of functional separation, similar to an inner graph, where the meaning of the outside parameters is irrelevant (until you decide to act on the boundary, which would always require a special rewrite that understands the grad Op).

ricardoV94 avatar Jun 30 '25 08:06 ricardoV94

Do you think lazy grads are a blocker for this, or can I just go for a PR that adds a sparse Jacobian function?

jessegrabowski avatar Jun 30 '25 08:06 jessegrabowski

Not a blocker, just a force multiplier. Since the lazy version is strictly more work, feel free to go ahead with the eager version and we can iterate from there.

ricardoV94 avatar Jun 30 '25 13:06 ricardoV94