Issue 7/implicit functions

Open IvanYashchuk opened this issue 4 years ago • 5 comments

Hello @charlesm93, here are my friendly suggestions on the section you wrote.

I propose renaming "algebraic equations" -> "nonlinear system", since the same results hold for this more general case (which also includes PDEs). The implicit function theorem only states that du/dpsi exists under certain conditions, so I renamed that section to "tangent linear method". I think there is no need to introduce the Lagrangian approach for the adjoint method, as it follows naturally from the tangent method, and I propose removing that section. In "practical considerations" I have added the expressions needed to implement forward and reverse AD of nonlinear solvers.
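For reference, one standard way to write the forward and reverse rules for a nonlinear solve is sketched below. It is not necessarily the exact form used in the PR. The notation is an assumption: the system is written as F(u, psi) = 0 with solution u(psi), dotted quantities are tangents, barred quantities are adjoints, and lambda is an auxiliary adjoint vector.

```latex
\begin{align}
F(u(\psi), \psi) &= 0, \\
\frac{\partial F}{\partial u}\,\frac{du}{d\psi} + \frac{\partial F}{\partial \psi} &= 0
\quad\Longrightarrow\quad
\frac{du}{d\psi} = -\left(\frac{\partial F}{\partial u}\right)^{-1}\frac{\partial F}{\partial \psi}, \\
\text{tangent (forward):}\quad
\frac{\partial F}{\partial u}\,\dot{u} &= -\frac{\partial F}{\partial \psi}\,\dot{\psi}, \\
\text{adjoint (reverse):}\quad
\left(\frac{\partial F}{\partial u}\right)^{\top}\lambda &= \bar{u},
\qquad
\bar{\psi} = -\left(\frac{\partial F}{\partial \psi}\right)^{\top}\lambda .
\end{align}
```

Both rules need only a linear solve with the Jacobian of F with respect to u (or its transpose), evaluated at the converged solution, which presumes that Jacobian is invertible there.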

Ref. #7

IvanYashchuk avatar Apr 15 '20 09:04 IvanYashchuk

Thanks for the PR!

"Algebraic equations" is how these are typically described in the literature. I'd like to continue to at least connect to the mainstream literature.

Part of the motivation for writing this volume is to provide an accessible presentation. So I'd like to keep the presentation at the simplest possible level rather than the most general mathematical level. I'm just bringing this up as a criterion to consider when evaluating this PR.

I'll let @charlesm93 respond directly on the content.

bob-carpenter avatar Apr 15 '20 15:04 bob-carpenter

Hi @IvanYashchuk, thank you for chiming in. I'll go through your PR, but first a few comments:

  • Another reason for keeping "algebraic equation" is that the handbook is meant to be usable as a lookup textbook.
  • Statements of the implicit function theorem I've encountered also give you the Jacobian matrix. In any case, to derive the Jacobian, you need u to be a function of psi. Is "tangent linear method" a term used in certain fields?
  • The case of PDEs is different because the target is an integral of f, rather than a function of it. The adjoint method is then more sophisticated.
  • You're right that we don't need to discuss the Lagrangian approach for algebraic equations, but I'm introducing it here for pedagogical reasons. It is easy to understand with algebraic equations and much more difficult for ODEs, DAEs, etc., where we will need it.

What I do in this section might make more sense once I've written the other sections on implicit functions.

charlesm93 avatar Apr 15 '20 20:04 charlesm93

> Is "tangent linear method" a term used in certain fields?

It is a common term, even used in the current draft https://github.com/bob-carpenter/ad-handbook/blob/09c85566defab0c78fb60a7913425166be6b2e0f/index.Rmd#L103

> the case of PDEs is different because the target is an integral of f, rather than a function of it.

There are many different PDEs and there are many different objective (target?) functions. AD rules are independent of the form of the objective functions. Stationary PDEs are nonlinear equations of form F(x) = 0; linear stationary PDEs can be written in the same form, and time-dependent PDEs are often reduced to a sequence of stationary PDEs.

Alright, I agree that the Lagrangian approach is needed for ODEs, etc. Anyways the actual rules for AD are more important than the derivation of them.

IvanYashchuk avatar Apr 16 '20 06:04 IvanYashchuk

> It is a common term,

Point taken.

Adding a summary at the end, as you have done, with only the results plainly stated works well. These sections can be named "tangent linear method" and "adjoint method" (again, as you have done). The first sections, in which the differentiation algorithms are derived, can be named "Implicit function theorem" and "Lagrangian approach". The implicit function theorem allows you to derive both the tangent linear and adjoint methods.

> AD rules are independent of the form of the objective functions

Some properties of the objective function matter. A scalar objective motivates an adjoint method, while a long vector may warrant a tangent method.
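A minimal NumPy sketch of this trade-off, assuming a small made-up system F(u, psi) = 0 with two unknowns and three parameters. The functions `F`, `dF_du`, `dF_dpsi`, `solve`, `tangent`, and `adjoint` and the objective g(u) = sum(u**2) are hypothetical illustrations, not taken from the handbook or the PR. With a scalar objective, the adjoint route needs one transposed linear solve, while the tangent route needs one solve per parameter.

```python
import numpy as np

def F(u, psi):
    # Hypothetical nonlinear system F(u, psi) = 0 (illustration only).
    return np.array([
        u[0] + 0.5 * np.sin(u[1]) - psi[0],
        u[1] + 0.5 * np.cos(u[0]) - psi[1] * psi[2],
    ])

def dF_du(u, psi):
    # Jacobian of F with respect to the unknowns u (2 x 2).
    return np.array([
        [1.0, 0.5 * np.cos(u[1])],
        [-0.5 * np.sin(u[0]), 1.0],
    ])

def dF_dpsi(u, psi):
    # Jacobian of F with respect to the parameters psi (2 x 3).
    return np.array([
        [-1.0, 0.0, 0.0],
        [0.0, -psi[2], -psi[1]],
    ])

def solve(psi, tol=1e-12, max_iter=50):
    # Plain Newton iteration starting from u = 0.
    u = np.zeros(2)
    for _ in range(max_iter):
        r = F(u, psi)
        if np.linalg.norm(r) < tol:
            break
        u = u - np.linalg.solve(dF_du(u, psi), r)
    return u

def tangent(u, psi, psi_dot):
    # Forward rule: solve (dF/du) u_dot = -(dF/dpsi) psi_dot.
    return np.linalg.solve(dF_du(u, psi), -dF_dpsi(u, psi) @ psi_dot)

def adjoint(u, psi, u_bar):
    # Reverse rule: solve (dF/du)^T lam = u_bar, then psi_bar = -(dF/dpsi)^T lam.
    lam = np.linalg.solve(dF_du(u, psi).T, u_bar)
    return -dF_dpsi(u, psi).T @ lam

psi = np.array([0.3, 0.7, 1.1])
u = solve(psi)

# Scalar objective g(u) = sum(u**2), so dg/du = 2 * u.
u_bar = 2.0 * u

# Adjoint route: a single transposed solve yields the full gradient.
grad_adjoint = adjoint(u, psi, u_bar)

# Tangent route: one solve per parameter builds du/dpsi column by column.
J = np.column_stack([tangent(u, psi, e) for e in np.eye(3)])
grad_tangent = J.T @ u_bar
```

Both routes give the same gradient here. The difference is the number of linear solves, which scales with the number of parameters for the tangent route and with the number of outputs for the adjoint route.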

> Stationary PDEs are nonlinear equations of form F(x) = 0

Isn't an equation of the form F(x) = 0 simply an algebraic equation? To have a PDE, F must also depend on partial derivatives of x. Unless of course you can point me to an example where this is not the case.

> Anyways the actual rules for AD are more important than the derivation of them.

It depends on the reader. For Stan developers, being able to extend results to other cases can sometimes be important, as the AD handbook may not be exhaustive.

charlesm93 avatar Apr 16 '20 13:04 charlesm93

I think I've addressed all the needed changes. Rendered LaTeX derivation: https://github.com/bob-carpenter/ad-handbook/pull/8#discussion_r409607668

IvanYashchuk avatar Apr 16 '20 14:04 IvanYashchuk