Create `How to develop PennyLane for 4 different ML frameworks` documentation page in the developer hub
Feature details
There are many subtleties that come up when developing PennyLane for the different machine learning frameworks it supports. It would be great to add a new page to the developer documentation that records these details.
- Trainability: how does each framework mark trainability? (The four approaches are compared in a sketch below this list.)
  - Autograd: no way to mark; an `argnum` argument can be passed to `autograd.grad`/`autograd.jacobian`. PennyLane patches Autograd's NumPy so that tensors can be marked as trainable via `requires_grad`.
  - Torch: tensors can be marked as trainable via `requires_grad`.
  - TensorFlow: trainability is only tracked while a tensor is being watched by a gradient tape. Although there is a `trainable` argument that can be passed to `tf.Variable`, it is `True` by default, and such tensors still require a gradient tape for differentiation. Non-trainable tensors can be created via `tf.constant`.
  - JAX: no way to mark; an `argnums` argument can be passed to `jax.grad`/`jax.jacobian`. Under the hood, JAX creates Tracer objects for the arrays selected via the `argnums` argument.
- Marking trainability is closely tied to each framework's user interface for computing the forward and backward pass (the two styles are contrasted in a sketch below this list):
  - Functional: Autograd and JAX provide a functional interface, where the function to be differentiated is passed to `autograd.grad`/`autograd.jacobian` or `jax.grad`/`jax.jacobian`, and the returned gradient function is then called with the input arguments.
  - Object-oriented: TensorFlow and PyTorch provide an object-oriented interface, where parameters are first marked as trainable, the function is then called to compute the forward pass, and finally the backward pass can be computed to obtain gradients.
- Writing tests:
  - Naively, supporting each framework implies creating four or more different test cases. Although `pytest`'s parameterization could be leveraged, separate test cases may still arguably be the cleanest solution because of how different each framework's UI is. (A parameterized-test sketch appears below this list.)
  - When computing gradients with TensorFlow, Jacobians should only be computed if absolutely necessary for the use case, due to potentially degraded performance with, for example, `diff_method="parameter-shift"` (see the Gotchas section).
- Gotchas:
  - Due to the trainability characteristics of JAX, certain PennyLane features may require an additional `argnum`/`argnums` argument. Functions that extract trainability information from their input parameters will require such an argument for JAX support. See the example of `qml.metric_tensor`, which would require an `argnums` keyword to work without having to call a JAX transform that creates arrays perceived as trainable by PennyLane (https://github.com/PennyLaneAI/pennylane/issues/1880).
  - It was found that TensorFlow's `tape.jacobian` syntax yields poor performance with its default `experimental_use_pfor=True` when using `diff_method="parameter-shift"` (see https://github.com/PennyLaneAI/pennylane/pull/1869). (A first-order example appears below this list.)
  - For higher-order derivatives with TensorFlow, `experimental_use_pfor=False` has to be set for each "internal" `tapeN.jacobian` call, while the "outermost" call allows either `experimental_use_pfor=False` or `experimental_use_pfor=True`. E.g.,
```python
with tf.GradientTape() as tape1:
    with tf.GradientTape(persistent=True) as tape2:
        res = circuit(x)
    g = tape2.jacobian(res, x, experimental_use_pfor=False)  # <--- Needs experimental_use_pfor=False
hess = tape1.jacobian(g, x)  # <--- Can use experimental_use_pfor=True
```
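To make the trainability bullet concrete, here is a minimal side-by-side sketch using a toy `sum(x * y)` function; all variable names are illustrative, and the snippet assumes Autograd, Torch, TensorFlow, and JAX are all installed.

```python
# Autograd: nothing is marked on the tensors themselves; argnum selects which
# argument to differentiate. (PennyLane's patched NumPy additionally allows
# pennylane.numpy.array(..., requires_grad=True).)
import autograd
import autograd.numpy as anp

x, y = anp.array([0.1, 0.2]), anp.array([0.3, 0.4])
print(autograd.grad(lambda x, y: anp.sum(x * y), argnum=0)(x, y))

# Torch: trainability lives on the tensor via requires_grad.
import torch

xt = torch.tensor([0.1, 0.2], requires_grad=True)
yt = torch.tensor([0.3, 0.4])  # not trainable
torch.sum(xt * yt).backward()
print(xt.grad)

# TensorFlow: differentiation requires a gradient tape; tf.Variable is watched
# automatically, while tf.constant is not trainable.
import tensorflow as tf

xtf, ytf = tf.Variable([0.1, 0.2]), tf.constant([0.3, 0.4])
with tf.GradientTape() as tape:
    loss = tf.reduce_sum(xtf * ytf)
print(tape.gradient(loss, xtf))

# JAX: nothing is marked on the arrays; argnums selects which argument(s) to
# differentiate, and JAX traces those arrays under the hood.
import jax
import jax.numpy as jnp

xj, yj = jnp.array([0.1, 0.2]), jnp.array([0.3, 0.4])
print(jax.grad(lambda x, y: jnp.sum(x * y), argnums=0)(xj, yj))
```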
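The functional vs. object-oriented distinction could be illustrated with the same toy circuit differentiated through both styles; the device, circuit, and parameter values below are placeholders, not a prescribed implementation.

```python
import pennylane as qml

dev = qml.device("default.qubit", wires=1)

# Functional style (JAX): build a gradient function, then call it with the inputs.
import jax
import jax.numpy as jnp

@qml.qnode(dev, interface="jax")
def circuit_jax(x):
    qml.RX(x, wires=0)
    return qml.expval(qml.PauliZ(0))

grad_fn = jax.grad(circuit_jax, argnums=0)
print(grad_fn(jnp.array(0.5)))

# Object-oriented style (Torch): mark the parameter as trainable, run the
# forward pass, call backward(), then read the gradient off the tensor.
import torch

@qml.qnode(dev, interface="torch")
def circuit_torch(x):
    qml.RX(x, wires=0)
    return qml.expval(qml.PauliZ(0))

x = torch.tensor(0.5, requires_grad=True)
res = circuit_torch(x)
res.backward()
print(x.grad)
```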
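As a starting point for the testing discussion, a hypothetical test sketch is shown below: the forward pass parameterizes cleanly over interfaces, while the gradient check ends up framework-specific. The `make_circuit` helper and expected values are illustrative only, and the parameterized test assumes all four frameworks are installed (otherwise a per-interface `pytest.importorskip` would be needed).

```python
import pytest
import pennylane as qml
from pennylane import numpy as pnp

dev = qml.device("default.qubit", wires=1)


def make_circuit(interface):
    @qml.qnode(dev, interface=interface)
    def circuit(x):
        qml.RX(x, wires=0)
        return qml.expval(qml.PauliZ(0))

    return circuit


# The forward pass parameterizes cleanly over interfaces...
@pytest.mark.parametrize("interface", ["autograd", "torch", "tf", "jax"])
def test_forward(interface):
    circuit = make_circuit(interface)
    assert qml.math.allclose(circuit(0.5), pnp.cos(0.5))


# ...but the backward pass tends to live in per-framework tests, since each
# framework's gradient UI differs (here: Torch's object-oriented style).
def test_gradient_torch():
    torch = pytest.importorskip("torch")
    circuit = make_circuit("torch")
    x = torch.tensor(0.5, requires_grad=True)
    circuit(x).backward()
    assert qml.math.allclose(x.grad, -pnp.sin(0.5))
```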
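Finally, for the first-order TensorFlow Jacobian gotcha, a self-contained sketch with a toy two-qubit circuit might look as follows; note that `persistent=True` is needed to use `experimental_use_pfor=False` in eager mode.

```python
import pennylane as qml
import tensorflow as tf

dev = qml.device("default.qubit", wires=2)


@qml.qnode(dev, interface="tf", diff_method="parameter-shift")
def circuit(x):
    qml.RX(x[0], wires=0)
    qml.RY(x[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.probs(wires=[0, 1])


x = tf.Variable([0.1, 0.2])

with tf.GradientTape(persistent=True) as tape:
    res = circuit(x)

# Prefer tape.gradient when a vector-Jacobian product suffices; when the full
# Jacobian is really needed, disabling pfor avoids the slowdown observed with
# parameter-shift (see the PR linked in the Gotchas above).
jac = tape.jacobian(res, x, experimental_use_pfor=False)
print(jac)
```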
Implementation
No response
How important would you say this feature is?
2: Somewhat important. Needed this quarter.
Additional information
No response