
Approximate Architecture Gradient

buttercutter opened this issue 4 years ago

I have a few questions on the section "Approximate Architecture Gradient" in the paper:

  1. Why does evaluating the finite difference require only two forward passes for the weights and two backward passes for α, and why does the complexity drop from O(|α||w|) to O(|α|+|w|)?
  2. Looking at equation 7, we have a second-order partial derivative which is computationally expensive to compute. To solve this, the finite difference method is used. <-- How is the second-order partial derivative related to the finite difference method?
  3. "We also note that when momentum is enabled for weight optimisation, the one-step unrolled learning objective in equation 6 is modified accordingly and all of our analysis still applies." <-- How is momentum related to the need to apply the chain rule to equation 6?
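
For context on questions 1 and 2, here is my current understanding as a toy numpy sketch (this is my own illustration, not code from this repo; the bilinear loss and all variable names are made up). Equation 8 approximates the Hessian-vector product ∇²_{α,w} L_train(w, α) · v, with v = ∇_{w'} L_val(w', α), by the central difference [∇_α L_train(w + εv, α) − ∇_α L_train(w − εv, α)] / (2ε). Each of the two terms needs one forward and one backward pass at a perturbed w, and no |α|×|w| matrix of second derivatives is ever formed:

```python
import numpy as np

# Toy training loss: L_train(w, a) = w @ A @ a + 0.5 * w @ w.
# Its mixed second derivative d²L/(dα dw) is just A^T, so the exact
# Hessian-vector product with a vector v is A^T @ v -- easy to check against.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))   # couples w (dim 4) and alpha (dim 3)
w = rng.normal(size=4)
alpha = rng.normal(size=3)
v = rng.normal(size=4)        # stands in for grad_{w'} L_val(w', alpha)

def grad_alpha_train(w, alpha):
    """Analytic grad of L_train w.r.t. alpha (one 'backward pass' for alpha)."""
    return A.T @ w

# The paper's heuristic step size: eps = 0.01 / ||v||.
eps = 0.01 / np.linalg.norm(v)

# Two weight perturbations -> two gradient evaluations w.r.t. alpha.
g_plus = grad_alpha_train(w + eps * v, alpha)    # forward+backward pass 1
g_minus = grad_alpha_train(w - eps * v, alpha)   # forward+backward pass 2
hvp_approx = (g_plus - g_minus) / (2 * eps)

hvp_exact = A.T @ v   # ground truth for this toy loss
print(np.allclose(hvp_approx, hvp_exact, atol=1e-6))
```

For this bilinear toy loss the second derivative is constant, so the central difference matches the exact product up to floating-point error; for a real network it is only an O(ε²) approximation. The cost is two extra gradient evaluations (O(|w|) each plus O(|α|) for the α-gradients), which is where O(|α| + |w|) comes from.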

buttercutter avatar Jan 14 '21 10:01 buttercutter