darts
Approximate Architecture Gradient
I have a few questions about the section "Approximate Architecture Gradient" in the paper:
- Why does evaluating the finite difference require only two forward passes for the weights and two backward passes for α, and why does this reduce the complexity from O(|α||w|) to O(|α| + |w|)?
- Looking at equation 7, we have a second-order partial derivative which is computationally expensive to compute. To solve this, the finite difference method is used. <-- How is the second-order partial derivative related to the finite difference method?
- We also note that when momentum is enabled for weight optimisation, the one-step unrolled learning objective in equation 6 is modified accordingly and all of our analysis still applies. <-- How is momentum directly related to the need to apply the chain rule to equation 6?
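For context on the first two questions, here is a minimal NumPy sketch of the finite-difference trick from equation 8 of the paper: the mixed second-order term ∇²_{α,w} L_train(w, α) · v (with v = ∇_{w'} L_val(w', α)) is approximated by evaluating ∇_α L_train at two perturbed weight points w ± ε·v, which costs only two extra forward/backward passes instead of forming the full Hessian. The quadratic toy losses below are my own assumption purely so the gradients are closed-form; they are not the paper's losses.

```python
import numpy as np

# Toy smooth losses standing in for the real train/val losses (assumption):
#   L_train(w, a) = 0.5*||w||^2 + a @ w      L_val(w, a) = 0.5*||w - a||^2
def grad_w_train(w, a):   # ∇_w L_train = w + a
    return w + a

def grad_a_train(w, a):   # ∇_α L_train = w
    return w

def grad_w_val(w, a):     # ∇_w L_val = w - a
    return w - a

def hessian_vector_fd(w, a, v, eps=1e-2):
    """Central finite difference for ∇²_{α,w} L_train(w, α) · v (eq. 8).

    Only two evaluations of ∇_α L_train at perturbed weights are needed,
    i.e. O(|α| + |w|) work rather than the O(|α||w|) cost of the full
    mixed Hessian."""
    w_plus, w_minus = w + eps * v, w - eps * v
    return (grad_a_train(w_plus, a) - grad_a_train(w_minus, a)) / (2 * eps)

rng = np.random.default_rng(0)
w = rng.standard_normal(4)
a = rng.standard_normal(4)
xi = 0.1                                   # inner learning rate ξ

w_prime = w - xi * grad_w_train(w, a)      # one-step unrolled weights (eq. 6)
v = grad_w_val(w_prime, a)                 # v = ∇_{w'} L_val(w', α)

# For this toy L_train the mixed Hessian ∇²_{α,w} L_train is the identity,
# so the exact Hessian-vector product is simply v.
approx = hessian_vector_fd(w, a, v)
assert np.allclose(approx, v, atol=1e-6)
```

The second correction term in equation 7, −ξ ∇²_{α,w} L_train · ∇_{w'} L_val, is then assembled from this approximation, which is where the "two forward passes for the weights and two backward passes for α" accounting comes from.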