
clarification in chap1

uballa opened this issue 6 years ago · 1 comment

In the code below, could you clarify why you are calculating dLdN when you are not using it in any subsequent calculation?

dLdS = np.ones_like(S)
dSdN = deriv(sigma, N)
dLdN = dLdS * dSdN
dNdX = np.transpose(W, (1, 0))
dLdX = np.dot(dSdN, dNdX)
return dLdX
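
For reference, a minimal reproduction of those steps, assuming a sigmoid for sigma and a simple central-difference deriv helper as stand-ins for the book's exact code. Because dLdS is np.ones_like(S), dLdN comes out numerically identical to dSdN, so the final np.dot gives the same answer whether it is fed dLdN or dSdN:

import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1 / (1 + np.exp(-x))

def deriv(func, input_: np.ndarray, delta: float = 1e-3) -> np.ndarray:
    # central-difference approximation of the elementwise derivative
    return (func(input_ + delta) - func(input_ - delta)) / (2 * delta)

np.random.seed(0)
X = np.random.randn(3, 3)
W = np.random.randn(3, 2)

N = np.dot(X, W)
S = sigmoid(N)

dLdS = np.ones_like(S)   # derivative of np.sum(S) with respect to each element of S
dSdN = deriv(sigmoid, N)
dLdN = dLdS * dSdN       # elementwise product with an all-ones array

print(np.allclose(dLdN, dSdN))                              # True: dLdN is just dSdN here
print(np.allclose(np.dot(dLdN, W.T), np.dot(dSdN, W.T)))    # True: same dLdX either way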

uballa · Dec 06 '19

I have the same question: why is element-wise multiplication applied to calculate dLdN = dLdS * dSdN, rather than matrix multiplication via np.dot() or np.matmul()?

I assume this is to keep the dimensionality of the remaining derivatives correct, as shown in the comment following each derivative, but I'm still confused... (see the numerical check after the code below)

def matrix_function_backward_sum_1(X: ndarray,
                                   W: ndarray,
                                   sigma: Array_Function) -> ndarray:
    '''
    Compute derivative of matrix function with a sum with respect to the
    first matrix input
    '''
    assert X.shape[1] == W.shape[0] # X: (m x n), W: (n x p)

    # matrix multiplication
    N = np.dot(X, W) # N: (m x p)

    # feeding the output of the matrix multiplication through sigma
    S = sigma(N) # S: (m x p)

    # sum all the elements
    L = np.sum(S) # L: a scalar 

    # note: I'll refer to the derivatives by their quantities here,
    # unlike the math where we referred to their function names

    # dLdS - just 1s
    dLdS = np.ones_like(S) # (m x p)

    # dSdN
    dSdN = deriv(sigma, N) # (m x p)
    
    # dLdN (element-wise multiplication)
    dLdN = dLdS * dSdN # (m x p) 

    # dNdX
    dNdX = np.transpose(W, (1, 0)) # (p x n)

    # dLdX
    dLdX = np.dot(dSdN, dNdX) # (m x p) x (p x n) = (m x n)

    return dLdX
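
One way to sanity-check the element-wise step: sigma is applied element-wise, so each S[i, j] depends only on N[i, j], which makes dL/dN the element-wise product dLdS * dSdN; matrix multiplication only enters at the dNdX step, where N = np.dot(X, W) couples the entries. Below is a small check of dLdX against a finite-difference gradient, using a sigmoid as a stand-in for sigma and its exact derivative in place of the book's deriv helper:

import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1 / (1 + np.exp(-x))

def L_func(X: np.ndarray, W: np.ndarray) -> float:
    # the scalar being differentiated: L = sum(sigma(X @ W))
    return float(np.sum(sigmoid(np.dot(X, W))))

np.random.seed(0)
X = np.random.randn(3, 3)
W = np.random.randn(3, 2)

# analytic gradient, following the same steps as the function above
N = np.dot(X, W)
dSdN = sigmoid(N) * (1 - sigmoid(N))   # exact elementwise derivative of sigmoid
dLdN = np.ones_like(N) * dSdN          # elementwise, because S[i, j] depends only on N[i, j]
dLdX = np.dot(dLdN, np.transpose(W, (1, 0)))

# finite-difference gradient of L with respect to each entry of X
eps = 1e-6
numeric = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        X_plus, X_minus = X.copy(), X.copy()
        X_plus[i, j] += eps
        X_minus[i, j] -= eps
        numeric[i, j] = (L_func(X_plus, W) - L_func(X_minus, W)) / (2 * eps)

print(np.allclose(dLdX, numeric, atol=1e-5))   # True: the elementwise step gives the correct gradient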

hopezh · Mar 28 '21