
Confused about "normLmarks" function

Open tlatlbtle opened this issue 6 years ago • 4 comments

Many thanks for this repo. I am trying to reimplement your training process, but I am stuck in the data preprocessing.

Specifically, I am confused about the "normLmarks" function.

  1. When there is only one face per frame ( len(lmarks.shape) == 2 ), will "normLmarks" always produce the same output? I have marked the relevant lines in the code below with "#", and a small sketch after the function illustrates the point. It seems @ssinha89 also found this issue. https://github.com/lelechen63/ATVGnet/issues/17#issuecomment-547884824.

  2. Could you say more about the meaning of "init_params", "params" and "predicted"? What do "S" and "SK" mean here? I understand that "procrustes" aligns each frame's landmarks to the mean face, but I am confused about the steps after that. Could you also point to any related papers describing this?

# Note: S, SK, MSK and ms_img are globals defined elsewhere in the repo
# (the function also relies on numpy as np, deepcopy and procrustes).
def normLmarks(lmarks):
    norm_list = []
    idx = -1
    max_openness = 0.2
    mouthParams = np.zeros((1, 100))
    mouthParams[:, 1] = -0.06
    tmp = deepcopy(MSK)
    tmp[:, 48*2:] += np.dot(mouthParams, SK)[0, :, 48*2:]
    open_mouth_params = np.reshape(np.dot(S, tmp[0, :] - MSK[0, :]), (1, 100))

    if len(lmarks.shape) == 2:
        lmarks = lmarks.reshape(1,68,2)
    for i in range(lmarks.shape[0]):
        mtx1, mtx2, disparity = procrustes(ms_img, lmarks[i, :, :])
        mtx1 = np.reshape(mtx1, [1, 136])
        mtx2 = np.reshape(mtx2, [1, 136])
        norm_list.append(mtx2[0, :])
    pred_seq = []
    init_params = np.reshape(np.dot(S, norm_list[idx] - mtx1[0, :]), (1, 100))
    for i in range(lmarks.shape[0]):
        params = np.reshape(np.dot(S, norm_list[i] - mtx1[0, :]), (1, 100)) - init_params - open_mouth_params
######## "params" will always be equal to  (-open_mouth_params) ######## 
        predicted = np.dot(params, SK)[0, :, :] + MSK
        pred_seq.append(predicted[0, :])
    return np.array(pred_seq), np.array(norm_list), 1
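
To make point 1 concrete, here is a minimal, self-contained sketch (random stand-ins for S, the aligned frame, and the mean shape; not the repo's actual data) showing that, in the single-frame case, the projection of the current frame cancels against init_params, so params reduces to -open_mouth_params and the output no longer depends on the input landmarks:

import numpy as np

rng = np.random.default_rng(0)
S = rng.normal(size=(100, 136))        # stand-in for the projection matrix S
frame = rng.normal(size=136)           # stand-in for norm_list[0] (the only frame)
mean_shape = rng.normal(size=136)      # stand-in for mtx1[0, :]
open_mouth_params = rng.normal(size=(1, 100))

# idx = -1, and with a single frame norm_list[idx] is the same vector as norm_list[i]
init_params = np.reshape(np.dot(S, frame - mean_shape), (1, 100))
params = np.reshape(np.dot(S, frame - mean_shape), (1, 100)) - init_params - open_mouth_params

# the two projections are identical, so params == -open_mouth_params
assert np.allclose(params, -open_mouth_params)

If that is right, every frame processed one at a time is mapped to the same fixed output shape, regardless of the input landmarks.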

tlatlbtle · Nov 08 '19 06:11

Please refer to https://github.com/eeskimez/Talking-Face-Landmarks-from-Speech for audio to landmark part
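
For the normalization itself, one way to think about MSK, SK and S is as a PCA shape basis: MSK a mean landmark shape, SK a set of shape components, and S the matrix that projects a mean-centered shape onto those components (this is a simplified illustration, not the exact code or data in the repo). A minimal sketch of building and using such a basis:

import numpy as np
from sklearn.decomposition import PCA

# Hypothetical, synthetic stand-in for flattened, Procrustes-aligned
# 68-point landmarks (N, 136); kept low-rank so 100 components suffice.
rng = np.random.default_rng(0)
aligned_shapes = rng.normal(size=(500, 50)) @ rng.normal(size=(50, 136))

pca = PCA(n_components=100)
pca.fit(aligned_shapes)

mean_shape = pca.mean_        # would play the role of MSK (the mean shape)
components = pca.components_  # would play the role of S / SK (the shape basis)

# Project one shape into the 100-D parameter space and reconstruct it, mirroring
# params = S @ (shape - mean) and predicted = params @ SK + MSK in normLmarks.
shape = aligned_shapes[0]
params = components @ (shape - mean_shape)
reconstructed = params @ components + mean_shape

assert np.allclose(reconstructed, shape, atol=1e-5)

Under that reading, init_params would be the parameter vector of the reference frame (idx = -1), params each frame's parameters relative to that reference minus a fixed mouth-opening offset, and predicted the landmark vector reconstructed from those parameters.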

lelechen63 · Nov 20 '19 01:11

Thanks. I got the information.

tlatlbtle · Dec 04 '19 12:12

@wjbKimberly @lelechen63 Hi there, I also ran into the same problem. I want to train ATNet with my own dataset; the landmarks are preprocessed with code extracted from demo.py, and the preprocessed landmark data all come out identical. Is this normal? If it is not correct, did you manage to solve it? Could you give a suggestion as to where it goes wrong? Thank you!

hot-dog · May 25 '20 03:05

@hot-dog I find that 'example_landmark' never changes in demo.py when the input template image changes. Is it like a mean value that does not need to change? What should I do during training? It does not seem to match the figure in the paper. Confused...

liangzz1991 · Jun 20 '20 08:06