
Basic information about Chinese

Open Adorablepet opened this issue 4 years ago • 12 comments

Thanks for sharing your code. I ran a Chinese audio file through your demo, and the generated lips were not synchronized with the audio. Is there any solution? Do you plan to train your model on a Chinese lip dataset? Thanks.

Adorablepet avatar May 21 '20 06:05 Adorablepet

@lelechen63 In lrw_data.py, what is the difference between the generating_landmark_lips function and the generating_demo_landmark_lips function? One landmark_path is landmark1d, the other is landmark3d. But when training the ATnet model, it uses self.lmark_root_path = '../dataset/landmark1d'. I hope you can explain this. Thanks.

Adorablepet avatar May 27 '20 08:05 Adorablepet

@lelechen63 Could it be understood that these two functions are two different methods for extracting landmarks, and that demo.py selects landmark1d?

Adorablepet avatar May 27 '20 10:05 Adorablepet

@lelechen63 I am a bit confused about the landmarks. Does this parameter distinguish between training and testing? Is the PCA the same? U_lrw1.npy belongs to the training set; does the test set also have a U_lrw1_test.npy? When I looked at the source code, I found that both training and testing use U_lrw1.npy. Thanks.

Adorablepet avatar May 29 '20 08:05 Adorablepet

Thanks for sharing your code. I ran a Chinese audio file through your demo, and the generated lips were not synchronized with the audio. Is there any solution? Do you plan to train your model on a Chinese lip dataset? Thanks.

The released model is trained on English, but it can be tested on any other language. The reason is that we treat the audio input as 0.04-second chunks, which are not sensitive to the linguistic (semantic) information of the language.

lelechen63 avatar May 29 '20 13:05 lelechen63
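The 0.04-second chunking described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual preprocessing code; the 16 kHz sample rate is an assumption for the example.

```python
import numpy as np

SAMPLE_RATE = 16000                       # assumed audio sample rate (Hz)
CHUNK_SEC = 0.04                          # per-frame audio context used by the model
CHUNK_LEN = int(SAMPLE_RATE * CHUNK_SEC)  # 640 samples per 0.04 s chunk

def split_audio(signal):
    """Split a 1-D audio signal into consecutive non-overlapping 0.04 s chunks."""
    n_chunks = len(signal) // CHUNK_LEN
    return signal[: n_chunks * CHUNK_LEN].reshape(n_chunks, CHUNK_LEN)

one_second = np.zeros(SAMPLE_RATE)
chunks = split_audio(one_second)          # 25 chunks of 640 samples each
```

At this granularity each chunk covers roughly a phoneme-level slice of audio, which is why language-level semantics have little influence on the lip prediction.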

@lelechen63 In lrw_data.py, what is the difference between the generating_landmark_lips function and the generating_demo_landmark_lips function? One landmark_path is landmark1d, the other is landmark3d. But when training the ATnet model, it uses self.lmark_root_path = '../dataset/landmark1d'. I hope you can explain this. Thanks.

I will clean the code again this month and notify you once I have finished. The main landmark pipeline has two steps: 1. align the image using an affine transformation; 2. detect the landmarks. The original code had a third step, normalizing the landmarks, but that step is actually not needed.

lelechen63 avatar May 29 '20 13:05 lelechen63
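The affine-alignment step above can be sketched with a simple least-squares fit. This is an illustrative sketch, not the repository's code: the canonical template points and the detected points are made-up values, and a real pipeline would obtain the detected points from a landmark detector and then warp the image with the resulting matrix.

```python
import numpy as np

def estimate_affine(src, dst):
    """Least-squares 2x3 affine matrix mapping src (N, 2) points onto dst (N, 2)."""
    n = src.shape[0]
    # Design matrix [x, y, 1] per point; solve A @ M.T ~= dst.
    A = np.hstack([src, np.ones((n, 1))])
    M, _, _, _ = np.linalg.lstsq(A, dst, rcond=None)
    return M.T                      # shape (2, 3)

def apply_affine(M, pts):
    """Apply a 2x3 affine matrix to (N, 2) points."""
    return pts @ M[:, :2].T + M[:, 2]

# Hypothetical canonical template: two eye corners and nose tip in a 256x256 crop.
template = np.array([[80.0, 100.0], [176.0, 100.0], [128.0, 160.0]])
# Hypothetical detected positions of the same three points in an arbitrary frame.
detected = np.array([[210.0, 305.0], [330.0, 310.0], [268.0, 390.0]])

M = estimate_affine(detected, template)
aligned = apply_affine(M, detected)  # maps onto the template after alignment
```

With three non-collinear point pairs the affine fit is exact, so `aligned` coincides with the template; with the full 68-point landmark set the same least-squares fit gives the best-fitting alignment instead.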

@lelechen63 I am a bit confused about the landmarks. Does this parameter distinguish between training and testing? Is the PCA the same? U_lrw1.npy belongs to the training set; does the test set also have a U_lrw1_test.npy? When I looked at the source code, I found that both training and testing use U_lrw1.npy. Thanks.

The PCA for training and testing is the same. The PCA parameters are extracted from the training set and can be applied to any video, including the test set or videos in the wild.

lelechen63 avatar May 29 '20 13:05 lelechen63
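The idea of reusing a training-set PCA basis at test time can be sketched as follows. This is a minimal illustration, not the actual contents of U_lrw1.npy: the random arrays stand in for flattened 68-point (x, y) landmark vectors, and the number of retained components (20) is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for training landmarks: 500 samples of 68 (x, y) points, flattened.
train = rng.normal(size=(500, 136))

# Fit PCA on the training set only (this is what a file like U_lrw1.npy would store).
mean = train.mean(axis=0)
_, _, Vt = np.linalg.svd(train - mean, full_matrices=False)
components = Vt[:20]                 # keep the top 20 principal components

def project(x):
    """Encode landmarks as coefficients in the training-set PCA basis."""
    return (x - mean) @ components.T

def reconstruct(z):
    """Decode PCA coefficients back to landmark space."""
    return z @ components + mean

# The same basis is reused unchanged for unseen (test) landmarks.
test = rng.normal(size=(10, 136))
coeffs = project(test)               # shape (10, 20)
approx = reconstruct(coeffs)         # shape (10, 136)
```

Because the basis is fixed at training time, no separate U_lrw1_test.npy is needed; test videos are simply projected into the same subspace.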

Thanks for sharing your code. I ran a Chinese audio file through your demo, and the generated lips were not synchronized with the audio. Is there any solution? Do you plan to train your model on a Chinese lip dataset? Thanks.

The released model is trained on English, but it can be tested on any other language. The reason is that we treat the audio input as 0.04-second chunks, which are not sensitive to the linguistic (semantic) information of the language.

Regarding your answer, can I understand it this way: the mismatch between the audio and the lips has nothing to do with the training language, but is instead related to the model itself?

Adorablepet avatar Jun 01 '20 01:06 Adorablepet

@lelechen63 Could you release the training parameters for AT-net and VG-net? Otherwise, it is difficult for us to reproduce the results in the paper. Thanks.

Adorablepet avatar Jun 03 '20 08:06 Adorablepet

@lelechen63 What do new_16_full_gt_train.pkl and region_16_wrap_gt_train2.pkl mean? Can you explain? lrw_data.py is not very clear. Thanks.

Adorablepet avatar Jun 11 '20 09:06 Adorablepet

@lelechen63 What do new_16_full_gt_train.pkl and region_16_wrap_gt_train2.pkl mean? Can you explain? lrw_data.py is not very clear. Thanks.

I second this question. @lelechen63

liangzz1991 avatar Jun 18 '20 12:06 liangzz1991

Thanks for sharing your code. I ran a Chinese audio file through your demo, and the generated lips were not synchronized with the audio. Is there any solution? Do you plan to train your model on a Chinese lip dataset? Thanks.

The released model is trained on English, but it can be tested on any other language. The reason is that we treat the audio input as 0.04-second chunks, which are not sensitive to the linguistic (semantic) information of the language.

What does the 0.04 refer to? Is it the winlen or the winstep of mfcc?

Adorablepet avatar Jul 22 '20 09:07 Adorablepet

Why is face normalization not needed? From my point of view, individual face shapes differ, and they also contain rotations (roll, yaw, pitch). None of these parameters are related to the audio input. So I am wondering why normalization is not needed. Hoping for your reply ^

Owen-Fish avatar Feb 11 '22 07:02 Owen-Fish