YuanxunLu

Results 26 comments of YuanxunLu

To be clearer, Sec. 3 describes the architecture: what it consists of and how it works, i.e., the forward pass. In Sec. 4, we describe how to build such...

The proposed method is person-specific: you should train a separate model for each in-the-wild target person. Sec. 4 describes this process.

The 'mean_pts' and 'std_mean_pts' may differ in the first 16 dimensions, which are the contour points. The 3D tracking algorithm I used applied sliding contour points for higher tracking...

They are parameters of the post-processing (smoothing and scaling), chosen mostly empirically.

I extracted the video at 60 FPS; you can do this simply with FFmpeg. If you change the FPS setting, you should consider changing the design of the audio feature...
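A minimal sketch of the 60 FPS extraction step, assuming FFmpeg is installed and on the PATH. The file names `input.mp4` and `frames/`, the `%05d.png` naming pattern, and the helper names are hypothetical and not taken from the original repository. FFmpeg's `fps` filter resamples the stream, so a source recorded at a different rate will have frames duplicated or dropped.

```python
import subprocess
from pathlib import Path


def build_extract_cmd(video_path, out_dir, fps=60):
    """Build an FFmpeg command that writes one PNG per frame at `fps`."""
    # Hypothetical naming pattern: 00001.png, 00002.png, ...
    pattern = str(Path(out_dir) / "%05d.png")
    return ["ffmpeg", "-i", str(video_path), "-vf", f"fps={fps}", pattern]


def extract_frames(video_path, out_dir, fps=60):
    """Create the output folder and run FFmpeg; raises if FFmpeg fails."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_extract_cmd(video_path, out_dir, fps), check=True)


if __name__ == "__main__":
    # Assumed paths, for illustration only.
    extract_frames("input.mp4", "frames", fps=60)
```

Keeping the command construction separate from the call makes it easy to inspect or log the exact FFmpeg invocation before running it.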

Whether you need to retrain the audio feature extraction network depends on how you use it. The proposed setting was designed for my experiments. Of course, you can make use...

The focal length, of course, should not be set arbitrarily; it has to be consistent with your tracked 3D face as well as your crop and scaling parameters. You don't need to...
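To illustrate why the focal length is tied to the crop and scaling parameters, here is a rough sketch under a simple pinhole camera model with square pixels. The function names and the numbers are hypothetical; the tracker's actual camera convention (axis directions, units) may differ, so treat this as an assumption rather than the repository's implementation.

```python
def project(point, focal, cx, cy):
    """Pinhole projection of a 3D camera-space point to pixel coordinates."""
    x, y, z = point
    return focal * x / z + cx, focal * y / z + cy


def adjust_intrinsics(focal, cx, cy, crop_x, crop_y, scale):
    """Intrinsics after cropping at (crop_x, crop_y), then resizing by `scale`.

    Cropping shifts the principal point; resizing multiplies both the
    focal length and the principal point by the same factor.
    """
    return focal * scale, (cx - crop_x) * scale, (cy - crop_y) * scale


if __name__ == "__main__":
    point = (0.1, -0.2, 2.0)                 # assumed 3D point
    focal, cx, cy = 1000.0, 320.0, 240.0     # assumed original intrinsics
    crop_x, crop_y, scale = 100.0, 50.0, 0.5

    u, v = project(point, focal, cx, cy)
    new_focal, new_cx, new_cy = adjust_intrinsics(focal, cx, cy, crop_x, crop_y, scale)
    # Both routes should land on the same pixel in the cropped, resized image.
    print((u - crop_x) * scale, (v - crop_y) * scale)
    print(project(point, new_focal, new_cx, new_cy))
```

If the image is cropped or resized but the focal length is left unchanged, the projected 3D face no longer lines up with the pixels, which is why the value cannot be chosen freely.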

I believe training the audio2feature model is not hard, as long as the input data and ground truth are set up correctly. Your description seems alright. The input landmarks should lie in...

You need to separate out the influence of head pose on the 3D landmarks; the disentangled landmarks used for training should look like only the mouth moves while others are...
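One way to picture removing the head pose is to undo a per-frame rigid transform. The sketch below assumes the tracker provides, for each frame, a rotation matrix `rotation` and a translation `translation` such that `posed = canonical @ rotation.T + translation`; that convention and the function name are assumptions for illustration, not the original code.

```python
import numpy as np


def remove_head_pose(landmarks, rotation, translation):
    """Map posed 3D landmarks (N, 3) back to the pose-free head frame.

    Assumes posed = canonical @ rotation.T + translation, so the inverse
    is (posed - translation) @ rotation, because rotation.T @ rotation = I.
    """
    landmarks = np.asarray(landmarks, dtype=float)
    return (landmarks - np.asarray(translation, dtype=float)) @ np.asarray(rotation, dtype=float)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    canonical = rng.normal(size=(5, 3))       # stand-in for pose-free landmarks
    angle = 0.3                               # assumed yaw in radians
    rotation = np.array([
        [np.cos(angle), 0.0, np.sin(angle)],
        [0.0, 1.0, 0.0],
        [-np.sin(angle), 0.0, np.cos(angle)],
    ])
    translation = np.array([0.1, -0.05, 2.0])
    posed = canonical @ rotation.T + translation
    recovered = remove_head_pose(posed, rotation, translation)
    print(np.allclose(recovered, canonical))
```

After this step, frame-to-frame differences in the landmarks come from facial motion rather than from the head turning or shifting.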

1. Using 3D landmarks obtained by face tracking has several advantages over directly using detected 2D landmarks. It helps disentangle the camera parameters, head pose, and facial movements, which allow...