jixinya


> @auspicious3000 Thanks for your suggestion. I have trained on the 80 speakers (P225~P304) in the VCTK dataset (since the one-hot size is 80) on a 2080Ti GPU for 2 days, and the result becomes better,...
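As an aside, the one-hot constraint above just means the speaker identity is a fixed-size indicator vector whose length must match the number of training speakers. A minimal sketch, assuming 80 VCTK speakers indexed 0-79 (illustrative only, not the actual training code):

```python
import numpy as np

def speaker_onehot(speaker_index, num_speakers=80):
    """One-hot speaker code for an 80-speaker setup (P225~P304).
    Illustrative sketch; the real conditioning code may differ."""
    vec = np.zeros(num_speakers, dtype=np.float32)
    vec[speaker_index] = 1.0
    return vec
```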

> > Sorry to bother you again, but I am still a little confused about the preprocessing of ATnet and VGnet. I didn't find explicit code for preprocessing the training...

Thanks! There's one more question. Did you use the method in 'Talking-Face-Landmarks-from-Speech' to frontalize the landmarks in ATnet? Because when I try normLmark() in demo.py to process the data, I...
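For context, a common way to frontalize/normalize facial landmarks is a similarity-transform (Procrustes) alignment to a reference shape. The sketch below shows that generic technique; it is not necessarily what normLmark() actually does:

```python
import numpy as np

def procrustes_align(landmarks, reference):
    """Align an (N, 2) landmark set to a reference shape with a
    similarity transform (translation, uniform scale, rotation).
    Generic sketch; not the actual normLmark() logic."""
    # Center both shapes at the origin.
    L = landmarks - landmarks.mean(axis=0)
    R = reference - reference.mean(axis=0)
    # Normalize scale by the Frobenius norm.
    sL, sR = np.linalg.norm(L), np.linalg.norm(R)
    L, R = L / sL, R / sR
    # Optimal rotation via SVD of the cross-covariance matrix.
    U, _, Vt = np.linalg.svd(L.T @ R)
    if np.linalg.det(U @ Vt) < 0:  # avoid reflections
        U[:, -1] *= -1
    rot = U @ Vt
    # Map the input landmarks into the reference frame.
    return (L @ rot) * sR + reference.mean(axis=0)
```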

I haven't met this problem before, but I guess it might come from the missing ears or neck in the edge map (as shown below). ![vid2vid](https://user-images.githubusercontent.com/34118623/154415945-09706d51-b88c-40dc-a095-92201a5a844c.jpg)

The download works fine on my end; perhaps try refreshing the page and downloading again.

I have released the training code. More details of the DTW step can be found in train/disentanglement/dtw/MFCC_dtw.py.
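For reference, the core of dynamic time warping over per-frame MFCC vectors is short. The following is a generic textbook sketch, not the exact code in MFCC_dtw.py, which may use different costs or step constraints:

```python
import numpy as np

def dtw_cost(mfcc_a, mfcc_b):
    """Dynamic time warping between two MFCC sequences of shape
    (frames, coeffs). Returns the cumulative alignment cost.
    Generic sketch; MFCC_dtw.py may differ in details."""
    n, m = len(mfcc_a), len(mfcc_b)
    # Pairwise Euclidean distances between frames.
    dist = np.linalg.norm(mfcc_a[:, None, :] - mfcc_b[None, :, :], axis=-1)
    # Accumulated-cost matrix with an infinite border for the base case.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Allowed steps: match, insertion, deletion.
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    return acc[n, m]
```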

Hi, we use a landmark detector to detect the 106 facial key points of each frame. However, we cannot provide the detector here due to copyright reasons. You can...
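Since the 106-point detector cannot be shared, below is a hedged sketch of the same per-frame pipeline using the open-source dlib 68-point predictor as a stand-in; the point count and model file differ from the detector actually used, and "video.mp4" is a hypothetical input path:

```python
import cv2
import dlib

# Stand-in for the unreleased 106-point detector: dlib's public
# 68-point model (download shape_predictor_68_face_landmarks.dat first).
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

cap = cv2.VideoCapture("video.mp4")  # hypothetical input path
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for face in detector(gray):
        shape = predictor(gray, face)
        # (x, y) key points for this frame.
        points = [(shape.part(i).x, shape.part(i).y) for i in range(68)]
cap.release()
```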

I have released the training code. You can check it in train/disentanglement/code/models_content_cla.py (class Decoder).

The output naming rules of M003 differ from the others (M009, M030, etc.) due to an update of the MEAD dataset. The old rule follows the order of the sentences, while the new...

The results don't show obvious eye blinks because we did not intentionally add eye blinks to the videos.