jianmanLin comments

Results: 17 comments of jianmanLin

Can you provide the processed data or the related processing code?

> @DavidKong96 I guess authors might use dlib to extract the landmarks > > The partial landmarks are defined in their [dataloader](https://github.com/sstzal/DiffTalk/blob/d34480f24408827c7535cd4b653bb1ebdba981b2/ldm/data/talk_data_ref_smooth.py#L65): > > ``` > landmarks_img = landmarks[13:48] >...

Can you provide the processed data or the related processing code?

> Before obtaining the landmarks, we need to detect the facial RoI in advance. But when the model cannot detect the face, how do we obtain the landmarks?

We use dlib to...

Inference question

I also encountered this problem. It happens because the model parameters released by the author only include the encoder-decoder; the complete model is too large. The checkpoint I saved after training was 8.2 GB.

channel error

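![image](https://github.com/sstzal/DiffTalk/assets/101717837/fb11af43-ace4-489a-93e6-2a0bf948c09a)

After DeepSpeech and windowing, the input audio has shape (-1, 16, 29). The author's code also accepts this shape, and the features pass through `self.cond_stage_model_for_audio` successfully, but they fail in `self.cond_stage_model_for_audio_smooth`. My guess is that the output of `self.cond_stage_model_for_audio_smooth` has shape (-1, 32). Because I wanted to run the whole network end to end, I randomly initialized a (-1, 32) tensor as the audio output and continued inference with it, but I still ran into a dimension mismatch later on.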

channel error

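> I ran into the same error too. I'm not sure whether it is caused by a problem in how I process the audio features.

Hello, may I ask how you process the audio features? I follow the VOCA paper cited by the author and extract audio features of shape (N, 16, 29).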
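For readers unfamiliar with the (N, 16, 29) layout, here is a minimal sketch of how per-step 29-dimensional DeepSpeech outputs could be windowed into that shape. It is an illustration only: the function name `window_features`, the zero padding, and the stride of one window per step are assumptions, not code taken from the DiffTalk or VOCA repositories.

```python
def window_features(feats, win=16):
    """Turn T feature rows (each of length D) into T overlapping windows.

    Assumption: the sequence is zero-padded by win // 2 rows on both sides
    and one window is taken per step, giving a nested list of shape
    (T, win, D).
    """
    dim = len(feats[0])
    pad = [[0.0] * dim for _ in range(win // 2)]
    padded = pad + feats + pad
    return [padded[i:i + win] for i in range(len(feats))]


# Five hypothetical time steps of 29-dimensional features.
feats = [[float(t)] * 29 for t in range(5)]
windows = window_features(feats)
print(len(windows), len(windows[0]), len(windows[0][0]))  # 5 16 29
```

With this layout, every time step gets its own 16 x 29 context window, which matches the shape mentioned in the comment.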

channel error

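> I ran into the same error too. I'm not sure whether it is caused by a problem in how I process the audio features.

I find this problem strange. There is a step `c = c.reshape(-1,16,29)`, which means the code assumes the input has shape (-1, 16, 29).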
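As a side note on `c.reshape(-1,16,29)`: the `-1` is inferred from the total number of elements, so the call succeeds for any input whose size is a multiple of 16 × 29 = 464, including inputs with an extra leading axis. The pure-Python sketch below reproduces only that shape arithmetic; `infer_reshape` is a hypothetical helper written for illustration, not part of the DiffTalk code.

```python
from math import prod


def infer_reshape(shape, target):
    """Return the shape obtained by reshaping `shape` to `target`.

    `target` must contain exactly one -1, which is inferred from the
    element count (the same rule NumPy and PyTorch apply).
    """
    if target.count(-1) != 1:
        raise ValueError("expected exactly one -1 in target")
    total = prod(shape)
    known = prod(d for d in target if d != -1)
    if known == 0 or total % known != 0:
        raise ValueError(f"cannot reshape {shape} into {target}")
    return tuple(total // known if d == -1 else d for d in target)


print(infer_reshape((4, 16, 29), (-1, 16, 29)))     # (4, 16, 29)
print(infer_reshape((4, 8, 16, 29), (-1, 16, 29)))  # (32, 16, 29)
```

So the reshape itself does not show whether the data loader produced (B, 16, 29) or something like (B, 8, 16, 29); a mismatch would only surface in a later layer.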

channel error

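> > I ran into the same error too. I'm not sure whether it is caused by a problem in how I process the audio features.
>
> I find this problem strange. There is a step `c = c.reshape(-1,16,29)`, which means the code assumes the input has shape (-1, 16, 29).

So passing it through the network afterwards should not raise an error.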

channel error

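> We can only wait for the author to fix this later. It may be caused by wrong parameters in a few of the steps.

My advisor asked me to reproduce this paper's baseline soon, and now I don't know what to do.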

channel error

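> I ran into the same error too. I'm not sure whether it is caused by a problem in how I process the audio features.

It is probably our audio processing that is wrong. The final output of the audio branch should have shape (B, 64); with that, the whole model runs.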

channel error

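> > > We can only wait for the author to fix this later. It may be caused by wrong parameters in a few of the steps.
> >
> > My advisor asked me to reproduce this paper's baseline soon, and now I don't know what to do.
>
> We just discussed this. In the code, attnet's seq_len is set to 8, so the author may have taken eight 16*29 features as the audio feature for each image frame.

You're right, thanks for the explanation.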
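The guess quoted above (eight 16 × 29 features per image frame, matching `seq_len = 8`) could be sketched as follows. This is only one possible reading: the helper names, the roughly centered window, and the clamping of indices at the sequence boundaries are all assumptions and are not confirmed by the DiffTalk repository.

```python
def window_indices(frame, n_frames, seq_len=8):
    """Indices of the seq_len neighbouring frames around `frame`.

    Assumption: the window starts seq_len // 2 frames before `frame`,
    and out-of-range indices are clamped to the first or last frame.
    """
    start = frame - seq_len // 2
    return [min(max(i, 0), n_frames - 1) for i in range(start, start + seq_len)]


def stack_audio_windows(per_frame_feats, seq_len=8):
    """Group per-frame (16, 29) features into (n_frames, seq_len, 16, 29)."""
    n = len(per_frame_feats)
    return [
        [per_frame_feats[i] for i in window_indices(f, n, seq_len)]
        for f in range(n)
    ]


# Ten hypothetical frames, each with a 16 x 29 feature block.
feats = [[[float(f)] * 29 for _ in range(16)] for f in range(10)]
stacked = stack_audio_windows(feats)
print(window_indices(0, 10))  # [0, 0, 0, 0, 0, 1, 2, 3]
print(len(stacked), len(stacked[0]), len(stacked[0][0]), len(stacked[0][0][0]))  # 10 8 16 29
```

Under this reading, a batch of B frames carries B × 8 × 16 × 29 values, which `reshape(-1,16,29)` would flatten to (8B, 16, 29) before the audio encoder.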