sednn: About data preprocessing

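@yongxuUSTC @qiuqiangkong Hello, I have a few questions from reading the mixture2clean_dnn code.

1. What does the n_hop parameter in pack_features do? Why skip several frames?
2. After skipping frames like this, how do the parallel (noisy/clean) training pairs still correspond to each other?

Looking forward to your reply, thanks!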

Feb 21 '19 10:02 ChangThinkTech

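Hello, n_hop = 1 is generally the best choice because it makes full use of the data. However, memory is limited, so in practice n_hop is usually set somewhat larger.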

Best wishes,

Qiuqiang


Feb 21 '19 19:02 qiuqiangkong
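To illustrate why a larger n_hop saves memory, here is a minimal sliding-window sketch in the spirit of mat_2d_to_3d. The function segment_frames is hypothetical and is not the repository's code; it assumes a (n_frames, n_freq) spectrogram and drops an incomplete window at the end. A new window starts every n_hop frames, so the number of stored windows (and the memory they take) shrinks roughly by a factor of n_hop.

```python
import numpy as np


def segment_frames(x, n_concat, n_hop):
    """Cut a (n_frames, n_freq) matrix into windows of n_concat frames.

    A new window starts every n_hop frames; trailing frames that cannot
    fill a complete window are dropped. Returns (n_segments, n_concat, n_freq).
    """
    n_frames, n_freq = x.shape
    if n_frames < n_concat:
        return np.empty((0, n_concat, n_freq), dtype=x.dtype)
    starts = range(0, n_frames - n_concat + 1, n_hop)
    return np.stack([x[s:s + n_concat] for s in starts])


# Dummy parallel pair: noisy and clean spectrograms with the same frame count
noisy = np.random.rand(100, 257)
clean = np.random.rand(100, 257)

# With 100 frames: hop=1 -> 94 windows, hop=3 -> 32, hop=5 -> 19
for hop in (1, 3, 5):
    seg_noisy = segment_frames(noisy, n_concat=7, n_hop=hop)
    seg_clean = segment_frames(clean, n_concat=7, n_hop=hop)
    # Window i of the noisy input covers the same frames as window i of the
    # clean target, so the pairs stay aligned whatever n_hop is.
    print(hop, seg_noisy.shape, seg_clean.shape)
```

Because the noisy and clean spectrograms of a pair have the same number of frames, applying the same n_concat and n_hop to both keeps window i of one aligned with window i of the other; a larger hop simply keeps fewer of those aligned pairs.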

@qiuqiangkong @yongxuUSTC Thanks for your reply. I have more questions. 1. During training, if, say, 7 consecutive frames are fed in, is the corresponding y label the feature of the center frame? If so, why does the code use n_concat - 1 rather than n_concat + 1?

    # Cut target spectrogram and take the center frame of each 3D segment.
    speech_x_3d = mat_2d_to_3d(speech_x, agg_num=n_concat, hop=n_hop)
    y = speech_x_3d[:, (n_concat - 1) // 2, :]
    y_all.append(y)

2. Why does preprocessing first write the audio files into CSV files instead of extracting features directly from the audio files?

Feb 22 '19 06:02 ChangThinkTech
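On the first question above: with n_concat = 7 the frames in a window are indexed 0 to 6, so the center frame is at index (7 - 1) / 2 = 3, with three frames on each side. (n_concat + 1) / 2 = 4 is the center only in 1-based counting, whereas Python indexes from 0. A small check with dummy data (it uses integer division //, which gives the same result in Python 2 and 3):

```python
import numpy as np

n_concat = 7                    # window length (odd), as in the question
center = (n_concat - 1) // 2    # zero-based index of the middle frame
print(center)                   # 3

# Dummy 3D segments: 2 windows x 7 frames x 4 bins, each frame filled with its own index
segs = np.tile(np.arange(n_concat)[None, :, None], (2, 1, 4))
y = segs[:, center, :]          # target = center frame of every window
print(y[0])                     # [3 3 3 3]
print(center, n_concat - 1 - center)  # 3 3 -> three frames on each side
```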

The CSV files store the labels, not the audio files.

Best wishes,

Qiuqiang


Feb 22 '19 07:02 qiuqiangkong
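To make the distinction concrete: the CSV presumably only records metadata such as which speech and noise files form each mixture, while the waveforms are read from disk later when features are computed. A minimal sketch of reading such a file with Python's csv module; the tab-separated columns speech_name, noise_name, noise_onset, noise_offset and the file names are assumptions for illustration, and sednn's actual layout may differ:

```python
import csv
import io

# Hypothetical mixture CSV (tab-separated): metadata only, no audio samples.
csv_text = (
    "speech_name\tnoise_name\tnoise_onset\tnoise_offset\n"
    "sa1.wav\tn64.wav\t16000\t48000\n"
    "sa2.wav\tn71.wav\t0\t32000\n"
)

rows = list(csv.DictReader(io.StringIO(csv_text), delimiter="\t"))
for row in rows:
    # Each row only names the files to load and the noise span to use;
    # the waveforms themselves would be read from disk at feature-extraction time.
    onset, offset = int(row["noise_onset"]), int(row["noise_offset"])
    print(row["speech_name"], row["noise_name"], offset - onset)
```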

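@qiuqiangkong Thanks for your reply. Sorry to bother you again, but I've run into a problem: when evaluating the enhanced speech with STOI, the clean speech and the enhanced speech have different lengths. How should I handle this? Thanks!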

Feb 24 '19 09:02 ChangThinkTech

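The length mismatch is most likely because some samples were discarded when the signal was split into frames. Just zero-pad the enhanced speech so that it has the same length as the clean audio.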

Best wishes,

Qiuqiang


Feb 24 '19 10:02 qiuqiangkong
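A minimal sketch of the zero-padding approach, assuming both signals are 1-D NumPy arrays at the same sampling rate and that the enhanced signal is the shorter one. pad_to_length is a hypothetical helper name, and the random arrays stand in for real audio:

```python
import numpy as np


def pad_to_length(enhanced, target_len):
    """Append zeros to the end of `enhanced` so it has `target_len` samples."""
    n_missing = target_len - len(enhanced)
    if n_missing < 0:
        raise ValueError("enhanced signal is longer than the target length")
    return np.pad(enhanced, (0, n_missing), mode="constant")


clean = np.random.randn(16000)     # dummy clean signal (1 s at 16 kHz)
enhanced = np.random.randn(15872)  # dummy enhanced signal, slightly shorter
enhanced_padded = pad_to_length(enhanced, len(clean))
print(len(clean), len(enhanced_padded))  # 16000 16000
```

The padded signal and the clean signal can then be passed to whichever STOI implementation you use.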

Hi,

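A length difference is possible, but it should not be large: at most a few samples are missing at the end. When computing STOI, just truncate both signals to the shorter length, x[:min_len] and y[:min_len], and it should be fine.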

Yong Xu


Feb 25 '19 05:02 yongxuUSTC
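A minimal sketch of the truncation approach, assuming both signals are 1-D NumPy arrays at the same sampling rate; trim_to_common_length is a hypothetical helper name and the random arrays stand in for real audio:

```python
import numpy as np


def trim_to_common_length(x, y):
    """Cut both signals to the length of the shorter one."""
    min_len = min(len(x), len(y))
    return x[:min_len], y[:min_len]


clean = np.random.randn(16000)     # dummy clean signal
enhanced = np.random.randn(15872)  # dummy enhanced signal, slightly shorter
clean_t, enhanced_t = trim_to_common_length(clean, enhanced)
print(len(clean_t), len(enhanced_t))  # 15872 15872
```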

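@qiuqiangkong @yongxuUSTC Thanks for your reply. I've run into another problem: when training on my own dataset, the noise and the speech have different lengths, so they cannot be added together. But the calculate_mixture_features() method in prepare_data clearly contains code that handles different lengths. Why do I still run into this problem?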

Feb 27 '19 08:02 ChangThinkTech
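For reference, one common way to give a noise clip the same length as the speech is to repeat (tile) short noise and crop long noise. The sketch below is hypothetical and is not a copy of calculate_mixture_features(); match_noise_length and its onset parameter are illustrative names, and the random arrays stand in for real audio:

```python
import numpy as np


def match_noise_length(speech, noise, onset=0):
    """Return a noise segment with exactly len(speech) samples.

    Noise shorter than the speech is repeated (tiled) and then cropped;
    longer noise is cropped starting at `onset` (in samples).
    """
    n = len(speech)
    if len(noise) < n:
        n_repeat = int(np.ceil(n / len(noise)))
        return np.tile(noise, n_repeat)[:n]
    if onset + n > len(noise):
        raise ValueError("not enough noise samples after onset")
    return noise[onset:onset + n]


speech = np.random.randn(16000)
short_noise = np.random.randn(5000)
long_noise = np.random.randn(40000)

mix_a = speech + match_noise_length(speech, short_noise)
mix_b = speech + match_noise_length(speech, long_noise, onset=8000)
print(mix_a.shape, mix_b.shape)  # (16000,) (16000,)
```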

@qiuqiangkong Hello, sorry to bother you again: what is the purpose of data_generator.py? I've read through the code several times and still don't understand what this part does:

    while True:
        if (self.type == 'test') and (self.te_max_iter is not None):
            if iter == self.te_max_iter:
                break
        iter += 1
        if pointer >= n_samples:
            epoch += 1
            if (self.type) == 'test' and (epoch == 1):
                break
            pointer = 0
            np.random.shuffle(index)

What do max_iter and pointer mean? Why is max_iter set to 100? What is the point of checking self.type == 'test'? Looking forward to your reply, many thanks!

Mar 08 '19 09:03 ChangThinkTech

Hi,

data_generator.py generates mini-batches of data for training the neural network.

If there is a large amount of validation data, validating on all of it will be slow. So max_iter is used to validate on only max_iter mini-batches, which is faster.

Best wishes,

Qiuqiang


Mar 08 '19 09:03 qiuqiangkong
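A simplified, self-contained sketch of the logic in the loop quoted above (MiniBatchGenerator and its mode argument are illustrative names, not data_generator.py itself): pointer marks where the next mini-batch starts in the shuffled index array. In 'train' mode the generator reshuffles at the end of each epoch and runs forever; in 'test' mode it stops after one pass over the data, or earlier once te_max_iter batches have been produced, which is what keeps validation fast.

```python
import numpy as np


class MiniBatchGenerator:
    """Yield (x, y) mini-batches; 'train' loops forever, 'test' stops early."""

    def __init__(self, batch_size, mode, te_max_iter=None):
        assert mode in ("train", "test")
        self.batch_size = batch_size
        self.mode = mode
        self.te_max_iter = te_max_iter

    def generate(self, x, y):
        n_samples = len(x)
        index = np.arange(n_samples)
        np.random.shuffle(index)
        n_iter, epoch, pointer = 0, 0, 0
        while True:
            # Test mode: stop after te_max_iter batches (fast validation).
            if self.mode == "test" and self.te_max_iter is not None:
                if n_iter == self.te_max_iter:
                    break
            n_iter += 1
            # Reached the end of the data: start a new epoch.
            if pointer >= n_samples:
                epoch += 1
                if self.mode == "test" and epoch == 1:
                    break  # test mode: one pass over the data is enough
                pointer = 0
                np.random.shuffle(index)
            batch_idx = index[pointer:pointer + self.batch_size]
            pointer += self.batch_size
            yield x[batch_idx], y[batch_idx]


x = np.random.rand(1000, 7, 257)
y = np.random.rand(1000, 257)
val_gen = MiniBatchGenerator(batch_size=32, mode="test", te_max_iter=10)
n_batches = sum(1 for _ in val_gen.generate(x, y))
print(n_batches)  # 10
```

With 1000 samples and batch_size = 32, one full pass would be 32 batches, so te_max_iter = 10 limits validation to roughly a third of the data.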

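@qiuqiangkong Thanks for your reply. Actually, the reason I asked that question is that, when experimenting with my own dataset, I found that tr_x.shape[0] < te_x.shape[0]. I suspected the generator, but it now seems the generator is not the cause. Do you have any idea what could be causing this? Many thanks!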

Mar 08 '19 10:03 ChangThinkTech

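Hello, when we train on the full TIMIT data, tr_x.shape[0] is a bit over 4,000 and te_x.shape[0] is a bit over 1,000. The cause may be that not all of the data was used for training.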

Best wishes,

Qiuqiang


Mar 09 '19 20:03 qiuqiangkong

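@qiuqiangkong Hello,

1. Do tr_x.shape[0] and te_x.shape[0] mean the total number of utterances, or the total number of feature frames in the input? If they are the number of utterances, why are tr_x.shape[0] and te_x.shape[0] 1392 and 566 respectively when I use mini_data?
2. About the possible cause that not all of the training data was used: where should I adjust that? With the same code, tr_x.shape[0] is greater than te_x.shape[0] on mini_data but smaller on my own data (even though my training set has more utterances than my test set).

Looking forward to your reply, many thanks!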

Mar 10 '19 01:03 ChangThinkTech

Hello, what do your tr_x.shape and te_x.shape print?

Best wishes,

Qiuqiang


Mar 10 '19 23:03 qiuqiangkong

@qiuqiangkong Hello, tr_x.shape is (295250, 7, 257) and te_x.shape is (336405, 7, 257). In the mixture_csv files, train has 6001 records and test has 1651 records. Looking forward to your reply, many thanks!

Mar 11 '19 00:03 ChangThinkTech

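Hello, in the mini data, tr_x.shape[0] = 1392 means that the mini data (possibly only two audio files) can be cut into a total of 1392 training samples of size (7, 257). With 6001 audio files, you get 295250 training samples of size (7, 257).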

Best wishes,

Qiuqiang


Mar 13 '19 23:03 qiuqiangkong
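In other words, tr_x.shape[0] counts (7, 257) segments, not audio files, so it scales with the total number of frames (the total duration of the audio) rather than with the number of files. The sketch below uses made-up frame counts and a hypothetical n_segments helper (a plain sliding window without padding, which may differ from sednn's exact count) to show how a test set with fewer but longer files can yield more segments than a training set:

```python
def n_segments(n_frames, n_concat=7, n_hop=1):
    """Number of full windows a sliding window cuts from one utterance."""
    if n_frames < n_concat:
        return 0
    return (n_frames - n_concat) // n_hop + 1


# Made-up frame counts: many short training files vs. few long test files
train_frames = [50] * 6   # 6 short utterances
test_frames = [300] * 2   # 2 long utterances

n_train = sum(n_segments(f) for f in train_frames)
n_test = sum(n_segments(f) for f in test_frames)
print(n_train, n_test)  # 264 588
```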


OK, I'll look into it more carefully. Thanks for your reply.

Mar 14 '19 06:03 ChangThinkTech