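About data preprocessing
@yongxuUSTC @qiuqiangkong Hello, I have a few questions after reading the mixture2clean_dnn code.
1. What does the n_hop parameter in pack_features do? Why are several frames skipped?
2. After skipping frames this way, how do the parallel (noisy and clean) data correspond to each other during training?
Looking forward to your reply. Thanks!
Hello, n_hop = 1 is generally the best choice because it makes full use of the data. However, since memory is limited, n_hop is usually set to a larger value.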
Best wishes,
Qiuqiang
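To make the memory trade-off concrete, here is a minimal sketch of sliding-window segmentation. `segment_frames` is a hypothetical stand-in written for this illustration, not the repository's `mat_2d_to_3d`, and the 100-frame, 257-bin spectrogram is made-up data.

```python
import numpy as np

def segment_frames(x, n_concat, n_hop):
    """Slice a (n_frames, n_freq) array into (n_segments, n_concat, n_freq).

    Hypothetical helper for illustration; assumes n_frames >= n_concat.
    """
    if len(x) < n_concat:
        raise ValueError("need at least n_concat frames")
    # A new segment starts every n_hop frames.
    starts = range(0, len(x) - n_concat + 1, n_hop)
    return np.stack([x[s:s + n_concat] for s in starts])

spec = np.zeros((100, 257), dtype=np.float32)  # made-up spectrogram
dense = segment_frames(spec, n_concat=7, n_hop=1)
sparse = segment_frames(spec, n_concat=7, n_hop=3)
print(dense.shape, sparse.shape)    # (94, 7, 257) (32, 7, 257)
print(dense.nbytes, sparse.nbytes)  # the dense version needs about 3x the memory
```

With n_hop = 1 every frame becomes the center of some segment, so no data is skipped, but the stored array grows roughly by a factor of n_hop compared with the sparser setting.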
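@qiuqiangkong @yongxuUSTC Thank you for the reply. I have another question.
1. During training, when for example 7 consecutive frames are fed in, is the corresponding label y the feature of the middle frame? If it is the middle frame, why does the code use n_concat - 1 rather than adding 1?

    # Cut target spectrogram and take the center frame of each 3D segment.
    speech_x_3d = mat_2d_to_3d(speech_x, agg_num=n_concat, hop=n_hop)
    y = speech_x_3d[:, (n_concat - 1) / 2, :]
    y_all.append(y)

2. Why does preprocessing first convert the audio files into csv files instead of extracting features directly from the audio files?
The csv files store the labels, not the audio.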
Best wishes,
Qiuqiang
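On the first question above (why `n_concat - 1`), a short sketch may help. It assumes array indices are zero-based, as in NumPy, and that the noisy input and the clean target are segmented with the same n_concat and n_hop; `segment_alignment` is a hypothetical helper, not a function from the repository. Note that under Python 3 the division has to be `//`, since `/` returns a float that cannot be used as an index.

```python
def segment_alignment(i, n_concat, n_hop):
    """Return (input_frames, target_frame) for the i-th segment.

    Hypothetical helper: frame numbers are zero-based positions in the
    spectrogram that was passed to the segmentation step.
    """
    start = i * n_hop
    input_frames = list(range(start, start + n_concat))
    # With zero-based indices 0..n_concat-1, the middle one is (n_concat - 1) // 2.
    center = (n_concat - 1) // 2
    return input_frames, start + center

for i in range(3):
    frames, target = segment_alignment(i, n_concat=7, n_hop=3)
    print(frames, "->", target)
# [0, 1, 2, 3, 4, 5, 6] -> 3
# [3, 4, 5, 6, 7, 8, 9] -> 6
# [6, 7, 8, 9, 10, 11, 12] -> 9
```

For n_concat = 7 the indices run from 0 to 6, so the center is index 3 = (7 - 1) // 2; using (7 + 1) // 2 = 4 would pick the frame after the center. Under the same assumption, each 7-frame noisy segment is paired with the clean frame at its center, and consecutive training pairs are n_hop frames apart.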
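@qiuqiangkong Thank you for the reply. Sorry, I have run into another problem: when I evaluate speech quality with STOI, the clean speech and the enhanced speech have different lengths. How should I handle this? Thanks!
The length mismatch is most likely because some samples were discarded during framing. Just zero-pad the enhanced speech so that it has the same length as the clean audio.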
Best wishes,
Qiuqiang
Hi,
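A length mismatch is possible, but the difference should be small; at most a few samples are missing at the end. When evaluating STOI, just truncate both signals to the shorter length, x(:min_len), y(:min_len), and that should work.
Yong Xu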
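Both suggestions above (zero-padding the enhanced signal, or truncating both signals to the shorter length) can be sketched in a few lines of NumPy. `align_lengths` is a hypothetical helper written for this illustration; it only aligns the two waveforms and leaves the STOI computation itself to whatever tool you already use.

```python
import numpy as np

def align_lengths(clean, enhanced, mode="truncate"):
    """Make two 1-D waveforms the same length before scoring.

    mode="truncate": cut both to the shorter length.
    mode="pad": zero-pad (or cut) the enhanced signal to the clean length.
    """
    if mode == "truncate":
        min_len = min(len(clean), len(enhanced))
        return clean[:min_len], enhanced[:min_len]
    if mode == "pad":
        missing = len(clean) - len(enhanced)
        if missing > 0:
            enhanced = np.pad(enhanced, (0, missing), mode="constant")
        return clean, enhanced[:len(clean)]
    raise ValueError("mode must be 'truncate' or 'pad'")

clean = np.arange(10, dtype=np.float32)    # made-up signals
enhanced = np.arange(8, dtype=np.float32)  # two samples lost at the end
c, e = align_lengths(clean, enhanced, mode="truncate")
print(len(c), len(e))  # 8 8
c, e = align_lengths(clean, enhanced, mode="pad")
print(len(c), len(e))  # 10 10
```

Since only a few trailing samples differ, the two options should give nearly the same score; truncation avoids comparing clean speech against artificial silence.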
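@qiuqiangkong @yongxuUSTC Thank you for the reply. I have run into another problem: when training on my own dataset, the noise and the speech have different lengths, so they cannot be added together. But the calculate_mixture_features() method in prepare_data clearly contains code that handles different lengths. Why do I still hit this problem?
@qiuqiangkong Hello, may I ask one more thing: what is the purpose of data_generator.py? I have read the code several times and still do not understand what this part does: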
    while True:
        if (self.type == 'test') and (self.te_max_iter is not None):
            if iter == self.te_max_iter:
                break
        iter += 1
        if pointer >= n_samples:
            epoch += 1
            if (self.type) == 'test' and (epoch == 1):
                break
            pointer = 0
            np.random.shuffle(index)
What do max_iter and pointer mean? Why is max_iter set to 100? What is the purpose of checking self.type == 'test'?
Looking forward to your reply. Thank you very much!
Hi,
data_generator.py generates mini-batches of data for training the neural network.
If there is a large amount of validation data, validating on all of it will be slow. max_iter limits validation to at most max_iter mini-batches, which is faster.
Best wishes,
Qiuqiang
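The roles of pointer, the epoch check, and max_iter can be seen in a stripped-down generator. This is a sketch written for illustration under the explanation above, not the repository's data_generator.py; `generate_batches` and its arguments are hypothetical names.

```python
import numpy as np

def generate_batches(x, y, batch_size, mode="train", te_max_iter=None, seed=0):
    """Yield shuffled mini-batches.

    mode="train": loop over the data forever, reshuffling every epoch.
    mode="test":  stop after one pass, or earlier after te_max_iter batches.
    """
    rng = np.random.default_rng(seed)
    n_samples = len(x)
    index = rng.permutation(n_samples)
    n_iter = 0
    pointer = 0  # position in the shuffled index where the next batch starts
    while True:
        # Cap the number of validation batches so evaluation stays fast.
        if mode == "test" and te_max_iter is not None and n_iter == te_max_iter:
            break
        n_iter += 1
        if pointer >= n_samples:  # one full pass (epoch) is finished
            if mode == "test":
                break             # evaluate each sample at most once
            pointer = 0
            index = rng.permutation(n_samples)
        batch_idx = index[pointer:pointer + batch_size]
        pointer += batch_size
        yield x[batch_idx], y[batch_idx]

x = np.arange(10).reshape(10, 1)  # made-up data: 10 samples
print(len(list(generate_batches(x, x, 4, mode="test"))))                 # 3
print(len(list(generate_batches(x, x, 4, mode="test", te_max_iter=2))))  # 2
```

In training mode the generator never terminates on its own, so the training loop decides when to stop; in test mode it ends after one pass or after te_max_iter batches, whichever comes first.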
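@qiuqiangkong Thank you for the reply. The reason I asked the previous question is that, when experimenting with my own dataset, I found that tr_x.shape[0] < te_x.shape[0]. I suspected the generator, but it now seems the generator is not the cause. Do you have any idea what the reason might be? Thank you very much!
Hello, when we train on the full TIMIT data, tr_x.shape[0] is a little over 4000 and te_x.shape[0] is a little over 1000. The cause may be that not all of the data was used for training.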
Best wishes,
Qiuqiang
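@qiuqiangkong Hello,
1. Do tr_x.shape[0] and te_x.shape[0] mean the total number of utterances, or the total number of feature frames of the input speech? If they mean the number of utterances, why are tr_x.shape[0] and te_x.shape[0] equal to 1392 and 566 when using mini_data?
2. You said the cause may be that not all of the training data was used. Where should I adjust this? With the same code, tr_x.shape[0] is larger than te_x.shape[0] when I use mini_data, but smaller when I use my own data (even though the training set contains more utterances than the test set).
Looking forward to your reply. Thank you very much!
Hello, what do your tr_x.shape and te_x.shape print as?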
Best wishes,
Qiuqiang
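@qiuqiangkong Hello, tr_x.shape is (295250, 7, 257) and te_x.shape is (336405, 7, 257). In the mixture_csv files, train has 6001 records and test has 1651 records. Looking forward to your reply. Thank you very much!
Hello, in the mini data, tr_x.shape[0] = 1392 means that a total of 1392 training samples of size (7, 257) can be cut from the mini data (which may contain only two audio files). With 6001 audio files, 295250 training samples of size (7, 257) can be cut.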
Best wishes,
Qiuqiang
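A quick piece of arithmetic on the numbers reported above shows how many (7, 257) samples each csv record yields on average. The code only divides the counts given in this thread; `segments_per_record` is a hypothetical helper name.

```python
def segments_per_record(n_segments, n_records):
    """Average number of (n_concat, n_freq) samples cut from one csv record."""
    return n_segments / n_records

# Counts reported in this thread.
train_avg = segments_per_record(295250, 6001)
test_avg = segments_per_record(336405, 1651)
print(round(train_avg, 1), round(test_avg, 1))  # 49.2 203.8
```

So each test record produces roughly four times as many samples as each training record. The thread does not say why; longer test recordings or a smaller n_hop when packing the test features would both have this effect, so those are two places worth checking.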
OK, I will take a closer look. Thank you for your reply.