About data preprocessing

Open ChangThinkTech opened this issue 6 years ago • 16 comments

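@yongxuUSTC @qiuqiangkong Hello, I have a few questions about the mixture2clean_dnn code.
1. What does the n_hop parameter in pack_features do? Why skip several frames?
2. After skipping frames this way, how do the parallel (noisy and clean) data correspond to each other during training?
Looking forward to your reply, thanks!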

ChangThinkTech avatar Feb 21 '19 10:02 ChangThinkTech

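Hello, n_hop = 1 is generally the best choice because it makes full use of the data. Since memory is limited, though, n_hop is usually set to a larger value.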

Best wishes,

Qiuqiang


From: only-yipie Sent: 21 February 2019 10:37 To: yongxuUSTC/sednn Cc: Kong, Qiuqiang (PG/R - Elec Electronic Eng); Mention Subject: [yongxuUSTC/sednn] About data preprocessing (#26)

qiuqiangkong avatar Feb 21 '19 19:02 qiuqiangkong
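
To make the n_hop trade-off concrete, here is a minimal sketch of cutting a spectrogram into overlapping context windows. `segment_frames` is a hypothetical helper written for illustration, not the repository's `mat_2d_to_3d`, and the 100-frame, 257-bin input is made up.

```python
import numpy as np

def segment_frames(x, n_concat, n_hop):
    """Cut a (n_frames, n_freq) array into windows of n_concat consecutive frames.

    Hypothetical helper for illustration; a new window starts every n_hop frames.
    Assumes x has at least n_concat frames.
    """
    starts = range(0, x.shape[0] - n_concat + 1, n_hop)
    return np.stack([x[s:s + n_concat] for s in starts])

# Made-up spectrogram: 100 frames, 257 frequency bins.
spec = np.random.rand(100, 257).astype(np.float32)

dense = segment_frames(spec, n_concat=7, n_hop=1)   # every possible window
sparse = segment_frames(spec, n_concat=7, n_hop=3)  # every third window

print(dense.shape, dense.nbytes)
print(sparse.shape, sparse.nbytes)
```

With n_hop = 1 the windows overlap almost entirely, so the segmented array is roughly n_concat times larger than the spectrogram itself. A larger n_hop shrinks it proportionally, at the cost of fewer training windows.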

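@qiuqiangkong @yongxuUSTC Thank you for the reply. I have another question.
1. During training, if for example 7 consecutive frames are fed in, is the corresponding y label the feature of the middle frame? If it is the middle frame, why does the code use n_concat - 1 rather than adding 1?

    # Cut target spectrogram and take the center frame of each 3D segment.
    speech_x_3d = mat_2d_to_3d(speech_x, agg_num=n_concat, hop=n_hop)
    y = speech_x_3d[:, (n_concat - 1) / 2, :]
    y_all.append(y)

2. During preprocessing, why are the audio files first turned into csv files, instead of extracting features directly from the audio files?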

ChangThinkTech avatar Feb 22 '19 06:02 ChangThinkTech
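
On the first question, the index arithmetic can be checked with a short sketch, assuming 0-based indexing as in NumPy. For an odd n_concat the frames of a window are numbered 0 to n_concat - 1, so the center sits at (n_concat - 1) / 2. Under Python 3 the division has to be written `//` to give an integer index. The toy array below is made up.

```python
import numpy as np

n_concat = 7
center = (n_concat - 1) // 2   # integer division; frames are indexed 0..6

# Toy batch: 2 windows, 7 frames each, 4 bins; every frame is filled with its own index.
windows = np.tile(np.arange(n_concat)[None, :, None], (2, 1, 4))

y = windows[:, center, :]      # keep only the center frame of every window
print(center, y.shape)
```

Index 3 has three frames before it (0, 1, 2) and three after it (4, 5, 6), which is why the offset is subtracted rather than added.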

The csv files store the labels, not the audio.

Best wishes,

Qiuqiang


qiuqiangkong avatar Feb 22 '19 07:02 qiuqiangkong

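@qiuqiangkong Thank you for the reply. Sorry, I've run into a problem: when evaluating speech quality with STOI, the clean speech and the enhanced speech have different lengths. How should this be handled? Thanks!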

ChangThinkTech avatar Feb 24 '19 09:02 ChangThinkTech

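The length mismatch is most likely because some samples were dropped during framing. Just zero-pad the enhanced speech so that it has the same length as the clean audio.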

Best wishes,

Qiuqiang


qiuqiangkong avatar Feb 24 '19 10:02 qiuqiangkong
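
The zero-padding suggested above could look like the following sketch. `pad_to_length` is a hypothetical helper, and the two signals are random placeholders with made-up lengths.

```python
import numpy as np

def pad_to_length(x, target_len):
    """Append zeros to a 1-D signal until it has target_len samples.

    Hypothetical helper; a signal that is already long enough is returned unchanged.
    """
    n_missing = max(target_len - len(x), 0)
    return np.pad(x, (0, n_missing), mode="constant")

clean = np.random.randn(16000)      # made-up reference signal
enhanced = np.random.randn(15872)   # made-up output that lost its trailing samples

enhanced_padded = pad_to_length(enhanced, len(clean))
print(len(clean), len(enhanced_padded))
```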

Hi,

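A length mismatch is possible, but the difference should be small: at most a few samples missing at the end. When evaluating STOI, just truncate both signals to the shorter length, x(:min_len), y(:min_len), and that should work.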

徐勇

yongxuUSTC avatar Feb 25 '19 05:02 yongxuUSTC
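
The `x(:min_len)` notation above is MATLAB-style. A NumPy equivalent of the truncation might look like this sketch, where `trim_to_common_length` is a hypothetical helper and the signals are placeholders.

```python
import numpy as np

def trim_to_common_length(x, y):
    """Cut two 1-D signals down to the length of the shorter one (hypothetical helper)."""
    min_len = min(len(x), len(y))
    return x[:min_len], y[:min_len]

clean = np.random.randn(16000)      # made-up reference signal
enhanced = np.random.randn(15872)   # made-up output that is slightly shorter

clean_t, enhanced_t = trim_to_common_length(clean, enhanced)
print(len(clean_t), len(enhanced_t))
```

The two trimmed arrays can then be passed to whichever STOI implementation is in use.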

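@qiuqiangkong @yongxuUSTC Thank you for the reply. I've run into another problem: when training on my own dataset, the noise and the speech have different lengths and cannot be added together. But calculate_mixture_features() in prepare_data clearly contains code that handles different lengths. Why do I still hit this problem?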

ChangThinkTech avatar Feb 27 '19 08:02 ChangThinkTech
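
This question is not resolved in the thread. Purely as a generic illustration of matching lengths before mixing, and not the code in `calculate_mixture_features()`, a sketch could repeat noise that is too short and crop noise that is too long. `fit_noise_to_speech` and its `onset` parameter are hypothetical.

```python
import numpy as np

def fit_noise_to_speech(noise, speech_len, onset=0):
    """Return a noise excerpt with exactly speech_len samples (hypothetical helper).

    Short noise is repeated end to end; long noise is cropped starting at `onset`.
    Assumes the noise array is not empty.
    """
    if len(noise) < speech_len:
        n_repeat = -(-speech_len // len(noise))   # ceiling division
        noise = np.tile(noise, n_repeat)
    onset = min(onset, len(noise) - speech_len)   # keep the crop inside the array
    return noise[onset:onset + speech_len]

speech = np.zeros(10)                 # placeholder speech, 10 samples
short_noise = np.arange(4.0)          # shorter than the speech
long_noise = np.arange(20.0)          # longer than the speech

mix_a = speech + fit_noise_to_speech(short_noise, len(speech))
mix_b = speech + fit_noise_to_speech(long_noise, len(speech), onset=15)
print(mix_a.shape, mix_b.shape)
```

If the sum still fails after a step like this, printing the two array shapes just before the addition shows which signal has the unexpected length.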

@qiuqiangkong Hello, may I ask one more thing: what is data_generator.py for? I've read the code several times and don't understand what this part does:

    while True:
        if (self.type == 'test') and (self.te_max_iter is not None):
            if iter == self.te_max_iter:
                break
        iter += 1
        if pointer >= n_samples:
            epoch += 1
            if (self.type) == 'test' and (epoch == 1):
                break
            pointer = 0
            np.random.shuffle(index)

What do max_iter and pointer mean? Why is max_iter set to 100? What is the check self.type == 'test' for? Looking forward to your reply, much appreciated!

ChangThinkTech avatar Mar 08 '19 09:03 ChangThinkTech

Hi,

data_generator.py generates mini-batches of data for training the neural network.

If there is a large amount of validation data, validating on all of it is slow. max_iter limits validation to max_iter mini-batches, which is faster.

Best wishes,

Qiuqiang


qiuqiangkong avatar Mar 08 '19 09:03 qiuqiangkong
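
The behaviour described above can be sketched as a standalone generator. This is a simplified illustration, not the repository's class: `batch_generator`, `mode`, and `seed` are hypothetical names. Here `pointer` is the position of the next batch in the shuffled index, and `max_iter` caps the number of batches produced in test mode.

```python
import numpy as np

def batch_generator(x, y, batch_size, mode="train", max_iter=None, seed=0):
    """Yield (x, y) mini-batches in shuffled order (hypothetical sketch)."""
    rng = np.random.default_rng(seed)
    index = rng.permutation(len(x))
    n_iter, pointer = 0, 0
    while True:
        if mode == "test" and max_iter is not None and n_iter == max_iter:
            break                              # validation budget used up
        n_iter += 1
        if pointer >= len(x):                  # every sample has been served
            if mode == "test":
                break                          # test mode: stop after one pass
            pointer = 0
            index = rng.permutation(len(x))    # train mode: reshuffle and go on
        batch = index[pointer:pointer + batch_size]
        pointer += batch_size
        yield x[batch], y[batch]

x = np.arange(10).reshape(10, 1)   # 10 made-up samples
y = x * 2

full_pass = list(batch_generator(x, y, batch_size=4, mode="test"))
capped = list(batch_generator(x, y, batch_size=4, mode="test", max_iter=2))
print(len(full_pass), len(capped))
```

In train mode the generator never stops by itself, so the training loop decides when to stop drawing batches.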

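@qiuqiangkong Thank you for the reply. The reason I asked is that, when experimenting with my own dataset, I found tr_x.shape[0] < te_x.shape[0] and suspected the generator. It now seems the generator is not the cause. Do you have any idea what the reason might be? Much appreciated!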

ChangThinkTech avatar Mar 08 '19 10:03 ChangThinkTech

Hello, when we train on the full TIMIT data, tr_x.shape[0] is a bit over 4000 and te_x.shape[0] is a bit over 1000. The cause may be that not all of the data was used for training.

Best wishes,

Qiuqiang


qiuqiangkong avatar Mar 09 '19 20:03 qiuqiangkong

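@qiuqiangkong Hello.
1. Do tr_x.shape[0] and te_x.shape[0] mean the total number of utterances, or the total number of feature frames of the input speech? If they mean the number of utterances, why are tr_x.shape[0] and te_x.shape[0] 1392 and 566 respectively when using mini_data?
2. Regarding the possible cause that not all of the training data was used: where should I adjust this? With the same code, tr_x.shape[0] is greater than te_x.shape[0] on mini_data but smaller on my own data (where the training set has more utterances than the test set).
Looking forward to your reply, much appreciated!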

ChangThinkTech avatar Mar 10 '19 01:03 ChangThinkTech

Hello, what do your tr_x.shape and te_x.shape print out as?

Best wishes,

Qiuqiang


qiuqiangkong avatar Mar 10 '19 23:03 qiuqiangkong

@qiuqiangkong Hello, tr_x.shape is (295250, 7, 257) and te_x.shape is (336405, 7, 257). In the mixture_csv files, train has 6001 records and test has 1651 records. Looking forward to your reply, much appreciated!

ChangThinkTech avatar Mar 11 '19 00:03 ChangThinkTech

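Hello, in the mini data, tr_x.shape[0] = 1392 means that 1392 training samples of size (7, 257) can be cut from the mini data (which may contain only two audio files). With 6001 audio files, 295250 training samples of size (7, 257) can be cut.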

Best wishes,

Qiuqiang


qiuqiangkong avatar Mar 13 '19 23:03 qiuqiangkong
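
Since shape[0] counts windows rather than files, it depends on how many frames each recording has and on the hop. The figures reported above work out to about 49 windows per training record (295250 / 6001) and about 204 per test record (336405 / 1651). The sketch below shows the arithmetic with made-up frame counts. `n_segments` is a hypothetical helper that ignores any border padding a real pipeline may apply, and it is not a diagnosis of the numbers in this thread.

```python
def n_segments(n_frames, n_concat, n_hop):
    """Number of n_concat-frame windows that fit in n_frames, stepping n_hop frames."""
    if n_frames < n_concat:
        return 0
    return (n_frames - n_concat) // n_hop + 1

# Made-up frame counts per recording.
many_short = [60] * 6    # 6 short recordings
few_long = [400] * 2     # 2 long recordings

total_short = sum(n_segments(n, n_concat=7, n_hop=3) for n in many_short)
total_long = sum(n_segments(n, n_concat=7, n_hop=3) for n in few_long)
print(total_short, total_long)   # 108 264
```

A set with fewer files can therefore still produce more windows, either because its recordings are longer or because it was segmented with a smaller hop.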

OK, I'll take a closer look. Thank you for the reply.

ChangThinkTech avatar Mar 14 '19 06:03 ChangThinkTech