Question about the data preparation

Open xiami2019 opened this issue 8 months ago • 3 comments

Hi, great work!

I find the Data Preparation part of Appendix B a little confusing. It first says that DNSMOS and forced alignment are used to filter out some data, but the last sentence says that you did not filter out any segments based on these filters.

Could you clarify whether you use DNSMOS and forced alignment as filters?

Best regards.

xiami2019 avatar Apr 25 '25 04:04 xiami2019

@xiami2019 Thanks for the question. We use DNSMOS and forced alignment to filter data at the file level. Within each retained file, we do not filter out any segments, so that the context information is preserved.
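
To illustrate the distinction, here is a minimal sketch of file-level filtering. It is not the actual MoonCast pipeline: the `Segment` fields, the mean-based aggregation, and both thresholds are hypothetical and chosen only for the example. The point it shows is that the keep/drop decision is made once per file, and a kept file retains every segment, including low-scoring ones.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    dnsmos: float   # hypothetical per-segment DNSMOS score
    aligned: bool   # hypothetical flag: forced alignment succeeded


def keep_file(segments, min_mean_dnsmos=3.0, min_aligned_ratio=0.9):
    """Decide whether to keep a whole file (thresholds are assumptions)."""
    if not segments:
        return False
    mean_dnsmos = sum(s.dnsmos for s in segments) / len(segments)
    aligned_ratio = sum(s.aligned for s in segments) / len(segments)
    return mean_dnsmos >= min_mean_dnsmos and aligned_ratio >= min_aligned_ratio


def filter_corpus(files):
    """Drop whole files that fail the check; never drop individual segments."""
    return {name: segs for name, segs in files.items() if keep_file(segs)}
```

With this scheme, a noisy segment inside an otherwise clean file stays in place, so the surrounding dialogue context remains intact.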

MSLDCherryPick avatar Apr 26 '25 01:04 MSLDCherryPick

> @xiami2019 Thanks for the question. We use DNSMOS and force alignment to filter out file level data. In each file, we do not filter out any segments to keep the context information.

May I ask which specific model or toolkit was used for the forced alignment?

rulerman avatar May 03 '25 04:05 rulerman

@rulerman We built a forced alignment model using the Kaldi nnet3 scripts with around 10k hours of data.

MSLDCherryPick avatar May 26 '25 14:05 MSLDCherryPick