
Question about noise scheduling process.

Open LEECHOONGHO opened this issue 2 years ago • 1 comments

Hello, I'm trying to implement the noise scheduling process by referring to BDDM's implementation in BDDM/sampler.py.

I have some questions about the noise scheduling process for FastDiff-TTS.

  1. In the FastDiff paper, alphaN and betaN are set as hyperparameters, e.g. $\hat{\alpha}_t = 0.54$, $\hat{\beta}_t = 0.70$. Can I use these hyperparameters for my own FastDiff-TTS module, or with a different number of reverse steps (e.g. 6, 8, 10, ...)? How are they calculated?

  2. For BDDM, finding alphaN and betaN requires a greedy search with search_bin=9, followed by 10 further search steps that add noise to the parameters, e.g. _alpha_param = alpha_param * (0.95 + np.random.rand() * 0.1). Does FastDiff require a similar process?

  3. For BDDM, STOI and PESQ are estimated on the generated audio to find the best noise schedule. How should we select the best parameters based on these two indicators?

  4. Are STOI and PESQ also needed for the parameter search in FastDiff?

  5. In BDDM, num_reverse_steps = math.floor(T / tau). But in FastDiff, T=1000, tau=200, and num_reverse_steps=4. Do I need to calculate num_reverse_steps as math.floor(T/tau) - 1?
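For reference, the perturbation quoted in question 2 can be sketched as follows. This is only an illustration of that one expression, not BDDM's search code: the helper name perturb is hypothetical, the standard library's random.random() stands in for np.random.rand(), and the value 0.54 is borrowed from question 1 purely as a sample input.

```python
import random

def perturb(param, rng):
    """Scale param by a factor drawn uniformly from [0.95, 1.05).

    Mirrors the quoted expression
    alpha_param * (0.95 + np.random.rand() * 0.1),
    with rng.random() standing in for np.random.rand().
    """
    return param * (0.95 + rng.random() * 0.1)

rng = random.Random(0)   # fixed seed so the sketch is repeatable
alpha_param = 0.54       # sample input only (value quoted in question 1)

# Draw a few perturbed candidates around alpha_param.
candidates = [perturb(alpha_param, rng) for _ in range(10)]
for c in candidates:
    print(round(c, 4))
```

Each candidate stays within about 5% of the starting value, so this step explores a small neighbourhood around an already chosen parameter.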

Thank you.

LEECHOONGHO, Jun 24 '22 02:06

Hi,

  1. It's OK to use a different number of reverse steps; just set the maximum number of sampling steps in scheduling ("N") in BDDM.
  2. The noise predictor of FastDiff shares a similar mechanism with BDDM's, so calculating STOI and PESQ is required.
  3. Thanks, this $\tau$ is a typo; the algorithm is still math.floor(T/tau). You could try it yourself: the higher $\tau$ is, the shorter the predicted inference schedule tends to be.
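The arithmetic behind math.floor(T/tau) can be checked with a minimal sketch. The function name below simply reuses num_reverse_steps from the question, and the list of $\tau$ values is chosen for illustration; it shows only the step-count formula, not the schedule the noise predictor would actually produce.

```python
import math

def num_reverse_steps(T, tau):
    """Step count given by the formula discussed above: floor(T / tau)."""
    return math.floor(T / tau)

T = 1000
# Illustrative tau values; only T=1000 and tau=200 come from the question.
for tau in (100, 200, 250, 500):
    print(f"T={T}, tau={tau} -> {num_reverse_steps(T, tau)} steps")
```

With T=1000 and tau=200 the formula gives 5 rather than 4, and larger values of $\tau$ give fewer steps.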

Rongjiehuang, Jun 28 '22 16:06