
Discussion: NNSVS vs. NEUTRINO

Open r9y9 opened this issue 3 years ago • 8 comments

Samples: https://soundcloud.com/r9y9/sets/nnsvs-and-neutrino-comparison

While looking into the differences between the nnsvs and neutrino samples, I noticed that there is MUCH room for improvement in the acoustic model. I will put some analysis results here for the record.

Global variance

download
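For readers who want to reproduce this kind of plot: global variance (GV) is the per-dimension variance of a feature trajectory over time. The following is a minimal NumPy sketch, assuming features are stored as a (frames, dims) array. The function name and the toy data are hypothetical and are not taken from the nnsvs code base.

```python
import numpy as np


def global_variance(features):
    """Per-dimension variance over time for a (T, D) feature matrix."""
    features = np.asarray(features, dtype=np.float64)
    return features.var(axis=0)


# Toy data (assumption): white noise stands in for natural features, and a
# 5-frame moving average stands in for an over-smoothed model output.
rng = np.random.default_rng(0)
natural = rng.standard_normal((200, 4))
kernel = np.ones(5) / 5.0
smoothed = np.stack(
    [np.convolve(natural[:, d], kernel, mode="same") for d in range(natural.shape[1])],
    axis=1,
)

gv_natural = global_variance(natural)
gv_smoothed = global_variance(smoothed)
```

In this toy setup the smoothed trajectories have a lower GV in every dimension, which is the symptom usually associated with over-smoothing.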

Spectrogram

Upper: nnsvs, lower: neutrino

download-1

It looks like neutrino puts emphasis on the frequency bands below 8000 Hz.

Aperiodicity

Upper: nnsvs, lower: neutrino

download-2

It seems that neutrino performs phrase-level synthesis (separated by rests, I guess?). Aperiodicity components are filled with constant values during pauses.

F0

download-3

MGC

download-4

  • mgc 0th: ours are shifted. This is not important because the gain of the signals differs at training.
  • mgc higher dims: ours are clearly smoothed. Temporal fluctuations are clearly observed for neutrino, but not for nnsvs.

BAP

download-5

  • Same as mgc, ours are over-smoothed

So what can we do?

So far I am thinking of the following ideas:

  • Try autoregressive models to alleviate over-smoothing issues for mgc/bap modeling #15
  • Design a post-filter to alleviate the over-smoothing issues. I guess a modulation spectrum based post-filter would work to some extent.
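As a rough illustration of the second idea, here is a minimal NumPy sketch of an utterance-level modulation spectrum (MS) post-filter: take the DFT of each feature dimension along time, move the log power spectrum from generated statistics toward natural statistics, keep the original phase, and invert. All function names, the DFT length, the emphasis coefficient `k`, and the toy data are assumptions for illustration. This is not the nnsvs implementation, and whether neutrino does anything similar is unknown.

```python
import numpy as np

FFT_LEN = 256  # assumed DFT length; must be >= the longest trajectory


def log_ms_and_phase(traj, fft_len=FFT_LEN):
    """Log power modulation spectrum and phase of a mean-removed (T, D) trajectory."""
    centered = traj - traj.mean(axis=0, keepdims=True)
    spec = np.fft.rfft(centered, n=fft_len, axis=0)  # zero-pads along time
    return np.log(np.abs(spec) ** 2 + 1e-10), np.angle(spec)


def ms_stats(trajs, fft_len=FFT_LEN):
    """Mean and std of the log modulation spectrum over a list of trajectories."""
    ms = np.stack([log_ms_and_phase(t, fft_len)[0] for t in trajs])
    return ms.mean(axis=0), ms.std(axis=0) + 1e-10


def ms_postfilter(traj, nat_stats, gen_stats, k=0.85, fft_len=FFT_LEN):
    """Move the log-MS of `traj` toward natural statistics; k=0 is a no-op.

    The default k is an arbitrary choice for this sketch.
    """
    T = traj.shape[0]
    mean = traj.mean(axis=0, keepdims=True)
    log_ms, phase = log_ms_and_phase(traj, fft_len)
    (mu_n, sd_n), (mu_g, sd_g) = nat_stats, gen_stats
    converted = sd_n / sd_g * (log_ms - mu_g) + mu_n
    filtered = (1.0 - k) * log_ms + k * converted
    spec = np.exp(0.5 * filtered) * np.exp(1j * phase)  # keep the original phase
    return np.fft.irfft(spec, n=fft_len, axis=0)[:T] + mean


# Toy data (assumption): "generated" is a moving-average copy of "natural".
rng = np.random.default_rng(0)
natural = [rng.standard_normal((200, 3)) for _ in range(20)]
kernel = np.ones(5) / 5.0
generated = [
    np.stack([np.convolve(x[:, d], kernel, mode="same") for d in range(3)], axis=1)
    for x in natural
]
nat_stats, gen_stats = ms_stats(natural), ms_stats(generated)
restored = ms_postfilter(generated[0], nat_stats, gen_stats, k=1.0)
```

In practice the statistics would be estimated from training data (natural features and the model's predictions for the same utterances), and `k` would be tuned by listening.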

r9y9 avatar Apr 30 '22 04:04 r9y9

I wonder why the higher-order mgc and bap generated by the GAN-based model are over-smoothed. Is there any possibility that MLPG contributes to this over-smoothing?

I think neutrino's phrase-level synthesis may be meant to avoid running out of GPU memory.

taroushirani avatar Apr 30 '22 12:04 taroushirani

I suspect MLPG causes over-smoothing. I tried disabling MLPG, but that actually caused quality degradation. In particular, the generated F0 became too flat. Maybe it would be worth trying to disable MLPG for spectral features (mgc and bap) and enable it for F0. Also, note that the GAN-based model is still in an experimental stage. I am still struggling to make it work well.
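To make the "MLPG for F0 only" idea concrete, here is a small dense-matrix sketch of MLPG with a per-stream switch. It assumes features are laid out as [static | delta | delta-delta], uses commonly seen window coefficients, and truncates the windows at utterance boundaries. The function names are hypothetical, and this is not the code path nnsvs actually uses.

```python
import numpy as np

# Assumed window coefficients: static, delta, delta-delta.
WINDOWS = [
    np.array([0.0, 1.0, 0.0]),
    np.array([-0.5, 0.0, 0.5]),
    np.array([1.0, -2.0, 1.0]),
]


def window_matrix(T, windows=WINDOWS):
    """Stack one banded (T, T) matrix per window into a (len(windows) * T, T) matrix."""
    blocks = []
    for win in windows:
        half = len(win) // 2
        W = np.zeros((T, T))
        for t in range(T):
            for j, w in enumerate(win):
                tau = t + j - half
                if 0 <= tau < T:  # windows are truncated at the edges
                    W[t, tau] = w
        blocks.append(W)
    return np.vstack(blocks)


def mlpg(means, variances, windows=WINDOWS):
    """Solve (W' P W) c = W' P mu per dimension; inputs are (T, D * len(windows))."""
    T, total = means.shape
    n_win = len(windows)
    D = total // n_win
    W = window_matrix(T, windows)
    out = np.zeros((T, D))
    for d in range(D):
        mu = np.concatenate([means[:, k * D + d] for k in range(n_win)])
        prec = 1.0 / np.concatenate([variances[:, k * D + d] for k in range(n_win)])
        A = W.T @ (prec[:, None] * W)
        b = W.T @ (prec * mu)
        out[:, d] = np.linalg.solve(A, b)
    return out


def generate_stream(means, variances, use_mlpg):
    """Run MLPG, or just take the static part when MLPG is disabled."""
    D = means.shape[1] // len(WINDOWS)
    return mlpg(means, variances) if use_mlpg else means[:, :D]
```

With this, the experiment would amount to calling `generate_stream(..., use_mlpg=True)` for the F0 stream and `generate_stream(..., use_mlpg=False)` for mgc and bap.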

Yes, phrase-level synthesis could be useful to avoid GPU out-of-memory errors when using NSF. It would also be useful if we use a modulation spectrum based post-filter (search for "segment-level post-filter" in https://ahcweb01.naist.jp/papers/journal/2016/201604_TASLP_Takamichi_1/201604_TASLP_Takamichi_1.paper.pdf)
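A minimal sketch of what phrase-level segmentation could look like, assuming a per-frame rest flag is available (for example, derived from the score). The function name is hypothetical, and how neutrino actually splits phrases is not known.

```python
def split_into_phrases(is_rest):
    """Return [start, end) frame ranges of contiguous non-rest regions."""
    segments = []
    start = None
    for t, rest in enumerate(is_rest):
        if not rest and start is None:
            start = t  # a phrase begins at the first non-rest frame
        elif rest and start is not None:
            segments.append((start, t))  # a rest closes the current phrase
            start = None
    if start is not None:
        segments.append((start, len(is_rest)))  # phrase runs to the end
    return segments


phrases = split_into_phrases([True, False, False, True, True, False, True])
# phrases == [(1, 3), (5, 6)]
```

Each range could then be synthesized (or post-filtered) independently, which bounds the memory needed per call.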

r9y9 avatar Apr 30 '22 13:04 r9y9

Thank you for your rapid response. I'm sorry, I mistakenly thought that the acoustic model of NNSVS was GAN-based because of the graph legends of MGC and BAP (I re-checked the descriptions of the samples on SoundCloud).

And thank you for the information about the modulation spectrum based post-filter. I'll read the paper.

taroushirani avatar Apr 30 '22 14:04 taroushirani

Sorry, that's my bad. I didn't include any detailed information in the description. Some notes:

  • baseline: a baseline ResSkipF0FFConvLSTM model
  • gan: my attempt to integrate GAN for training ResSkipF0FFConvLSTM model (not very good at the moment)
  • neutrino: neutrino.

For spectrogram/aperiodicity/F0, I used the baseline model. For mgc/bap, I used both the baseline and gan for comparison.

r9y9 avatar Apr 30 '22 15:04 r9y9

Good news: I've done an initial cut of the MS post-filter, and here is a spectrogram example:

From top to bottom: gan, gan with MS post-filter, neutrino

download

Findings so far:

  • I got patterns very similar to neutrino's with the MS-based post-filter. It's likely that neutrino also uses a similar (or the same) post-filtering technique.
  • Over-smoothing can be alleviated by the MS-based post-filter.

An illustration for 50-dim mgc with and without post-filter:

download-1

r9y9 avatar Apr 30 '22 15:04 r9y9

download (7)

Top: NNSVS (w/ GAN-based post-filter) Bottom: Neutrino

My bad; the previous spectrogram visualization was wrong. I was assuming that neutrino uses the same mgc as ours, but it turned out they use a slightly different approach. Specifically,

  • Neutrino: pyworld.code_spectral_envelope (or the C++ version of its implementation) to convert spectral envelope to mgc
  • nnsvs: pysptk.sp2mc to convert spectral envelope to mgc

I suppose there's no big difference, but we may want to try the same approach as Neutrino to see if it actually makes a difference.

r9y9 avatar May 22 '22 04:05 r9y9

https://github.com/nnsvs/nnsvs/issues/1#issuecomment-1332554913

r9y9 avatar Nov 30 '22 18:11 r9y9

I'll report a more detailed comparison by Jan 2023. I'll have a long vacation for a while.

r9y9 avatar Dec 01 '22 02:12 r9y9