Mayank Kumar Singh
I have created a pull request addressing the issues with the ASR model. Putting the JDC network in eval mode is not trivial and requires setting each individual to...
Setting the dropout rates to 0 produces no audible change when working with speech signals (apart from changes in the loss values), but it does yield improvements when working...
This is linked to the issue https://github.com/yl4579/StarGANv2-VC/issues/72
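Below is a minimal PyTorch sketch of the "zero every dropout instead of calling eval()" approach. It assumes a frozen auxiliary network that gradients must still flow through. A common obstacle with eval() in that setting is that cuDNN refuses to run an RNN backward pass outside training mode, so the network is kept in train mode and only its dropout probabilities are zeroed. The helper `disable_dropout` and the toy model `ToyPitchNet` are hypothetical illustrations. They are not the code from the pull request, and the toy model is not the real JDC network.

```python
import torch
import torch.nn as nn

# Dropout layer types that store their drop probability in the attribute `p`.
DROPOUT_TYPES = (nn.Dropout, nn.Dropout2d, nn.Dropout3d, nn.AlphaDropout)
# Recurrent layers keep inter-layer dropout in the float attribute `dropout`,
# which is read on every forward() call.
RNN_TYPES = (nn.RNN, nn.LSTM, nn.GRU)


def disable_dropout(model: nn.Module) -> int:
    """Set every dropout probability in `model` to 0 without calling eval().

    Returns how many modules were modified; the model stays in train mode.
    """
    changed = 0
    for module in model.modules():  # walks the model itself and all submodules
        if isinstance(module, DROPOUT_TYPES):
            module.p = 0.0
            changed += 1
        elif isinstance(module, RNN_TYPES):
            module.dropout = 0.0
            changed += 1
    return changed


class ToyPitchNet(nn.Module):
    """Hypothetical stand-in with dropout in several places; NOT the real JDCNet."""

    def __init__(self) -> None:
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv1d(1, 8, 3, padding=1), nn.ReLU(), nn.Dropout(0.2))
        self.lstm = nn.LSTM(8, 16, num_layers=2, dropout=0.3, batch_first=True)
        self.head = nn.Sequential(nn.Dropout(0.5), nn.Linear(16, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.encoder(x).transpose(1, 2)  # (batch, 1, time) -> (batch, time, 8)
        h, _ = self.lstm(h)
        return self.head(h)  # (batch, time, 1)


if __name__ == "__main__":
    net = ToyPitchNet()
    net.requires_grad_(False)  # frozen, as a pretrained auxiliary network would be
    print(disable_dropout(net))  # prints 3
    x = torch.randn(2, 1, 50)
    # Still in train mode, yet two passes now give identical outputs.
    print(net.training, torch.equal(net(x), net(x)))
```

Zeroing dropout only affects dropout. If the real network also contains normalization layers such as BatchNorm, those still use batch statistics and update their running statistics in train mode, so they may need to be switched individually with `.eval()`.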
I am also facing similar issues with both the SR (256 x 256) and the 64 x 64 resolution videos, on the AIST++ as well as the Landscape dataset.
@ltzheng In the evaluation code, the FAD is multiplied by 1e3 instead of the 1e4 stated in the paper. Correcting the scaling factor brings your FAD to 13.61...
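To see why the constant matters, here is a NumPy sketch of the standard Fréchet distance between Gaussians fitted to two embedding sets, followed by a fixed multiplier. It assumes, without having checked every detail, that the evaluator computes the usual Fréchet distance and then scales it. `frechet_distance` and `reported_fad` are illustrative names, not functions from the MM-Diffusion repository. Because the multiplier is applied after the distance is computed, going from 1e3 to 1e4 simply makes every reported value ten times larger.

```python
import numpy as np


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two (n_samples, dim) feature sets."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a @ cov_b)^{1/2}) is the sum of the square roots of the eigenvalues of
    # cov_a @ cov_b, which are real and non-negative up to round-off.
    eig = np.linalg.eigvals(cov_a @ cov_b).real
    tr_covmean = np.sqrt(np.clip(eig, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_covmean)


def reported_fad(feats_a: np.ndarray, feats_b: np.ndarray, scale: float) -> float:
    """The number that ends up in a table: the raw distance times a fixed constant."""
    return scale * frechet_distance(feats_a, feats_b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(512, 16))
    fake = rng.normal(loc=0.1, size=(512, 16))
    low = reported_fad(real, fake, scale=1e3)
    high = reported_fad(real, fake, scale=1e4)
    print(high / low)  # ~10: changing the constant rescales every reported score
```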
Hey @kaiw7, I think the audio sample rate during evaluation is set to 44.1 kHz because the AudioCLIP model is trained on 44.1 kHz data. In my opinion, a fairer method to...
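For context on what feeding a 44.1 kHz encoder involves, here is a short sketch of polyphase resampling with SciPy's `resample_poly`. The 16 kHz source rate in the example is an assumption chosen purely for illustration, and `to_target_rate` is a hypothetical helper, not part of the evaluation code.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 44_100  # sample rate the audio encoder is assumed to expect


def to_target_rate(audio: np.ndarray, orig_sr: int, target_sr: int = TARGET_SR) -> np.ndarray:
    """Polyphase-resample a 1-D waveform from orig_sr to target_sr."""
    if orig_sr == target_sr:
        return audio
    g = gcd(orig_sr, target_sr)
    # e.g. 16 kHz -> 44.1 kHz reduces to up = 441, down = 160
    return resample_poly(audio, up=target_sr // g, down=orig_sr // g)


if __name__ == "__main__":
    sr = 16_000  # assumed generation rate, for illustration only
    t = np.arange(sr) / sr
    clip = np.sin(2 * np.pi * 440.0 * t)  # one second of a 440 Hz tone
    print(len(to_target_rate(clip, sr)))  # 44100 samples: still one second
```

Reducing the rate ratio by its greatest common divisor keeps the polyphase filter bank as small as possible.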
`Could I know where you found the 1e4 or 1e3?`
https://github.com/researchmm/MM-Diffusion/blob/1d2d5ad9b47f57e7d300e087af8eb93181da094d/mm_diffusion/evaluator.py#L170C5-L170C43
`And how many samples did you use for the FVD evaluation?`
2048 (I used the default settings of...