Mayank Kumar Singh comments

27 comments by Mayank Kumar Singh

Why is ASR model goes to train mode in the training loop

I have created a pull request addressing the issues with the ASR model. Putting the JDC network under eval mode is not so trivial and requires setting each individual to...

Why is ASR model goes to train mode in the training loop

Setting the dropouts to 0 produces no audible changes when working with speech signals (apart from changes in the loss values), but it does yield improvements when working...
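The behaviour discussed in these two comments can be illustrated with a small sketch. It is not the code from the pull request: `ToyTrainer`, `keep_in_eval` and `zero_dropout` are hypothetical names, and the two tiny `nn.Sequential` stacks only stand in for a trainable network and a pretrained helper such as an ASR model. The sketch shows that `train()` on a parent module also switches its children to train mode, and two possible ways to neutralise dropout in the helper.

```python
import torch.nn as nn


class ToyTrainer(nn.Module):
    """Hypothetical stand-in: a trainable generator plus a pretrained helper."""

    def __init__(self):
        super().__init__()
        self.generator = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.5))
        self.asr_model = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.5))


def keep_in_eval(module):
    """Put a helper module in eval mode and stop gradients to its weights."""
    module.eval()
    for param in module.parameters():
        param.requires_grad_(False)
    return module


def zero_dropout(module):
    """Set p=0 on every nn.Dropout inside `module`; return how many were changed."""
    changed = 0
    for sub in module.modules():
        if isinstance(sub, nn.Dropout):
            sub.p = 0.0
            changed += 1
    return changed


model = ToyTrainer()

# train() is recursive, so the helper is switched to train mode as well.
model.train()
helper_in_train_mode = model.asr_model.training

# Option 1: re-apply eval mode to the helper after every model.train() call.
keep_in_eval(model.asr_model)
helper_in_eval_mode = not model.asr_model.training

# Option 2: leave the mode alone and zero the dropout probabilities instead.
n_zeroed = zero_dropout(model.asr_model)
```

With option 1 the call has to be repeated whenever `model.train()` is invoked again, because that call resets the helper. Option 2 survives mode switches but only affects `nn.Dropout` layers; other mode-dependent layers (for example batch normalisation) would still behave differently in train mode.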

Eval mode for ASR model and compatibility with PyTorch > 1.7

This is linked to the issue https://github.com/yl4579/StarGANv2-VC/issues/72

Reproducing results in the paper

I am also facing similar issues for both SR (256 x 256) and 64 x 64 resolution videos on the AIST++ and landscape datasets.

Reproducing results in the paper

@ltzheng In the evaluation code, the FAD is multiplied by 1e3 instead of the 1e4 mentioned in the paper. Correcting the scaling factor brings your FAD to 13.61...
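The effect of the scaling factor is plain arithmetic, sketched below. The function name `rescale_metric` and the input value are hypothetical and chosen only for illustration; they are not taken from the evaluation code or from any reported result.

```python
import math


def rescale_metric(value, from_factor, to_factor):
    """Convert a metric reported under one scaling factor to another."""
    return value / from_factor * to_factor


# Hypothetical FAD as printed by code that multiplies the raw distance by 1e3.
fad_at_1e3 = 2.5

# The same raw distance expressed with the 1e4 factor is ten times larger.
fad_at_1e4 = rescale_metric(fad_at_1e3, from_factor=1e3, to_factor=1e4)
```

A value computed with 1e3 is therefore comparable to numbers reported with 1e4 only after multiplying it by 10.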

Reproducing results in the paper

Hey @kaiw7, I think the audio sampling rate during evaluation is set to 44.1 kHz because the AudioCLIP model is trained on 44.1 kHz data. In my opinion, a fairer method to...

How many iterations have the pretrained models been trained for?

`Could I know where you found the 1e4 or 1e3?`
https://github.com/researchmm/MM-Diffusion/blob/1d2d5ad9b47f57e7d300e087af8eb93181da094d/mm_diffusion/evaluator.py#L170C5-L170C43
`And how many samples did you use for the FVD evaluation?`
2048 (I used the default settings of...