Rafael Valle
It should not be an issue given that in Hindi there's a one-to-one correspondence between graphemes and phonemes.
@akshay4malik were you able to train the model with 2 steps of flow by warm-starting from the model with 1 step of flow you trained on your data?
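In case it helps, here is a minimal warm-starting sketch, assuming the checkpoint layout and the Flowtron constructor from the repo; the file names are placeholders and the exact warm-start handling in train.py may differ:

```python
import json
import torch
from flowtron import Flowtron  # from the Flowtron repo

# Build a 2-flow model from the repo's config.json.
config = json.load(open('config.json'))
config['model_config']['n_flows'] = 2
model = Flowtron(**config['model_config'])

# Load the 1-flow checkpoint; the key layout below is an assumption,
# adjust it to whatever your checkpoints actually store.
ckpt = torch.load('flowtron_1flow.pt', map_location='cpu')
state_dict = ckpt['state_dict'] if 'state_dict' in ckpt else ckpt['model'].state_dict()

# strict=False copies the shared layers and the first flow, and leaves the
# newly added second flow randomly initialized for further training.
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print(f'{len(missing)} newly initialized tensors, {len(unexpected)} unused tensors')
```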
Yes, it is possible as long as it is trained on the same mel configuration as Flowtron.
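For concreteness, a small sanity check along these lines, assuming the vocoder exposes its mel parameters in a JSON file using the same key names as the repo's data_config (the vocoder_config.json path and layout are hypothetical):

```python
import json

# Mel/STFT parameters that must match between Flowtron and the vocoder.
MEL_KEYS = ['sampling_rate', 'filter_length', 'hop_length', 'win_length',
            'n_mel_channels', 'mel_fmin', 'mel_fmax']

flowtron_cfg = json.load(open('config.json'))['data_config']
vocoder_cfg = json.load(open('vocoder_config.json'))  # hypothetical file

for k in MEL_KEYS:
    assert flowtron_cfg[k] == vocoder_cfg[k], f'mel configuration mismatch on {k}'
```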
Please make sure you set the attention prior to True here: https://github.com/NVIDIA/flowtron/blob/master/config.json#L34
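If you prefer to flip it programmatically, a minimal sketch, assuming the flag at the linked line is named use_attn_prior under data_config (verify the exact key name at config.json#L34 in your checkout):

```python
import json

# Load the repo's config, enable the attention prior, and write it back.
# The key name 'use_attn_prior' is an assumption -- check the linked line.
with open('config.json') as f:
    config = json.load(f)

config['data_config']['use_attn_prior'] = True

with open('config.json', 'w') as f:
    json.dump(config, f, indent=4)
```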
Something along these lines:
```
FROM pytorch/pytorch:nightly-devel-cuda10.0-cudnn7
ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:${PATH}
RUN apt-get update -y
RUN apt-get install -y ffmpeg libsndfile1 sox locales vim
RUN pip install --upgrade pip
RUN pip...
```
Try collecting multiple z values (prior evidence), padding them to max length by replicating them, and finally computing the mean. Intuitively, this procedure averages out sentence-dependent characteristics (text, pitch...
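A minimal sketch of that averaging, assuming z tensors collected one per utterance with shape (n_mel_frames, 80); the function and variable names are just illustrative:

```python
import torch

def average_prior_evidence(z_list):
    """z_list: list of tensors shaped (n_mel_frames_i, 80), one per utterance.
    Returns a single (max_frames, 80) tensor of averaged prior evidence."""
    max_frames = max(z.size(0) for z in z_list)
    padded = []
    for z in z_list:
        n = z.size(0)
        # replicate each sequence until it reaches max_frames, then trim
        reps = (max_frames + n - 1) // n
        padded.append(z.repeat(reps, 1)[:max_frames])
    # averaging over utterances washes out sentence-dependent structure
    return torch.stack(padded, dim=0).mean(dim=0)
```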
You can also sample from a distribution by collecting one z and treating each dimension as a mean. In this case, you can either average over time or pad to desired...
@karkirowle do you have samples in which you average over both batch and time and then sample from your 80-d Gaussian n-frames times?
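For reference, a sketch of that batch-and-time-averaged sampling; the 80 channels follow the default n_mel_channels, and the sigma value is just an example:

```python
import torch

def sample_z_from_mean(z_batch, n_frames, sigma=0.5):
    """z_batch: (n_utterances, n_mel_frames, 80) stacked prior evidence.
    Returns (n_frames, 80) sampled i.i.d. around the 80-d mean."""
    mean = z_batch.mean(dim=(0, 1))  # 80-d mean over batch and time
    # draw a fresh sample from the 80-d Gaussian for every output frame
    return mean + sigma * torch.randn(n_frames, mean.size(0))
```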
For the people interested in style transfer: give us a few days to put a notebook up replicating some of our experiments.
Please take a look at https://github.com/NVIDIA/flowtron/blob/master/inference_style_transfer.ipynb