Benjamin van Niekerk
Hi @Georgehappy1. I'll add some instructions to the README soon. The format of the json file is:
```
[
  [in_path, offset, duration, out_path],
  ...
]
```
The following...
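For concreteness, here's a minimal sketch of writing such a split file. The paths, the `train.json` filename, and the assumption that offset/duration are in seconds are all hypothetical; check the repo's actual split files for the exact convention:

```python
import json

# Each entry is [in_path, offset, duration, out_path].
# Hypothetical example: offsets and durations assumed to be in seconds.
split = [
    ["wavs/p225/p225_001.wav", 0.0, 2.5, "train/p225/p225_001"],
    ["wavs/p226/p226_004.wav", 0.3, 1.8, "train/p226/p226_004"],
]

with open("train.json", "w") as f:
    json.dump(split, f, indent=2)
```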
No problem @Georgehappy1. Also, I forgot to mention that you'll have to add a new config file `VCTK.yaml` under `config/dataset`. The format is:
```yaml
dataset:
  dataset: VCTK
  language: english
  path: ...
```
@Georgehappy1, just checking if you ever managed to get the training on VCTK working?
@Georgehappy1, fantastic! Looking forward to hearing the results. If you'd like to contribute your model and dataset splits, I'd be very happy to take a look at a pull request.
Hi @liu-x-p. Sure. If you look at the usage in the readme it says:
```
python preprocess.py in_dir=/path/to/dataset dataset=[2019/english or 2019/surprise]
```
Note: `in_dir` must be the path to the...
No problem @liu-x-p. If you're still having issues, I'd advise keeping the actual data in a separate folder from this repo. So this repo would be under `holiday/ZeroSpeech`, for example...
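To make that concrete, a hypothetical layout (every folder name here other than `ZeroSpeech` is just an example):

```
holiday/
├── ZeroSpeech/          # this repo
└── datasets/
    └── 2019/
        └── english/     # the actual data, passed as in_dir
```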
Hi @sufeidechabei, sorry about the delay. Unfortunately, it's not real-time. On a GeForce RTX 2080 Super I'm getting about 3000 samples/sec, so at the 16 kHz output rate it's roughly 5.3x slower than real-time (16000 / 3000 ≈ 5.3)....
Hi @sbkim052. Yeah, it should work with unseen speech as the input. All the examples [here](https://bshall.github.io/ZeroSpeech/) are converted from unseen speech. If you want to convert to an unseen speaker,...
@sbkim052, no problem. The basic idea is to train a speaker verification/classification model to learn an embedding space for speaker identity. Then, instead of conditioning the decoder on a fixed...
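As a rough sketch of that idea in PyTorch (all module names, dimensions, and architecture choices here are hypothetical illustrations, not this repo's actual code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpeakerEncoder(nn.Module):
    """Toy speaker encoder: maps a mel-spectrogram to a fixed-size embedding."""
    def __init__(self, n_mels=80, embed_dim=256):
        super().__init__()
        self.lstm = nn.LSTM(n_mels, embed_dim, batch_first=True)

    def forward(self, mels):  # mels: (batch, frames, n_mels)
        _, (hidden, _) = self.lstm(mels)
        # L2-normalize so embeddings live on a hypersphere (d-vector style).
        return F.normalize(hidden[-1], dim=-1)

class Decoder(nn.Module):
    """Decoder conditioned on a speaker embedding instead of a speaker-id lookup table."""
    def __init__(self, unit_dim=64, embed_dim=256, hidden_dim=512, n_mels=80):
        super().__init__()
        self.rnn = nn.LSTM(unit_dim + embed_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, n_mels)

    def forward(self, units, speaker_embed):  # units: (batch, steps, unit_dim)
        # Broadcast the speaker embedding across time and concatenate.
        cond = speaker_embed.unsqueeze(1).expand(-1, units.size(1), -1)
        out, _ = self.rnn(torch.cat([units, cond], dim=-1))
        return self.proj(out)
```

At conversion time you'd compute the embedding from a few seconds of the target speaker's audio and feed it to the decoder, so an unseen speaker doesn't require retraining a speaker lookup table.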
Hi @michael-conrad. Apart from the normalization steps, the parameters used to extract the mel-spectrogram need to be the same as the ones used in this repo. From a cursory glance...
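To illustrate the kind of parameter matching I mean, here's a minimal sketch using librosa. The specific values below are placeholders I've assumed for the example, not the repo's actual settings, so check the repo's preprocessing config for the real ones:

```python
import librosa
import numpy as np

# Placeholder parameters: these must match the preprocessing config
# exactly, otherwise the model sees spectrograms with a different
# time/frequency resolution than it was trained on.
SR = 16000
N_FFT = 2048
HOP_LENGTH = 200
N_MELS = 80

def mel_spectrogram(path):
    wav, _ = librosa.load(path, sr=SR)
    mel = librosa.feature.melspectrogram(
        y=wav, sr=SR, n_fft=N_FFT, hop_length=HOP_LENGTH, n_mels=N_MELS
    )
    # Log compression; any normalization the repo applies would go on top.
    return np.log(np.maximum(mel, 1e-5))
```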