piper
Important note for dataset generation!
Hi, "python -m piper_train.preprocess" cannot handle double quotes ("), and maybe single quotes too. Preprocessing finishes with OK, but training then stops with a "RuntimeError: CUDA out of memory. ..." error. Be careful! I lost a huge amount of time finding and fixing this problem.
Hi @ican24, you can lower the batch size to suit your dataset size.
Yes, sure, but that slows training down significantly: 4-8 times. It is easier to remove the quotes.
Could you please elaborate on the problem? Our dataset has many double quotes (") and I'm not sure what the problem with them is.
I am working around the problem with these commands:
sed -i 's/"//g' metadata.csv
sed -i 's/”//g' metadata.csv
sed -i 's/“//g' metadata.csv
Quotes are meaningless in TTS. You may need to add other commands for your case. These fixed my problem and I moved on.
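For anyone without sed, the same cleanup can be sketched in Python. This is a minimal sketch, assuming metadata.csv is UTF-8 encoded; the helper names strip_quotes and clean_metadata are made up for illustration and are not part of piper_train.

```python
from pathlib import Path

# The three characters removed by the sed commands:
# straight double quote, right curly quote, left curly quote.
QUOTE_CHARS = '"\u201d\u201c'


def strip_quotes(text: str) -> str:
    """Return text with every character in QUOTE_CHARS deleted."""
    return text.translate({ord(ch): None for ch in QUOTE_CHARS})


def clean_metadata(path: str) -> None:
    """Rewrite the file in place without double quotes (assumes UTF-8)."""
    file_path = Path(path)
    cleaned = strip_quotes(file_path.read_text(encoding="utf-8"))
    file_path.write_text(cleaned, encoding="utf-8")
```

Calling clean_metadata("metadata.csv") overwrites the file, just like sed -i, so keep a backup copy first.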
I mean, why do quotes cause a CUDA OOM? Is there a parsing problem in the training code but not in preprocessing?...
It is hard to say. I am not a deep learning programmer. The existing code would need to be analyzed carefully.
If your metadata CSV file has a line with only one quote, the CSV reader keeps reading the following lines until it finds the next quote, so several lines are merged into one. That creates a huge line and causes the memory overflow. This should be fixed by setting quotechar to None in this line: https://github.com/rhasspy/piper/blob/master/src/python/piper_train/preprocess.py#L421 i.e. reader = csv.reader(csv_file, delimiter="|", quotechar=None)
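The merging can be reproduced with a small standalone sketch. The sample rows and the read_rows helper are made up for illustration, and the "id|text" layout is an assumption about the metadata file. The sketch also passes quoting=csv.QUOTE_NONE alongside quotechar=None to make the intent explicit; that extra argument is my addition, not part of the suggested fix.

```python
import csv
import io

# Hypothetical metadata; the text field of utt2 opens with a quote
# that is never closed.
SAMPLE = (
    'utt1|Hello there.\n'
    'utt2|"Unbalanced quote here.\n'
    'utt3|Third line.\n'
    'utt4|Fourth line.\n'
)


def read_rows(text, **reader_kwargs):
    """Parse pipe-delimited text and return all rows as lists of fields."""
    return list(csv.reader(io.StringIO(text), delimiter="|", **reader_kwargs))


# Default dialect: the stray quote starts a quoted field that swallows
# every following line up to the end of the input.
default_rows = read_rows(SAMPLE)

# Quote handling disabled: each input line stays its own row and the
# quote character is kept as ordinary text.
fixed_rows = read_rows(SAMPLE, quotechar=None, quoting=csv.QUOTE_NONE)

print(len(default_rows), "rows with default quoting")
print(len(fixed_rows), "rows with quoting disabled")
```

In the default case the text field of utt2 also contains the lines for utt3 and utt4, which is the kind of oversized utterance described above.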