preprocess error

Open RoxBlox3 opened this issue 1 year ago • 8 comments

(.venv) roxblox@PCRoxBlox:~/piper/src/python$ python3 -m piper_train.preprocess --language en --input-dir ~/piper/GLaDOS-Dataset --output-dir ~/piper/my-training --dataset-format ljspeech --single-speaker --sample-rate 22050
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/roxblox/piper/src/python/piper_train/preprocess.py", line 502, in <module>
    main()
  File "/home/roxblox/piper/src/python/piper_train/preprocess.py", line 143, in main
    for utt in make_dataset(args):
  File "/home/roxblox/piper/src/python/piper_train/preprocess.py", line 423, in ljspeech_dataset
    assert len(row) >= 2, "Not enough columns"
AssertionError: Not enough columns

I am on WSL on Windows 10.

RoxBlox3 avatar Oct 19 '23 14:10 RoxBlox3

Hi @RoxBlox3, it seems that your transcription file is not correct. Are you sure every line follows the audio|text syntax (the LJSpeech format)?
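One way to check this is to scan the metadata file and list the rows that would trip the `assert len(row) >= 2` shown in the traceback. The sketch below is only an approximation: it assumes the preprocessor reads the file with Python's `csv` module and `|` as the delimiter, and `find_bad_rows` is a hypothetical helper name, not part of piper.

```python
import csv
import io


def find_bad_rows(lines, delimiter="|", min_columns=2):
    """Return (line_number, row) pairs with fewer than min_columns fields."""
    reader = csv.reader(lines, delimiter=delimiter)
    bad = []
    for row in reader:
        # Blank lines come back as an empty list, so they are reported too.
        if len(row) < min_columns:
            bad.append((reader.line_num, row))
    return bad


if __name__ == "__main__":
    # Small in-memory sample; the second and third lines are deliberately broken.
    sample = io.StringIO(
        "a2_triple_laser02|That's right.\n"
        "chellgladoswakeup01,Oh... It's you.\n"  # comma instead of pipe
        "\n"  # blank line
        "epilogue03|Oh thank god, you're alright.\n"
    )
    for line_number, row in find_bad_rows(sample):
        print(f"line {line_number}: {row!r}")
```

To run it against a real file, pass an open file object instead of the sample, for example `open("metadata.csv", encoding="utf-8", newline="")`. Blank lines and lines that use a different separator are both reported.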

rmcpantoja avatar Oct 20 '23 03:10 rmcpantoja

@rmcpantoja Thank you for the response. I think so? Here are some example lines, but perhaps the problem is something like punctuation. I'm quite new to all this, so I don't know.

a2_triple_laser01|Federal regulations require me to warn you that this next test chamber... is looking pretty good.
a2_triple_laser02|That's right. The facility is completely operational again.
a2_triple_laser03|I think these test chambers look even better than they did before. It was easy, really. You just have to look at things objectively, see what you don't need anymore, and trim out the fat.
chellgladoswakeup01|Oh... It's you.
chellgladoswakeup04|It's been a long time. How have you been?
chellgladoswakeup05|I've been really busy being dead. You know, after you MURDERED ME.
chellgladoswakeup06|Okay. Look. We both said a lot of things that you're going to regret. But I think we can put our differences behind us. For science. You monster.
epilogue03|Oh thank god, you're alright.

RoxBlox3 avatar Oct 20 '23 09:10 RoxBlox3

I am having the same issue as well, did you ever find out what was wrong?

Ac3inSpac3 avatar Feb 11 '24 02:02 Ac3inSpac3

Hi @Ac3inSpac3, I had the same issue. The pipes weren't being recognized as the delimiter. A short Python script can rewrite the file with proper pipe delimiters. Try:

import csv

def process_file_with_pipes_as_delimiter(input_filename):
    processed_data = []
    with open(input_filename, 'r', encoding='utf-8') as file:
        for line in file:
            # skip empty lines
            if line.strip() == '':
                continue
            fields = line.strip().split('|')
            processed_data.append(fields)
    return processed_data

def write_processed_data_to_csv(output_filename, data, delimiter='|'):
    with open(output_filename, 'w', encoding='utf-8', newline='') as csvfile:
        writer = csv.writer(csvfile, delimiter=delimiter, quotechar='"', quoting=csv.QUOTE_MINIMAL)
        for row in data:
            writer.writerow(row)

def main():
    input_filename = 'metadata.csv' #input file
    output_filename = 'output.csv' #output file
    processed_data = process_file_with_pipes_as_delimiter(input_filename)
    write_processed_data_to_csv(output_filename, processed_data, delimiter='|')

if __name__ == '__main__':
    main()

XKenixs avatar Feb 14 '24 17:02 XKenixs

Did you find a solution?

SyedMuqtasidAli avatar Apr 06 '24 06:04 SyedMuqtasidAli

@RoxBlox3

SyedMuqtasidAli avatar Apr 06 '24 06:04 SyedMuqtasidAli

No, but it was probably an error in my transcription: when I tried with another transcription file, it worked. In the end I did not finish making a new voice, because someone made a much better one than I could have, so I did not continue.

RoxBlox3 avatar Apr 13 '24 14:04 RoxBlox3

I had the same issue, and I finally found out why: I had used pandas to delete some columns of the CSV file and then forgot to set the delimiter to '|' when writing it back, so it fell back to the default comma.
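For anyone editing the metadata with pandas, a minimal sketch of keeping the pipe delimiter on both the read and the write could look like this. It assumes a headerless file whose first two columns are the audio id and the text; `drop_columns_keep_pipes` is a made-up helper name used only for illustration.

```python
import pandas as pd


def drop_columns_keep_pipes(src_path, dst_path, keep=(0, 1)):
    """Read a headerless pipe-delimited file, keep selected columns, write it back with pipes."""
    # dtype=str and keep_default_na=False stop pandas from reinterpreting the text.
    df = pd.read_csv(src_path, sep="|", header=None, dtype=str, keep_default_na=False)
    df = df[list(keep)]
    # sep="|" is the important part: without it, to_csv writes commas.
    df.to_csv(dst_path, sep="|", header=False, index=False)
```

Calling `drop_columns_keep_pipes("metadata.csv", "output.csv")` would keep the first two columns; the file names here are placeholders.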

saikewei avatar Apr 24 '24 08:04 saikewei