Dataset synthesis step failing
Running the dataset synthesis step
python microsoft_dns/noisyspeech_synthesizer.cfg -root ./
results in this error:
File "microsoft_dns/noisyspeech_synthesizer.cfg", line 35
audioformat: *.wav
Could you help solve this issue?
I think it was a typo in the docs, and we were supposed to run the similarly named .py file, but there are some other errors with that. Still debugging.
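For context on the original error: the traceback points at "audioformat: *.wav", which is a "key: value" line from an INI-style config file rather than Python, so asking the Python interpreter to execute the .cfg fails with a syntax error. The sketch below illustrates the difference with the standard library only. The section name noisy_speech and both helper names are hypothetical, and the real script may read its config differently.

```python
import configparser

# Two lines in the style of the synthesizer config; the section name is a hypothetical placeholder.
CFG_TEXT = """\
[noisy_speech]
audioformat: *.wav
"""


def parses_as_python(source: str) -> bool:
    """Return True if source is valid Python syntax."""
    try:
        compile(source, "noisyspeech_synthesizer.cfg", "exec")
        return True
    except SyntaxError:
        return False


def read_audioformat(source: str) -> str:
    """Read the audioformat value the way an INI-style config parser would."""
    parser = configparser.ConfigParser()
    parser.read_string(source)
    return parser["noisy_speech"]["audioformat"]


if __name__ == "__main__":
    print(parses_as_python(CFG_TEXT))  # the interpreter rejects the INI syntax
    print(read_audioformat(CFG_TEXT))  # a config parser reads the value fine
```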
I've had a few issues getting the dataset built as well. This is what I have had to do so far:
1. I modified the raw-data paths in these lines of the noisyspeech_synthesizer.py file, since the default download script extracts the raw files to /microsoft_dns/datasets_fullband/datasets_fullband/.
2. I have multiple versions of Python in my environment, so for me the command to run is
python noisyspeech_synthesizer.py -root ./
3. However, when I first ran that command, I got many versioning errors. After a little digging, it seems that the latest version of librosa is not compatible with the latest version of numpy. I had to downgrade numpy from 1.24.3 to 1.23.5 and librosa from 0.10.0 to 0.8.1.
4. After that, running the command in step 2 generates the training and validation files (I think). This is currently in progress for me, but I'll follow up if it completes without errors.
Exactly the same here, except that rather than renaming the path, I just moved the raw files up a level.
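A sketch of the "move the raw files up a level" workaround, assuming the nested layout microsoft_dns/datasets_fullband/datasets_fullband/ described in this thread. The helper name flatten_nested_dir is hypothetical. It checks for name clashes before moving anything and refuses to overwrite existing entries, which matters when the download is hundreds of GB.

```python
import shutil
from pathlib import Path


def flatten_nested_dir(outer: Path) -> list:
    """Move the contents of outer/<outer.name>/ up into outer/, then delete the empty inner dir.

    Returns the names of the moved entries (an empty list if there is no nested dir).
    """
    inner = outer / outer.name
    if not inner.is_dir():
        return []  # layout is already flat; nothing to do

    entries = sorted(inner.iterdir())
    # Check for clashes first so that a failure leaves the tree untouched.
    clashes = [e.name for e in entries if (outer / e.name).exists()]
    if clashes:
        raise FileExistsError(f"would overwrite existing entries in {outer}: {clashes}")

    for entry in entries:
        shutil.move(str(entry), str(outer / entry.name))
    inner.rmdir()
    return [e.name for e in entries]


if __name__ == "__main__":
    # Path taken from the thread; adjust to wherever the download script extracted the data.
    print(flatten_nested_dir(Path("microsoft_dns/datasets_fullband")))
```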
I started running step 4 on my laptop and gave up waiting after 60,000+ or so, since there was no progress bar (the code uses while loops, so it isn't immediately clear how long it will take), and I started to suspect it might not work on a partial dataset download.
As an update, the synthesizer takes quite a while, depending on your machine. I'm using an AMD Ryzen Threadripper processor with the raw files on a native SSD, and it took about 115 hours just to generate the training_set. The validation set looks like it will take roughly as long.
@A-Telfer I'm not sure what you mean by "no progress bar," since I periodically saw output from the synthesizer script indicating when it had to retry synthesizing some of the audio files:
Number of files to be synthesized: 60000
Start idx: 0
Stop idx: 59999
Generating synthesized data in ./
Warning: File #5 has unexpected clipping, returning without writing audio to disk
Warning: File #29 has unexpected clipping, returning without writing audio to disk
...
Warning: File #1114 has unexpected clipping, returning without writing audio to disk
Warning: File #1130 has unexpected clipping, returning without writing audio to disk
Found exception
Input signal length=0 is too small to resample from 48000->16000
Trying again
Found exception
Input signal length=0 is too small to resample from 48000->16000
Trying again
Found exception
Input signal length=0 is too small to resample from 48000->16000
Trying again
Found exception
Input signal length=0 is too small to resample from 48000->16000
Trying again
Warning: File #1151 has unexpected clipping, returning without writing audio to disk
Warning: File #1164 has unexpected clipping, returning without writing audio to disk
...
Warning: File #29071 has unexpected clipping, returning without writing audio to disk
Warning: File #29090 has unexpected clipping, returning without writing audio to disk
Found exception
Input signal length=0 is too small to resample from 48000->16000
Trying again
Found exception
Input signal length=0 is too small to resample from 48000->16000
Trying again
Found exception
Input signal length=0 is too small to resample from 48000->16000
Trying again
Warning: File #29107 has unexpected clipping, returning without writing audio to disk
Warning: File #29118 has unexpected clipping, returning without writing audio to disk
...
Warning: File #34891 has unexpected clipping, returning without writing audio to disk
Warning: File #34891 has unexpected clipping, returning without writing audio to disk
Found exception
Input signal length=0 is too small to resample from 48000->16000
Trying again
Found exception
Input signal length=0 is too small to resample from 48000->16000
Trying again
Found exception
Input signal length=0 is too small to resample from 48000->16000
Trying again
Found exception
Input signal length=0 is too small to resample from 48000->16000
Trying again
Found exception
Input signal length=0 is too small to resample from 48000->16000
Trying again
Found exception
Input signal length=0 is too small to resample from 48000->16000
Trying again
Found exception
Input signal length=0 is too small to resample from 48000->16000
Trying again
Warning: File #34919 has unexpected clipping, returning without writing audio to disk
Warning: File #34925 has unexpected clipping, returning without writing audio to disk
...
Warning: File #44648 has unexpected clipping, returning without writing audio to disk
Warning: File #44661 has unexpected clipping, returning without writing audio to disk
Found exception
Input signal length=0 is too small to resample from 48000->16000
Trying again
Found exception
Input signal length=0 is too small to resample from 48000->16000
Trying again
Found exception
Input signal length=0 is too small to resample from 48000->16000
Trying again
Found exception
Input signal length=0 is too small to resample from 48000->16000
Trying again
Warning: File #44699 has unexpected clipping, returning without writing audio to disk
Warning: File #44711 has unexpected clipping, returning without writing audio to disk
...
Warning: File #59935 has unexpected clipping, returning without writing audio to disk
Warning: File #59962 has unexpected clipping, returning without writing audio to disk
Warning: File #59989 has unexpected clipping, returning without writing audio to disk
Warning: File #59991 has unexpected clipping, returning without writing audio to disk
Warning: File #59997 has unexpected clipping, returning without writing audio to disk
Of the 466391 clean speech files analyzed, 2.6% had clipping, and 46.8% had low activity (below 60.0% active percentage)
Of the 221062 noise files analyzed, 18.4% had clipping, and 0.0% had low activity (below 0.0% active percentage)
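The log above contains two different kinds of messages: clipping warnings, after which the file is not written to disk, and resampling exceptions on zero-length input, after which the script tries again. A minimal sketch of the two checks involved, assuming float samples normalized to [-1, 1] and a clipping threshold of 0.99. The function name and threshold are illustrative assumptions, not the synthesizer's actual code.

```python
from typing import Optional, Sequence


def check_segment(audio: Sequence[float], clipping_threshold: float = 0.99) -> Optional[str]:
    """Return None if a segment looks usable, otherwise a short reason to skip it.

    Assumes samples are floats normalized to [-1.0, 1.0].
    """
    if len(audio) == 0:
        # An empty signal cannot be resampled (cf. "Input signal length=0 ...").
        return "empty"
    if any(abs(x) > clipping_threshold for x in audio):
        # At least one sample exceeds the threshold, i.e. sits at or near full scale.
        return "clipped"
    return None


if __name__ == "__main__":
    for segment in ([], [0.1, -0.4, 0.2], [0.1, 1.0, -0.3]):
        print(segment, "->", check_segment(segment))
```

For a NumPy array, np.any(np.abs(audio) > clipping_threshold) is the vectorized equivalent of the generator expression.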
I believe it should work on a partial dataset, based on the information given during the orientation. Is this not the output you saw, @A-Telfer?
For me, the training_set completed with 180,003 items totalling 172.8 GB. I'm not entirely sure whether this is the expected output of the synth script.
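To get the same kind of summary from a script instead of a file manager, one option is to walk the output directory, counting files and summing their sizes. A minimal sketch; the path in the usage line is a placeholder based on the layout used later in this thread. File managers often count folders as items too, so a files-only count can come out slightly lower than the item count they report, and "GB" may mean 10^9 or 2^30 bytes depending on the tool.

```python
import os


def summarize_dir(root: str) -> tuple:
    """Return (number_of_files, total_bytes) for every file under root, recursively."""
    count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            count += 1
            total_bytes += os.path.getsize(os.path.join(dirpath, name))
    return count, total_bytes


if __name__ == "__main__":
    # Placeholder path: point this at your generated training_set directory.
    n_files, n_bytes = summarize_dir("data/datasets_fullband/training_set")
    # Decimal gigabytes (10**9 bytes).
    print(f"{n_files:,} files, {n_bytes / 1e9:.1f} GB")
```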
Hi all, I have fixed the typo in the README. As you already noted, it should have been noisyspeech_synthesizer.py, not .cfg.
Assuming you have downloaded your dataset to ./data/datasets_fullband/, the commands to execute are:
python noisyspeech_synthesizer.py -root ./data/datasets_fullband/
python noisyspeech_synthesizer.py -root ./data/datasets_fullband/ -is_validation_set true
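Since this thread mentions having several Python installations, one way to make sure both commands run under the interpreter whose numpy/librosa versions you just adjusted is to launch them through sys.executable. A small sketch, assuming it is run from the same directory as the shell commands above (the one containing noisyspeech_synthesizer.py); the helper name synth_command is hypothetical.

```python
import subprocess
import sys


def synth_command(root: str, validation: bool = False) -> list:
    """Build the synthesizer command line using the currently running interpreter."""
    cmd = [sys.executable, "noisyspeech_synthesizer.py", "-root", root]
    if validation:
        cmd += ["-is_validation_set", "true"]
    return cmd


if __name__ == "__main__":
    root = "./data/datasets_fullband/"
    # Training set first, then the validation set; stop if either run fails.
    for validation in (False, True):
        subprocess.run(synth_command(root, validation), check=True)
```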
The synthesis does take a lot of time, and there is no progress bar in the script. A way to monitor the progress is:
ls -l data/datasets_fullband/training_set/clean/*.wav | wc -l
ls -l data/datasets_fullband/validation_set/clean/*.wav | wc -l
These print the number of samples generated so far in the training and validation sets, respectively. @BujSet 180,003 items looks correct: it's 60k audio samples each for clean, noise, and noisy.
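A Python alternative to the ls | wc -l check, which also turns two counts taken a few minutes apart into a rough time-remaining estimate. It assumes output files end in .wav and that the total is the "Number of files to be synthesized" value printed at startup (60000 in the log above). The estimate is linear, so it drifts if the synthesis rate changes. Counting with os.scandir also avoids expanding a very large shell glob.

```python
import os
import time


def count_wavs(directory: str) -> int:
    """Count the .wav files directly inside directory."""
    with os.scandir(directory) as entries:
        return sum(1 for e in entries if e.is_file() and e.name.endswith(".wav"))


def eta_seconds(done_before: int, done_now: int, total: int, interval_s: float) -> float:
    """Linearly extrapolate the remaining time from progress made during one interval."""
    rate = (done_now - done_before) / interval_s  # files per second
    if rate <= 0:
        return float("inf")  # no progress observed; cannot estimate
    return (total - done_now) / rate


if __name__ == "__main__":
    clean_dir = "data/datasets_fullband/training_set/clean"
    total = 60_000  # "Number of files to be synthesized" from the startup output
    interval = 600.0  # seconds between the two counts
    before = count_wavs(clean_dir)
    time.sleep(interval)
    now = count_wavs(clean_dir)
    remaining_h = eta_seconds(before, now, total, interval) / 3600
    print(f"{now}/{total} done, roughly {remaining_h:.1f} h remaining")
```

For scale, the roughly 115 hours reported earlier for 60,000 training samples works out to about 6.9 seconds per sample (115 * 3600 s / 60,000).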
I have librosa v0.10.0 and numpy v1.23.5, and it worked, but at line 90 of microsoft_dns/noisyspeech_synthesizer_singleprocess.py I had to change librosa.resample(arg1, arg2, arg3) to librosa.resample(input_audio, orig_sr=fs_input, target_sr=fs_output).
@daevem, thanks for posting this information. Perhaps the librosa interface changed at some point. A working combination for the current version of the code is:
librosa==0.9.2
numpy==1.23.3
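To confirm which interpreter and package versions are actually in use (useful with several Python installations), a small check against the combination above. It compares exact version strings, and PINS simply restates the two pins from this comment; the function name is hypothetical. On the librosa side, librosa 0.10 made orig_sr and target_sr keyword-only in librosa.resample, which is why the positional call fails there, while the keyword form is also accepted by earlier releases such as 0.8 and 0.9.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Working combination quoted in this thread.
PINS = {"librosa": "0.9.2", "numpy": "1.23.3"}


def version_mismatches(pins: dict) -> dict:
    """Return {package: installed_version} for every package that does not match its pin.

    installed_version is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    bad = version_mismatches(PINS)
    for name, installed in bad.items():
        print(f"{name}: installed {installed}, expected {PINS[name]}")
    if not bad:
        print("all pinned versions match")
```

If a mismatch is reported, installing with the same interpreter (python -m pip install librosa==0.9.2 numpy==1.23.3) keeps pip and the synthesizer on the same Python.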