IntelNeuromorphicDNSChallenge Dataset synthesis step failing

Running the dataset synthesis step

python microsoft_dns/noisyspeech_synthesizer.cfg -root ./

...results in this error:

  File "microsoft_dns/noisyspeech_synthesizer.cfg", line 35
    audioformat: *.wav

Apr 30 '23 19:04 A-Telfer

Could you solve the issue?

May 01 '23 00:05 kazi-m22

I think it was a typo in the docs and we were supposed to run the similarly named .py file, but that has some other errors. Still debugging.

May 01 '23 01:05 A-Telfer

I've had a few issues getting the dataset built as well. This is what I have had to do so far:

1. I modified the source paths on these lines of noisyspeech_synthesizer.py, since the default download script extracts the raw files to /microsoft_dns/datasets_fullband/datasets_fullband/.
2. I have multiple versions of Python in my environment, so for me the command to run is python noisyspeech_synthesizer.py -root ./
3. However, when I first ran that command, I got many versioning errors. After a little digging, it seems that the latest version of librosa is not compatible with the latest version of numpy. I had to downgrade numpy from 1.24.3 to 1.23.5 and librosa from 0.10.0 to 0.8.1.
4. After that, running the command in step 2 generates the training and validation files (I think). This is still in progress for me; I'll follow up if it completes without errors.

May 03 '23 22:05 BujSet

Exactly the same here, except that rather than renaming the path, I just moved the raw files up a level.
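Moving the raw files up a level can be scripted instead of done by hand. This is a minimal sketch, assuming the doubly nested datasets_fullband/datasets_fullband/ layout described in this thread; the helper name flatten_nested_dir is hypothetical, and it deliberately stops rather than overwrite anything already at the destination.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def flatten_nested_dir(outer: Path) -> list[str]:
    """Move everything from outer/<outer.name>/ up into outer/.

    Hypothetical helper: assumes the download produced a doubly nested
    directory such as datasets_fullband/datasets_fullband/.
    Returns the names of the entries that were moved.
    """
    inner = outer / outer.name
    if not inner.is_dir():
        return []  # no nested level, so nothing to do
    moved = []
    for entry in sorted(inner.iterdir()):
        target = outer / entry.name
        if target.exists():
            # Refuse to clobber existing data; resolve the conflict by hand.
            raise FileExistsError(f"{target} already exists")
        shutil.move(str(entry), str(target))
        moved.append(entry.name)
    inner.rmdir()  # the nested level is now empty, so remove it
    return moved
```

Usage would be flatten_nested_dir(Path("microsoft_dns/datasets_fullband")) from the repository root (the exact location is an assumption; check where your download landed). Editing the paths in noisyspeech_synthesizer.py, as in step 1 above, avoids moving any data at all.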

I started running step 4 on my laptop and gave up waiting after 60,000+ or so, since there was no progress bar (the code uses while loops, so it isn't immediately clear how long it will take), and I started to suspect it might not work on a partially downloaded dataset.

May 04 '23 20:05 A-Telfer

As an update, the synthesizer takes quite a while depending on your machine. I'm using an AMD Ryzen Threadripper processor with the raw files on a native SSD, and it took about 115 hours just to generate the training_set. The validation set looks like it will take roughly the same.

@A-Telfer I'm not sure what you mean by "no progress bar", as I periodically saw output from the synthesizer script indicating when it had to retry synthesizing some of the audio files:

Number of files to be synthesized: 60000
Start idx: 0
Stop idx: 59999
Generating synthesized data in ./
Warning: File #5 has unexpected clipping, returning without writing audio to disk
Warning: File #29 has unexpected clipping, returning without writing audio to disk
...
Warning: File #1114 has unexpected clipping, returning without writing audio to disk
Warning: File #1130 has unexpected clipping, returning without writing audio to disk
Found exception
Input signal length=0 is too small to resample from 48000->16000
Trying again
Found exception
Input signal length=0 is too small to resample from 48000->16000
Trying again
Found exception
Input signal length=0 is too small to resample from 48000->16000
Trying again
Found exception
Input signal length=0 is too small to resample from 48000->16000
Trying again
Warning: File #1151 has unexpected clipping, returning without writing audio to disk
Warning: File #1164 has unexpected clipping, returning without writing audio to disk
...
Warning: File #29071 has unexpected clipping, returning without writing audio to disk
Warning: File #29090 has unexpected clipping, returning without writing audio to disk
Found exception
Input signal length=0 is too small to resample from 48000->16000
Trying again
Found exception
Input signal length=0 is too small to resample from 48000->16000
Trying again
Found exception
Input signal length=0 is too small to resample from 48000->16000
Trying again
Warning: File #29107 has unexpected clipping, returning without writing audio to disk
Warning: File #29118 has unexpected clipping, returning without writing audio to disk
...
Warning: File #34891 has unexpected clipping, returning without writing audio to disk
Warning: File #34891 has unexpected clipping, returning without writing audio to disk
Found exception
Input signal length=0 is too small to resample from 48000->16000
Trying again
Found exception
Input signal length=0 is too small to resample from 48000->16000
Trying again
Found exception
Input signal length=0 is too small to resample from 48000->16000
Trying again
Found exception
Input signal length=0 is too small to resample from 48000->16000
Trying again
Found exception
Input signal length=0 is too small to resample from 48000->16000
Trying again
Found exception
Input signal length=0 is too small to resample from 48000->16000
Trying again
Found exception
Input signal length=0 is too small to resample from 48000->16000
Trying again
Warning: File #34919 has unexpected clipping, returning without writing audio to disk
Warning: File #34925 has unexpected clipping, returning without writing audio to disk
...
Warning: File #44648 has unexpected clipping, returning without writing audio to disk
Warning: File #44661 has unexpected clipping, returning without writing audio to disk
Found exception
Input signal length=0 is too small to resample from 48000->16000
Trying again
Found exception
Input signal length=0 is too small to resample from 48000->16000
Trying again
Found exception
Input signal length=0 is too small to resample from 48000->16000
Trying again
Found exception
Input signal length=0 is too small to resample from 48000->16000
Trying again
Warning: File #44699 has unexpected clipping, returning without writing audio to disk
Warning: File #44711 has unexpected clipping, returning without writing audio to disk
...
Warning: File #59935 has unexpected clipping, returning without writing audio to disk
Warning: File #59962 has unexpected clipping, returning without writing audio to disk
Warning: File #59989 has unexpected clipping, returning without writing audio to disk
Warning: File #59991 has unexpected clipping, returning without writing audio to disk
Warning: File #59997 has unexpected clipping, returning without writing audio to disk
Of the 466391 clean speech files analyzed, 2.6% had clipping, and 46.8% had low activity (below 60.0% active percentage)
Of the 221062 noise files analyzed, 18.4% had clipping, and 0.0% had low activity (below 0.0% active percentage)

I believe it should work on a partial dataset, based on the information given during the orientation. Is this not the output you saw, @A-Telfer?

For me, the training_set completed with 180,003 items totalling 172.8 GB. I'm not entirely sure whether this is the expected output of the synthesis script.

May 09 '23 19:05 BujSet

Hi all, I have fixed the typo in the README. As you already noted, it should have been noisyspeech_synthesizer.py, not .cfg.

Assuming you have downloaded your dataset to ./data/datasets_fullband/, the commands to run are:

python noisyspeech_synthesizer.py -root ./data/datasets_fullband/
python noisyspeech_synthesizer.py -root ./data/datasets_fullband/ -is_validation_set true
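Since several people in this thread have more than one Python installed, it can help to launch both steps with the same interpreter and stop at the first failure. A minimal sketch, assuming it is run from the directory containing noisyspeech_synthesizer.py; the function names are hypothetical, while the flags are exactly the two commands above.

```python
import subprocess
import sys


def synthesis_commands(root: str) -> list:
    """Build the training and validation synthesis commands from this thread."""
    # sys.executable pins both runs to the interpreter running this script,
    # which sidesteps "python" resolving to a different installation.
    base = [sys.executable, "noisyspeech_synthesizer.py", "-root", root]
    return [base, base + ["-is_validation_set", "true"]]


def run_all(root: str) -> None:
    """Run the training step, then the validation step."""
    for cmd in synthesis_commands(root):
        print("Running:", " ".join(cmd), flush=True)
        # check=True raises CalledProcessError if a step exits non-zero,
        # so the validation step never starts after a failed training step.
        subprocess.run(cmd, check=True)
```

Calling run_all("./data/datasets_fullband/") would then run both steps in sequence, which for the timings reported here means several days of wall-clock time.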

The synthesis does take a long time, and the script has no progress bar. One way to monitor progress is:

ls -l data/datasets_fullband/training_set/clean/*.wav | wc -l
ls -l data/datasets_fullband/validation_set/clean/*.wav | wc -l

These print the number of clean samples generated so far in the training and validation sets, respectively. @BujSet, 180,003 items looks correct: it's 60k audio samples each for clean, noise, and noisy.
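With tens of thousands of files, a shell glob like clean/*.wav can exceed the system's argument-length limit ("Argument list too long"), so counting from Python may be more robust. This is a minimal sketch, not part of the repository: it assumes the subfolder names clean, noise, and noisy (inferred from the comment above), a target of 60,000 files per subfolder (from the "Number of files to be synthesized: 60000" log line), and a simple linear extrapolation for the remaining time. All function names are hypothetical.

```python
import os

EXPECTED_PER_FOLDER = 60_000  # from "Number of files to be synthesized: 60000"


def count_wavs(folder: str) -> int:
    """Count .wav files in a folder without building a shell argument list."""
    if not os.path.isdir(folder):
        return 0  # folder not created yet
    with os.scandir(folder) as it:
        return sum(1 for e in it if e.is_file() and e.name.endswith(".wav"))


def progress(split_dir: str, subfolders=("clean", "noise", "noisy")) -> dict:
    """Return the .wav count per subfolder of a split such as training_set."""
    return {name: count_wavs(os.path.join(split_dir, name)) for name in subfolders}


def eta_hours(done: int, elapsed_s: float, total: int = EXPECTED_PER_FOLDER) -> float:
    """Linearly extrapolate the remaining hours from files done after elapsed_s seconds."""
    if done <= 0:
        return float("inf")  # no rate to extrapolate from yet
    return (total - done) * (elapsed_s / done) / 3600
```

For example, progress("data/datasets_fullband/training_set") returns the three counts; feeding the clean count and the time since launch into eta_hours gives a rough finish estimate. The linear model is only an approximation, since files that hit clipping or resampling errors are retried.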

May 16 '23 22:05 bamsumit


I have librosa v0.10.0 and numpy v1.23.5 and it worked, but on line 90 of microsoft_dns/noisyspeech_synthesizer_singleprocess.py I had to change librosa.resample(arg1, arg2, arg3) to librosa.resample(input_audio, orig_sr=fs_input, target_sr=fs_output).
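The keyword form of the call is worth preferring because orig_sr and target_sr are accepted as keywords in librosa 0.8.x and 0.9.x as well, while 0.10 appears to require them as keywords, which matches the fix above. The sketch below also guards against the empty clips behind the "Input signal length=0 is too small to resample" messages in the log earlier in this thread. It is a minimal sketch: safe_resample is a hypothetical helper, not a function in the repository, and returning None assumes the caller treats that as "skip this file".

```python
def safe_resample(audio, fs_input: int, fs_output: int):
    """Resample a 1-D audio array, returning None for empty clips.

    Hypothetical helper: the caller must treat None as "skip this file".
    """
    if len(audio) == 0:
        # librosa would raise "Input signal length=0 is too small to resample"
        return None
    if fs_input == fs_output:
        return audio  # already at the target rate, nothing to do
    # Imported lazily so the guards above work without librosa installed.
    import librosa

    # Keyword arguments work across librosa 0.8, 0.9 and 0.10.
    return librosa.resample(audio, orig_sr=fs_input, target_sr=fs_output)
```

In the synthesizer this would stand in for the direct librosa.resample call, for example safe_resample(input_audio, fs_input, fs_output), with the existing retry logic deciding what to do when it returns None.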

Jun 06 '23 09:06 daevem

@daevem, thanks for posting this information. Perhaps the librosa interface changed at some point. A combination that works with the current version of the code is:

librosa==0.9.2
numpy==1.23.3
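Before starting a multi-day synthesis run, a quick check that the environment matches these pins may save a restart. This is a minimal sketch using the standard library's importlib.metadata; the function name check_versions is hypothetical, and an exact string match is deliberately strict. daevem's combination (librosa 0.10.0, numpy 1.23.5) also worked once the resample call was fixed, so treat a mismatch as a warning rather than an error.

```python
from importlib import metadata

# Pins reported in this thread to work with the current code.
KNOWN_GOOD = {"librosa": "0.9.2", "numpy": "1.23.3"}


def check_versions(known_good=KNOWN_GOOD, get_version=metadata.version) -> list:
    """Return human-readable mismatches between installed and pinned versions."""
    problems = []
    for package, wanted in known_good.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package}: not installed (want {wanted})")
            continue
        if installed != wanted:
            problems.append(f"{package}: {installed} installed, {wanted} pinned")
    return problems
```

Printing each entry of check_versions() lists any differences; an empty list means the environment matches the pins above, which can be installed with pip install librosa==0.9.2 numpy==1.23.3.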

Jun 06 '23 15:06 bamsumit