Noisemes_sad crashes due to MemoryError
Traceback (most recent call last):
File "SSSF/code/predict/1-confidence-vm5.py", line 60, in
I tested Noisemes_sad at 3 different VM memory settings. For each setting, these are the longest input lengths that succeeded and the shortest that failed:
| VM Memory (GB) | Success (min) | Fail (min) |
|---|---|---|
| 2 | 5 | 6 |
| 3 | 9 | 10 |
| 4 | 12 | 13 |
This issue was created at the request of Prof. Metze to document the memory issue. Looking into the details of the line that fails: with a 10-minute input, the feature array is 320 MB, and the VM's memory usage during that line spikes to around 3 GB. The failure occurred for me at the "- mu" (mean subtraction) step.
edit: The data used came from the vandam3 folder here:
https://drive.google.com/drive/u/4/folders/1Meont2RU8DbZfwmlQExEN_SiQGWft_2i
The file is e20110915_110122_007574.mp3. The data is the first n minutes trimmed from this daylong recording.
As we discussed, we need to investigate how this can be avoided. It seems there must be a better way of performing the mean subtraction.
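For illustration, here is a minimal sketch of one way to avoid the large temporary array that `feats - mu` creates; the array names and shapes are made up and are not taken from 1-confidence-vm5.py:

```python
import numpy as np

# Illustrative stand-ins: 'feats' plays the role of the feature array loaded
# from the HTK file, 'mu' the per-dimension mean. The shape is invented.
feats = np.random.rand(150000, 100).astype(np.float32)
mu = feats.mean(axis=0)

# 'feats = feats - mu' builds a second array the same size as 'feats' before
# the old one can be freed, roughly doubling peak memory.
# Subtracting in place reuses the existing buffer instead:
feats -= mu

# If the original features must be kept, the subtraction can also be done
# over slices so that only one small temporary exists at a time:
# for start in range(0, feats.shape[0], 10000):
#     chunk = feats[start:start + 10000] - mu
#     ...  # run the next processing step on 'chunk'
```

The in-place version keeps peak memory close to the size of the feature array itself rather than roughly double it.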
I made Python 3 versions of all the code that noisemes_sad uses and upgraded all of the outdated packages in the VM, and the problem seems to be significantly reduced. On my laptop, the existing Python 2 noisemes can handle files up to 9 minutes on the 3 GB VM before crashing, whereas the Python 3 version can handle up to around 25 minutes on the same VM. NumPy had memory issues in the past, including not playing well with the garbage collector, but those issues were fixed in newer versions for Python 3.
However, the way noisemes currently predicts the classes involves loading the entire set of features for the audio file into memory. Since the features for a daylong recording are around 18 GB, running it on an average computer will still require either chunking the original audio file or, if the model allows it, splitting up the features during the prediction stage (see the sketch below).
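As a very rough sketch of the second option, here is one way to read an HTK feature file in fixed-size blocks rather than all at once. The header layout follows the standard HTK format, but the chunk size, file names, and the `predict_chunk` function are placeholders, not anything from the actual noisemes scripts:

```python
import struct
import numpy as np

def iter_htk_chunks(htk_path, frames_per_chunk=10000):
    """Yield (start_frame, features) blocks from an HTK feature file without
    loading the whole file. Assumes the standard big-endian HTK header:
    nSamples, sampPeriod (int32 each), sampSize, parmKind (int16 each),
    followed by 4-byte float coefficients."""
    with open(htk_path, "rb") as f:
        n_samples, samp_period, samp_size, parm_kind = struct.unpack(">iihh", f.read(12))
        dim = samp_size // 4  # number of 4-byte floats per frame
        for start in range(0, n_samples, frames_per_chunk):
            count = min(frames_per_chunk, n_samples - start)
            block = np.fromfile(f, dtype=">f4", count=count * dim)
            yield start, block.reshape(count, dim)

# Hypothetical usage: classify chunk by chunk and append to the RTTM as we go.
# 'predict_chunk' stands in for whatever the prediction code does per block.
# with open("out.rttm", "a") as rttm:
#     for start, feats in iter_htk_chunks("features.htk"):
#         for line in predict_chunk(feats, frame_offset=start):
#             rttm.write(line + "\n")
```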
I modified the prediction stage of noisemes_sad to process the HTK file in chunks and write out the RTTM file iteratively. The script was then able to perform speech detection on audio files between 1 and 2 hours on my system, taking about six minutes. Unfortunately, I ran into another memory issue, this time in SMILExtract:
~/openSMILE-2.1.0/bin/linux_x64_standalone_static/SMILExtract
I wasn't able to find where the source code for this lives. Currently, in extract-htk.sh, SMILExtract creates a new .htk file for every .wav input. Unless it already has the ability to append to an existing file instead of rewriting it, the best it can currently do to get noisemes running on daylong recordings is to generate multiple feature files. In that case, merging the pieces back together would either happen at prediction time, which means changing how the scripts handle multiple input data files, or after the predictions are finished, probably through some rttm_merge script (sketched below).
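For the second route, here is a minimal sketch of what such an rttm_merge step could look like, assuming each piece of the recording has a known start offset in seconds; the file names, the `file_id` handling, and the offset bookkeeping are assumptions:

```python
def merge_rttms(pieces, out_path, file_id="daylong"):
    """Concatenate per-piece RTTM files into one, shifting each segment's
    onset by the piece's start time within the original recording.
    'pieces' is a list of (rttm_path, offset_seconds) pairs in order."""
    with open(out_path, "w") as out:
        for rttm_path, offset in pieces:
            with open(rttm_path) as f:
                for line in f:
                    fields = line.split()
                    if not fields or fields[0] != "SPEAKER":
                        continue
                    fields[1] = file_id                               # unify the file-id column
                    fields[3] = "%.3f" % (float(fields[3]) + offset)  # shift the onset time
                    out.write(" ".join(fields) + "\n")

# Hypothetical usage: a daylong recording split into 1-hour pieces.
# merge_rttms([("piece0.rttm", 0.0), ("piece1.rttm", 3600.0), ("piece2.rttm", 7200.0)],
#             "daylong.rttm")
```

Merging after prediction would avoid having to change how the prediction scripts handle multiple input feature files.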
That's great, sounds good. openSMILE (https://www.audeering.com/technology/opensmile/) can do incremental processing, although I am not sure whether it can do it from files, but I assume it should be able to do so somehow. There should not be a fundamental issue with long files.
So who is going to integrate this into the merged version of the recognition script? Marvin?