Voice-based-gender-recognition
Issue With Score Computation
Hello,
In testing this implementation on some real-life recordings I took, I happened to get very negative scores for the file named "female2.wav". I'm wondering how this happened, and more specifically how the scoring algorithm works (the documentation for the score function appears to say that it is a log probability, but somehow we have positive values?). Any indication as to how this one .wav file could have generated negative scores while other similar ones generated positive ones would be greatly appreciated.

Hello,
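Unlike probabilities, log-likelihood / log-probability scores can be negative. To read more on this, you can refer to this link on log-probability, but I can already see that some scores do not agree with the theory in the link. The likely reason is that the score is the log of a continuous probability density rather than of a probability, and a density can exceed 1, which gives a positive log.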
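To make the sign question concrete, here is a minimal sketch using only the Python standard library. It is not the project's code: `gaussian_log_pdf` is a hypothetical helper, and a single 1-D Gaussian stands in for the real model. It only illustrates that the log of a probability is never positive, while the log of a density can be either sign.

```python
import math

def gaussian_log_pdf(x, mean, std):
    """Log of the 1-D Gaussian density at x (hypothetical helper)."""
    var = std * std
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)

# A discrete probability is at most 1, so its log is never positive.
log_prob = math.log(0.25)

# A narrow density peaks above 1, so its log is positive near the mean.
narrow = gaussian_log_pdf(0.0, mean=0.0, std=0.1)

# A wide density stays below 1 everywhere, so its log is always negative.
wide = gaussian_log_pdf(0.0, mean=0.0, std=10.0)

print(log_prob, narrow, wide)
```

So the sign of a score mostly reflects the scale of the features and the spread of the model, not whether the recording is "good" or "bad".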
The scoring logic is based on the Reynolds-paper, but you can also refer to this shorter reproduction/summary-paper. The papers are about speaker verification/recognition, but you can drop the UBM and apply the same logic to gender recognition. I have also written a small blog post on this that you can find here.
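The decision rule without a UBM can be sketched roughly as follows. This is a simplified illustration and not the repository's implementation: each gender is modeled by a single diagonal Gaussian instead of a full GMM, the score is assumed to be the average per-frame log-likelihood, and all names and numbers (`score`, `identify_gender`, the toy models and frames) are made up for the example.

```python
import math

def frame_log_likelihood(frame, means, stds):
    """Log density of one feature frame under a diagonal Gaussian."""
    total = 0.0
    for x, m, s in zip(frame, means, stds):
        total += -0.5 * math.log(2.0 * math.pi * s * s) - (x - m) ** 2 / (2.0 * s * s)
    return total

def score(frames, model):
    """Average per-frame log-likelihood of an utterance under a model."""
    means, stds = model
    return sum(frame_log_likelihood(f, means, stds) for f in frames) / len(frames)

def identify_gender(frames, female_model, male_model):
    """Pick the model with the higher score; only the comparison matters."""
    female_score = score(frames, female_model)
    male_score = score(frames, male_model)
    winner = "female" if female_score > male_score else "male"
    return winner, female_score, male_score

# Hypothetical toy models (means, stds) and frames with two features each.
female_model = ([2.0, -1.0], [1.0, 1.0])
male_model = ([-2.0, 1.0], [1.0, 1.0])
frames = [[1.8, -0.9], [2.3, -1.2], [1.5, -0.4]]

winner, female_score, male_score = identify_gender(frames, female_model, male_model)
print(winner, female_score, male_score)
```

In this toy case both scores are negative and the female model still wins, because the decision depends only on which score is larger.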
Concerning your recordings, they should all have the same characteristics (sample rate, channel count such as mono or stereo, etc.), which is why recordings made with different microphones can be challenging in this kind of recognition problem. The database used in the project is normalized: all files have the same sample rate and are mono. You can verify this using ffmpeg -i filename.wav, which should print something like "..., 16000 Hz, mono, s16, 256 kb/s". If your recordings do not have the same characteristics as the ones in SLR45, use ffmpeg to convert them.
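If you prefer to check many files at once, something along these lines might help. It is only a sketch: it uses the standard-library `wave` module (which reads PCM WAV files only), assumes the target format is the one quoted above (16000 Hz, mono, 16-bit), and the helper names `describe_wav` and `conversion_command` are my own. The command it builds uses the standard ffmpeg options `-ar` (sample rate), `-ac` (channel count) and `-c:a pcm_s16le` (16-bit PCM).

```python
import wave

# Target format quoted above: 16000 Hz, mono, 16-bit samples.
TARGET_RATE = 16000
TARGET_CHANNELS = 1
TARGET_SAMPLE_WIDTH = 2  # bytes per sample, i.e. s16

def describe_wav(path):
    """Return (sample_rate, channels, sample_width_bytes) of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnchannels(), wav.getsampwidth()

def conversion_command(src, dst):
    """Return an ffmpeg argument list if src needs converting, else None."""
    if describe_wav(src) == (TARGET_RATE, TARGET_CHANNELS, TARGET_SAMPLE_WIDTH):
        return None
    return ["ffmpeg", "-i", src, "-ar", str(TARGET_RATE),
            "-ac", str(TARGET_CHANNELS), "-c:a", "pcm_s16le", dst]
```

For example, `conversion_command("female2.wav", "female2_16k.wav")` returns `None` when the file already matches, and otherwise a list you can pass to `subprocess.run`.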
Please let me know how this turns out ;)