Scale of the loudness feature in the eGeMAPS set

Open YangLiyli131 opened this issue 1 year ago • 1 comments

Hello, I'm using this package to extract the loudness of audio files with the eGeMAPSv02 feature set ('loudness_sma3'). The values it returns are very small, close to one. What is the scale or unit of this feature, and how can I convert it to dB? Thank you.

YangLiyli131, Mar 07 '23 13:03

Well, from my (admittedly limited) understanding, the answer is in the GeMAPS paper: https://ieeexplore.ieee.org/document/7160715

"Loudness is used here as a more perceptually relevant [62] alternative to the signal energy. In order to approximate humans’ non-linear perception of sound, an auditory spectrum as is applied in the Perceptual Linear Prediction (PLP) technique [63] is adopted. A non-linear Mel-band spectrum is constructed by applying 26 triangular filters distributed equidistant on the Mel-frequency scale from 20–8000 Hz to a power spectrum computed from a 25 ms frame. An auditory weighting with an equal loudness curve as used by [63] and originally adopted from [64] is performed. Next, a cubic root amplitude compression is performed for each band b of the equal loudness weighted Mel-band power spectrum [63], resulting in a spectrum which is referred to as auditory spectrum. Loudness is then computed as the sum over all bands of the auditory spectrum."

The PLP technique, I believe, refers to https://pubs.aip.org/asa/jasa/article/87/4/1738/930759/Perceptual-linear-predictive-PLP-analysis-of
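
To make the quoted description concrete, here is a rough pure-Python sketch of the steps it lists: power spectrum of one 25 ms frame, 26 triangular Mel filters from 20 to 8000 Hz, equal-loudness weighting, cube-root compression, then a sum over bands. This is not openSMILE's code. The Hamming window, the spectrum normalization, the Mel formula, and the equal-loudness approximation (the formula given in the PLP paper, applied here at each band's center frequency) are my assumptions, and all function names are made up for illustration, so the absolute values will not match what the package returns.

```python
import cmath
import math


def hz_to_mel(f):
    # Common Mel formula (assumption; openSMILE may use another variant).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive DFT power spectrum (bins 0..N/2) of a Hamming-windowed frame."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        spec.append(abs(s) ** 2 / n)
    return spec


def equal_loudness_weight(f):
    """Equal-loudness approximation from the PLP paper (assumed here)."""
    w2 = (2 * math.pi * f) ** 2
    return ((w2 + 56.8e6) * w2 ** 2) / ((w2 + 6.3e6) ** 2 * (w2 + 0.38e9))


def loudness_sketch(frame, sample_rate, n_bands=26, f_lo=20.0, f_hi=8000.0):
    """Sum over bands of the cube-root-compressed, weighted Mel-band power."""
    n = len(frame)
    spec = power_spectrum(frame)
    mel_lo, mel_hi = hz_to_mel(f_lo), hz_to_mel(f_hi)
    step = (mel_hi - mel_lo) / (n_bands + 1)
    edges = [mel_lo + i * step for i in range(n_bands + 2)]
    bin_mels = [hz_to_mel(k * sample_rate / n) for k in range(len(spec))]

    total = 0.0
    for b in range(n_bands):
        lo, mid, hi = edges[b], edges[b + 1], edges[b + 2]
        band_power = 0.0
        for m, p in zip(bin_mels, spec):
            if lo < m < hi:
                # Triangular weight, evaluated on the Mel axis.
                w = (m - lo) / (mid - lo) if m <= mid else (hi - m) / (hi - mid)
                band_power += w * p
        band_power *= equal_loudness_weight(mel_to_hz(mid))
        total += band_power ** (1.0 / 3.0)  # cube-root compression
    return total


# One 25 ms frame (400 samples at 16 kHz) of a 1 kHz tone, at two levels.
sr = 16000
quiet = [0.01 * math.sin(2 * math.pi * 1000 * i / sr) for i in range(400)]
loud = [10.0 * x for x in quiet]  # +20 dB in signal level
ratio = loudness_sketch(loud, sr) / loudness_sketch(quiet, sr)
```

In this sketch the result is a unitless sum of compressed band powers, not a level in dB. Raising the signal by 20 dB multiplies every band power by 100, so the loudness is multiplied by 100^(1/3), roughly 4.64, instead of increasing by a fixed offset. For signals whose spectral shape changes, the relation to dB also depends on how the energy is spread across bands, so I would not expect a single conversion formula.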

dattilson, May 17 '23 14:05