Fixed some bugs in mel filterbanks.

Open yorange1 opened this issue 4 years ago • 4 comments

I wrote some code to compare the mel filterbanks in librosa, python_speech_features, and speechpy, and found two problems in speechpy.

    1. The lower band edge of the mel filterbank may be initialized incorrectly.
    2. The calculation that converts a frequency to an FFT bin number is wrong.

import matplotlib.pyplot as plt
import numpy as np
import librosa
import python_speech_features as psf
import speechpy

n_fft = 256        # The number of FFT components
n_filter = 20      # The number of filters in the filterbank
samplerate = 16000 # The samplerate of the signal
low_freq = 0       # The lowest band edge of the filters
high_freq = 8000   # The highest band edge of the filters

librosa_fbanks = librosa.filters.mel(
    sr=samplerate, n_fft=n_fft, n_mels=n_filter, fmin=low_freq, fmax=high_freq, norm=None)
print("Librosa mel fbanks shape:{}".format(librosa_fbanks.shape))

psf_fbanks = psf.base.get_filterbanks(
    nfilt=n_filter, nfft=n_fft, samplerate=samplerate, lowfreq=low_freq, highfreq=high_freq)
print("PSF mel fbanks shape:{}".format(psf_fbanks.shape))

coefficients = int(n_fft/2 + 1)
speechpy_fbanks = speechpy.feature.filterbanks(
    n_filter, coefficients, sampling_freq=samplerate, low_freq=low_freq, high_freq=high_freq)
print("Speechpy mel fbanks shape:{}".format(speechpy_fbanks.shape))

fig, axes = plt.subplots(nrows=3, ncols=1, figsize=(10, 10))

# Frequency axis in Hz: FFT bin k sits at k * samplerate / n_fft
x = np.arange(speechpy_fbanks.shape[1])
x = x * (samplerate / n_fft)

for i in range(librosa_fbanks.shape[0]):
    axes[0].plot(x, librosa_fbanks[i])
axes[0].set_title("librosa mel fbanks")

for i in range(psf_fbanks.shape[0]):
    axes[1].plot(x, psf_fbanks[i])
axes[1].set_title("psf mel fbanks")

for i in range(speechpy_fbanks.shape[0]):
    axes[2].plot(x, speechpy_fbanks[i])
axes[2].set_title("speechpy mel fbanks")

plt.show()

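[Figure: mel filterbanks from librosa, python_speech_features, and speechpy, plotted in three stacked panels]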

As shown in the figure, setting low_freq to 0 has no effect on the speechpy filterbank, and the filterbank covers only half of the frequency band.

The first problem is caused by

low_freq = low_freq or 300.

When low_freq is 0, the expression low_freq or 300 evaluates to 300 instead of 0, because 0 is falsy in Python.
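
A minimal sketch of this pitfall and one possible fix, assuming the default of 300 should apply only when the argument is omitted (`None`). The function names `resolve_low_freq_buggy` and `resolve_low_freq_fixed` are hypothetical and are not part of speechpy:

```python
def resolve_low_freq_buggy(low_freq=None):
    # `or` treats every falsy value (None, 0, 0.0) as "not provided",
    # so an explicit 0 is silently replaced by the default.
    return low_freq or 300


def resolve_low_freq_fixed(low_freq=None):
    # Fall back to the default only when the argument is really missing.
    return 300 if low_freq is None else low_freq


print(resolve_low_freq_buggy(0))     # 300
print(resolve_low_freq_fixed(0))     # 0
print(resolve_low_freq_fixed(None))  # 300
```

An explicit `is None` check keeps 0 Hz available as a valid lower band edge.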

The second problem is a calculation error.

freq_index = (
    np.floor(
        (coefficients +
         1) *
        hertz /
        sampling_freq)).astype(int)

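coefficients is equal to fftpoints/2 + 1, so using it in this conversion maps the frequencies onto only about half of the available bins and the filters cannot cover the complete frequency band. The calculation should use fftpoints instead of coefficients.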
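
To make the effect concrete, here is a small sketch that evaluates the same formula with both quantities, using the parameters from the comparison script (n_fft = 256, 16 kHz sample rate). The helper `freq_to_bin` and its `size` parameter are hypothetical names for illustration, and the sketch assumes the fix keeps the formula and only swaps in the FFT size:

```python
import math


def freq_to_bin(hertz, size, sampling_freq):
    # Same formula as the snippet above; `size` stands in for whichever
    # quantity is plugged in (coefficients or the number of FFT points).
    return math.floor((size + 1) * hertz / sampling_freq)


n_fft = 256
coefficients = n_fft // 2 + 1  # 129 one-sided spectrum bins
sampling_freq = 16000
high_freq = 8000

bin_with_coefficients = freq_to_bin(high_freq, coefficients, sampling_freq)
bin_with_n_fft = freq_to_bin(high_freq, n_fft, sampling_freq)

print(bin_with_coefficients)  # 65
print(bin_with_n_fft)         # 128
```

With 129 coefficients the highest valid index is 128, so the first variant places 8000 Hz near the middle of the spectrum, which is consistent with the half-band coverage in the plot.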

As shown in my code, I have fixed the two bugs above and hope you can review and merge the changes. Thank you!

yorange1 • Nov 05 '21 05:11

@arfon

???

arfon • Nov 05 '21 10:11

@arfon

???

sorry, I made a mistake.

yorange1 • Nov 05 '21 10:11

@astorfi Hope to get your review, thank you very much!

yorange1 • Nov 12 '21 16:11

@yorange1 I came across your fix b/c I just noticed the same issue myself. I've been trying to get ahold of @astorfi via the emails of his that I can find, and on here. No response.

I'm thinking of filing a PEP 541 request to take over the speechpy package on PyPI. It lets you take over a package if the maintainer is MIA.

Would you be interested in being co-admin with me?

Alex-EEE • Jan 21 '23 00:01