Speech analysis features wishlist

Open xaviliz opened this issue 4 years ago • 2 comments

Hi,

I noticed that some features and algorithms used in speech processing are not available in Essentia and might be interesting to implement. Here are my proposals:

  1. Mel Spectrogram
  2. Delta MFCC
  3. Delta-delta MFCC
  4. Vocal Tract Filtering
  5. Phase Distortion
  6. Phase Distortion Standard Deviation
  7. Harmonic Model Phase Distortion
  8. Pulse Model
  9. Pre-Emphasis Filter
  10. Direction Of Arrival: MUSIC, CSSM, TOPS, SRP-PHAT

Thanks in advance

xaviliz avatar Feb 05 '21 09:02 xaviliz

It would also be nice to include:

  1. PLP
  2. I-Vectors
  3. X-Vectors

xaviliz avatar Mar 26 '21 09:03 xaviliz

I would also suggest adding an option/parameter to the StartStopSilence algorithm so that it returns a list of start and stop frames. Right now the algorithm returns a single start frame and a single stop frame, which is fine for a song. In speech, however, there are several distinct events with silent frames in between. It would be nice if it could return lists for startFrame and stopFrame, so that events can be segmented easily.
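
To make the request concrete, here is a minimal sketch of the behavior I have in mind. It is plain Python and not Essentia code: the function name `event_boundaries`, the linear energy threshold, and the inclusive stop-frame convention are all assumptions for illustration, and it takes per-frame energies that have already been computed.

```python
from typing import List, Sequence, Tuple


def event_boundaries(
    frame_energies: Sequence[float],
    threshold: float,
) -> Tuple[List[int], List[int]]:
    """Return (start_frames, stop_frames), one pair per non-silent run.

    Hypothetical helper, not part of Essentia. A frame is non-silent when
    its energy is strictly above `threshold`. Each stop frame is the last
    non-silent frame of its run (inclusive).
    """
    start_frames: List[int] = []
    stop_frames: List[int] = []
    in_event = False

    for i, energy in enumerate(frame_energies):
        active = energy > threshold
        if active and not in_event:
            # Silence -> sound: a new event begins at this frame.
            start_frames.append(i)
            in_event = True
        elif not active and in_event:
            # Sound -> silence: the event ended on the previous frame.
            stop_frames.append(i - 1)
            in_event = False

    if in_event:
        # The signal ended while an event was still active.
        stop_frames.append(len(frame_energies) - 1)

    return start_frames, stop_frames


# Toy input: three events separated by silent frames.
energies = [0.0, 0.5, 0.6, 0.0, 0.0, 0.7, 0.0, 0.9]
starts, stops = event_boundaries(energies, threshold=0.1)
print(starts, stops)
```

With output like this, each `(starts[k], stops[k])` pair delimits one speech event, instead of a single pair covering the whole signal.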

xaviliz avatar Jul 20 '21 08:07 xaviliz