
Issues about feature extraction and lilcom: last stride should be 1

Open JerryPeng21cuhk opened this issue 3 years ago • 2 comments

I tried to write a new feature extractor (features/melspectrogram.py) and ran into some issues, which I have since solved. I'm posting them here in case they are helpful to other beginners.

Similar to features/fbank.py, I created a function, extract_melspectrogram, that takes a waveform (Tensor) and its configuration kwargs. It returns an array (numpy.array or torch.tensor) with the shape (num_frames, feat_dim).

def extract_melspectrogram(waveform: Tensor, **kwargs) -> Tensor:
    feats = wav2feats(waveform, **kwargs)  # extraction process (details omitted)
    # feats has the shape (feat_dim, num_frames)
    feats = feats.T  # triggers an assertion error when feats is stored with lilcom
    return feats
  1. The transpose causes an assertion error in lilcom/compression.cc, because it swaps the strides of feats, so the last stride is no longer 1.

  2. If the feats have a relatively small dynamic range, such as normalized feats in the range [0, 1], the LilcomFilesWriter should be given a smaller tick_power, because its compression error is absolute rather than relative. I found that this error caused my feats reconstruction loss to fluctuate significantly.
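The effect of an absolute error on features in [0, 1] can be illustrated with a small sketch. It does not call lilcom and is not lilcom's actual algorithm; it only assumes that values end up rounded to the nearest multiple of 2**tick_power, and the helper names and the tick_power values -3 and -8 are chosen purely for illustration.

```python
def quantize(x: float, tick_power: int) -> float:
    """Round x to the nearest multiple of 2**tick_power (illustrative only)."""
    tick = 2.0 ** tick_power
    return round(x / tick) * tick


def max_abs_error(values, tick_power: int) -> float:
    """Largest absolute rounding error over the given values."""
    return max(abs(v - quantize(v, tick_power)) for v in values)


# Hypothetical normalized features in [0, 1].
values = [0.013, 0.25, 0.5, 0.987]

for tick_power in (-3, -8):
    err = max_abs_error(values, tick_power)
    bound = 2.0 ** (tick_power - 1)  # half a tick
    print(f"tick_power={tick_power}: max error {err:.5f} (bound {bound:.5f})")
```

With a coarse tick, the error is a noticeable fraction of the whole [0, 1] range, which is consistent with the fluctuating reconstruction loss described above.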

JerryPeng21cuhk, Mar 04 '21 03:03

You can do feats = feats.transpose().copy() to avoid the 1st problem. (Prefer transpose() as it's more explicit).
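A minimal NumPy sketch of why the copy helps, assuming the features are a float32 array of shape (feat_dim, num_frames) with made-up sizes 80 and 100. Transposing only returns a view with swapped strides; copying the view lays the data out contiguously again, so the last stride (in elements) becomes 1.

```python
import numpy as np

# Hypothetical features with shape (feat_dim, num_frames).
feats = np.zeros((80, 100), dtype=np.float32)

# transpose() returns a view: same memory, strides swapped.
view = feats.transpose()
view_last_stride = view.strides[-1] // view.itemsize  # 100 elements

# copy() materializes the view in C order, so rows are contiguous again.
fixed = feats.transpose().copy()
fixed_last_stride = fixed.strides[-1] // fixed.itemsize  # 1 element

print(view.shape, view_last_stride, view.flags["C_CONTIGUOUS"])
print(fixed.shape, fixed_last_stride, fixed.flags["C_CONTIGUOUS"])
```

np.ascontiguousarray(feats.transpose()) should give the same layout, and it skips the copy when the input is already contiguous.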

If you are not storing the mel energies in log space, they likely won't interact that well with the compression method. I'd advise storing them in log space.
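As a rough sketch of the log-space suggestion, assuming the extractor produces non-negative linear mel energies: the function name to_log_mel and the floor value 1e-10 are illustrative assumptions, not part of lhotse or lilcom. An absolute error e in the log domain corresponds to a multiplicative factor exp(e) in the linear domain, so small and large energies are affected proportionally.

```python
import numpy as np


def to_log_mel(mel_energies: np.ndarray, floor: float = 1e-10) -> np.ndarray:
    """Convert linear mel energies to log space, flooring to avoid log(0)."""
    return np.log(np.maximum(mel_energies, floor))


# Hypothetical linear mel energies spanning several orders of magnitude.
mel = np.array([[1e-6, 1e-3], [0.5, 1.0]], dtype=np.float32)

log_mel = to_log_mel(mel)   # store this instead of the linear energies
restored = np.exp(log_mel)  # back to the linear domain after loading

print(log_mel)
print(restored)
```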

danpovey, Mar 04 '21 05:03

Thanks for letting us know! Actually I think I can add a check before lilcom compression so that when it detects only (or mostly) non-negative values/narrow value range, it will warn you that you might not be using it as intended.
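One way such a check could look is sketched below. This is not lhotse's actual implementation; the function name, the 99% non-negative fraction, and the minimum range of 4.0 are all hypothetical choices made for illustration.

```python
import warnings

import numpy as np


def check_lilcom_input(
    feats, nonneg_fraction: float = 0.99, min_range: float = 4.0
) -> bool:
    """Warn and return True if feats look unsuited to lossy compression.

    Thresholds are illustrative assumptions, not values taken from lhotse.
    """
    feats = np.asarray(feats)
    mostly_nonneg = float(np.mean(feats >= 0)) >= nonneg_fraction
    narrow = float(feats.max() - feats.min()) < min_range
    suspicious = mostly_nonneg or narrow
    if suspicious:
        warnings.warn(
            "Features are mostly non-negative or have a narrow value range; "
            "consider storing them in log space or using a smaller tick_power."
        )
    return suspicious


# Normalized features in [0, 1] are flagged; log-domain features are not.
normalized = np.linspace(0.0, 1.0, 100)
log_domain = np.linspace(-20.0, 5.0, 100)
```

Calling check_lilcom_input(normalized) emits the warning, while check_lilcom_input(log_domain) stays silent under these thresholds.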

pzelasko, Mar 04 '21 14:03