Add support for bitrate in sox_io backend

Open rbracco opened this issue 4 years ago • 3 comments

soxi shows a file's bitrate, but I'm pretty sure torchaudio.info doesn't when using the sox_io backend.

🚀 Feature

Create functionality within torchaudio.info and the AudioMetaData object to determine a file's bitrate.

Motivation

I want to be able to easily determine the bitrate within torchaudio so that I can do EDA and see whether the bitrate of audio files has an impact on inference (e.g. if the training audio is all high bitrate but the inference audio is low bitrate, will accuracy drop?). If so, knowing the bitrates of the audio files is important so that you can apply proper augmentation to prevent the decreased accuracy.

Pitch

soxi shows bitrate, but as far as I can tell, torchaudio.info using the sox_io backend doesn't. Make torchaudio.info report the same data as soxi when using the sox_io backend.

Alternatives

I tried doing this outside of torchaudio with pydub, mutagen, file (command line), and soxi, but it would be better if everything could be done with torchaudio. Thank you.

Additional context

rbracco avatar Aug 23 '21 21:08 rbracco

Hi @rbracco

Thanks for the suggestion. I looked into how soxi does this and learned that it simply divides the file size (in bits) by the duration of the audio.

static char const * size_and_bitrate(sox_format_t * ft, char const * * text)
{
  off_t size = lsx_filelength(ft);
  if (ft->signal.length && ft->signal.channels && ft->signal.rate && text) {
    double secs = ft->signal.length / ft->signal.channels / ft->signal.rate;
    *text = lsx_sigfigs3(8. * size / secs);
  }
  return lsx_sigfigs3((double)size);
}
  • https://github.com/dmkrepo/libsox/blob/b9dd1a86e71bbd62221904e3e59dfaa9e5e72046/src/sox.c#L296-L304
sox_uint64_t lsx_filelength(sox_format_t * ft)
{
  struct stat st;
  int ret = ft->fp ? fstat(fileno((FILE*)ft->fp), &st) : 0;

  return (!ret && (st.st_mode & S_IFREG))? (uint64_t)st.st_size : 0;
}
  • https://github.com/dmkrepo/libsox/blob/b9dd1a86e71bbd62221904e3e59dfaa9e5e72046/src/formats_i.c#L141-L147

I guess we can port this implementation when the input is a local file, but it will not be applicable to file-like objects. Is that okay with you?
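For illustration, here is a minimal Python sketch of what such a port could look like for a local file. The function name estimate_bitrate is hypothetical and not part of torchaudio; the sketch assumes the caller already knows the number of frames per channel and the sample rate, which stands in for soxi's signal.length / channels / rate.

```python
import os
from typing import Optional


def estimate_bitrate(path: str, num_frames: int, sample_rate: float) -> Optional[float]:
    """Return the average bit rate in bits per second, or None if it cannot be determined.

    Hypothetical helper: follows the soxi approach of dividing the file size
    (in bits) by the audio duration (in seconds).
    """
    # Like lsx_filelength, only regular files are treated as having a usable size.
    if not os.path.isfile(path):
        return None
    size_bytes = os.path.getsize(path)
    # Without a size, a frame count, and a sample rate there is no meaningful result.
    if size_bytes == 0 or num_frames <= 0 or sample_rate <= 0:
        return None
    seconds = num_frames / sample_rate
    return 8.0 * size_bytes / seconds
```

With torchaudio, the last two arguments could presumably be taken from the num_frames and sample_rate fields of the object returned by torchaudio.info. Because the size comes from the file system, this approach has nothing to work with for a file-like object, which is why the sketch only accepts a path.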

Looking further into soxi's implementation, it does not report the bit rate when the file size cannot be retrieved.

    if (ft->mode == 'r' && (text = size_and_bitrate(ft, &text2))) {
      fprintf(output, "File Size      : %s\n", text);
      if (text2)
        fprintf(output, "Bit Rate       : %s\n", text2);
    }
  • https://github.com/dmkrepo/libsox/blob/b9dd1a86e71bbd62221904e3e59dfaa9e5e72046/src/sox.c#L413-L417

I think the attribute to be added to AudioMetaData will be bitrate: Optional[float], set to None when the bit rate cannot be determined. How does that sound?
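As a rough sketch of that shape, the class below is a hypothetical stand-in and not torchaudio's actual AudioMetaData; only the bitrate: Optional[float] field comes from the proposal above. The describe helper is also made up for illustration and mirrors how soxi omits the value when it is unavailable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AudioMetaDataSketch:
    """Hypothetical stand-in for the metadata object; not the real torchaudio class."""

    sample_rate: int
    num_frames: int
    num_channels: int
    # Bits per second; None when the bit rate cannot be determined
    # (for example, when the input is a file-like object).
    bitrate: Optional[float] = None


def describe(meta: AudioMetaDataSketch) -> str:
    """Format the bit rate, handling the None case explicitly."""
    if meta.bitrate is None:
        return "Bit Rate: unknown"
    return f"Bit Rate: {meta.bitrate:.0f} bps"
```

Defaulting the field to None means callers that never set it keep working, and consumers have to handle the missing case explicitly instead of receiving a misleading 0.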

mthrok avatar Sep 14 '21 02:09 mthrok

Running the following script, I got different results from soxi and the file command for the vorbis format. @rbracco Do you know which one is correct?

#!/usr/bin/env bash

exts=(wav mp3 vorbis flac amb sph gsm htk)

for ext in "${exts[@]}"; do
    echo "***"
    out_file="foo.${ext}"
    sox --bits 16 --null "${out_file}" synth 1 sawtooth 1
    soxi "${out_file}"
    file "${out_file}"
done
  • soxi
Input File     : 'foo.vorbis'
Channels       : 1
Sample Rate    : 48000
Precision      : 16-bit
Duration       : 00:00:01.00 = 48000 samples ~ 75 CDDA sectors
File Size      : 5.77k
Bit Rate       : 46.2k
Sample Encoding: Vorbis
Comment        : 'Comment=Processed by SoX'
  • file
foo.vorbis: Ogg data, Vorbis audio, mono, 48000 Hz, ~80000 bps, created by: Xiph.Org libVorbis I
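As a quick sanity check, the soxi figure above is consistent with the file-size-over-duration formula: 5.77k bytes over 1 second gives roughly 46.2k bits per second. The snippet below only redoes that arithmetic with the numbers printed by soxi; the helper name is hypothetical. The ~80000 bps printed by file presumably comes from a different source, such as a nominal bit rate recorded in the stream header, but that is an assumption and not something verified here.

```python
def average_bitrate_from_soxi(file_size: float, samples: int, sample_rate: int) -> float:
    """Recompute soxi's 'Bit Rate' from its 'File Size' and 'Duration' lines.

    Hypothetical helper; file_size is in the same unit that soxi prints
    (here thousands of bytes), so the result is in thousands of bits per second.
    """
    seconds = samples / sample_rate
    return 8 * file_size / seconds


# Values copied from the soxi output for foo.vorbis:
# File Size 5.77k, Duration 48000 samples at 48000 Hz.
print(f"{average_bitrate_from_soxi(5.77, 48000, 48000):.1f}k")
```

So the two tools are most likely reporting different quantities: soxi an average derived from the file size, and file a value read from the file itself.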

mthrok avatar Sep 14 '21 02:09 mthrok

Thank you so much for taking the time to look into this. Honestly, I don't have a ton of knowledge about the details of bitrate, and the fact that the soxi implementation just divides file size by duration was news to me.

I think the attribute to be added to AudioMetaData will be bitrate: Optional[float], set to None when the bit rate cannot be determined. How does that sound?

That sounds like an excellent solution. It's not an issue for me if it doesn't support file-like objects.

rbracco avatar Sep 14 '21 14:09 rbracco