audtorch

Invertible transforms

Open ATriantafyllopoulos opened this issue 5 years ago • 4 comments

Feature

I would like to introduce invertible transforms. Every transform that can be inverted would get an extra method, e.g. named inverse, that undoes the operation it applied to the last signal it processed. For our Normalize transform, this could look like this:


import numpy as np


class Normalize(object):
    def __init__(self, *, axis=-1):
        super().__init__()
        self.axis = axis
        self.peak = None

    def __call__(self, signal):
        if self.axis is not None:
            self.peak = np.expand_dims(np.amax(np.abs(signal), axis=self.axis),
                                       axis=self.axis)
        else:
            self.peak = np.amax(np.abs(signal))
        return signal / np.maximum(self.peak, 1e-7)

    def inverse(self, signal):
        assert self.peak is not None
        return signal * np.maximum(self.peak, 1e-7)

Motivation

For specific use cases, it would be nice to make our transforms invertible. The ones I have in mind are those where some kind of reconstruction is required after an architecture has processed the signal (e.g. denoising or source separation in the spectrogram domain).

These might be limited use cases, but our API currently does not support them.

Problems

  • This would require us to deprecate or duplicate the code in functional, as most of the variables there are needed to keep the "state" of the transform. I am not sure we would like to do that. Perhaps we could change the nature of the functional's return to a state_dict that contains all necessary variables for inverting their operation (where possible).

  • This is of course only useful when the batch size is 1. With a bigger batch size, the parameters of the invertible transform will be different for every element in the batch, and I cannot see a way to access them properly. In my opinion this is fine, as what I have in mind is using the inverse method in some deployment environment.

  • The above is only relevant if we want to invert the transform outside of the data loader (a use case would be: you load the data, compute the spectrograms, denoise them, and then reconstruct the denoised audio). If you want to invert the transform inside the data loader (e.g. compute the spectrogram, apply some kind of spectrogram augmentation transform, go back to raw audio, feed that into your model), then this would still work.
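
A minimal sketch of the state-returning functional idea from the first point, using Normalize as the example. The names normalize and denormalize and the keys of the state dictionary are hypothetical and not part of the current functional API:

```python
import numpy as np


def normalize(signal, *, axis=-1, eps=1e-7):
    """Peak-normalize `signal` and return the state needed to undo it.

    Hypothetical functional: name, signature and state keys are assumptions.
    """
    # keepdims=True keeps the reduced axis, so the peak broadcasts against
    # the signal; with axis=None the peak has shape (1, ..., 1).
    peak = np.amax(np.abs(signal), axis=axis, keepdims=True)
    state = {'peak': peak, 'eps': eps}
    return signal / np.maximum(peak, eps), state


def denormalize(signal, state):
    """Undo `normalize` using the state it returned."""
    return signal * np.maximum(state['peak'], state['eps'])


# Each call returns its own state, so nothing is overwritten when several
# signals are processed in a row (e.g. the elements of a batch).
signals = [np.array([0.5, -2.0, 1.0]), np.array([0.1, 0.2, -0.4])]
results = [normalize(s) for s in signals]
restored = [denormalize(out, state) for out, state in results]
```

Because the state travels with the output instead of living on the transform object, the states of several signals can be kept side by side, which might also ease the batch-size limitation from the second point.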

Any opinions on this? Am I the only one who needs it?

ATriantafyllopoulos avatar Jul 02 '19 08:07 ATriantafyllopoulos

For deployment purposes it could be useful.

Perhaps we could change the nature of the functional's return to a state_dict

I didn't quite understand this proposed interface. Could you draft a function skeleton, e.g. for Normalize?

phtephanx avatar Jul 03 '19 15:07 phtephanx

For Spectrogram would it look like this?

class Spectrogram(object):

    def __init__(self, window_size, hop_size, *, fft_size=None,
                 window='hann', axis=-1):
        super().__init__()
        self.window_size = window_size
        self.hop_size = hop_size
        self.fft_size = fft_size
        self.window = window
        self.axis = axis
        self.phase = []

    def __call__(self, signal):
        self.spectrogram = F.stft(signal, self.window_size, self.hop_size,
                                  fft_size=self.fft_size, window=self.window,
                                  axis=self.axis)
        magnitude, _ = librosa.magphase(self.spectrogram)
        return magnitude

    def inverse(self):
        return F.istft(self.spectrogram, self.window_size, self.hop_size,
                       window=self.window, axis=self.axis)

As this is limited to the last processed signal, I also see no need to change the functionals. Could you provide an example where that would be necessary?
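
For the reconstruction use case from the issue (e.g. denoising in the spectrogram domain), inverse would presumably have to accept the processed magnitude and recombine it with the stored phase. A minimal sketch of that variant, assuming a single FFT via NumPy as a stand-in for the STFT; the class MagnitudeSpectrum is hypothetical and not part of audtorch:

```python
import numpy as np


class MagnitudeSpectrum(object):
    """Hypothetical stand-in for Spectrogram: one FFT instead of an STFT."""

    def __init__(self):
        self.phase = None
        self.num_samples = None

    def __call__(self, signal):
        spectrum = np.fft.rfft(signal)
        # Store what cannot be recovered from the magnitude alone
        self.phase = np.angle(spectrum)
        self.num_samples = signal.shape[-1]
        return np.abs(spectrum)

    def inverse(self, magnitude):
        assert self.phase is not None
        # Recombine the (possibly modified) magnitude with the stored phase
        spectrum = magnitude * np.exp(1j * self.phase)
        return np.fft.irfft(spectrum, n=self.num_samples)


transform = MagnitudeSpectrum()
signal = np.sin(2 * np.pi * 5 * np.arange(64) / 64)
magnitude = transform(signal)
reconstructed = transform.inverse(magnitude)     # close to `signal`
attenuated = transform.inverse(0.5 * magnitude)  # close to 0.5 * signal
```

As in the draft above, the stored phase belongs to the last processed signal only.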

hagenw avatar Jul 24 '19 11:07 hagenw

This one was easy. In Standardize for example you would have to do:

class Standardize(object):
    def __init__(self, *, mean=True, std=True, axis=-1):
        super().__init__()
        self.axis = axis
        self.mean = mean
        self.std = std

    def __call__(self, signal):
        if self.mean:
            signal_mean = np.mean(signal, axis=self.axis)
            if self.axis is not None:
                signal_mean = np.expand_dims(signal_mean, axis=self.axis)
            # Store the mean for axis=None as well, so inverse() can use it
            self.signal_mean = signal_mean
            signal = signal - self.signal_mean
        if self.std:
            self.signal_std = np.std(signal, axis=self.axis)
            if self.axis is not None:
                self.signal_std = np.expand_dims(
                    self.signal_std, axis=self.axis)
            signal = signal / np.maximum(self.signal_std, 1e-7)
        return signal

    def inverse(self, signal):
        if self.std:
            signal = signal * np.maximum(self.signal_std, 1e-7)
        if self.mean:
            signal = signal + self.signal_mean
        return signal

because inverse needs the internally computed mean and standard deviation (signal_mean and signal_std).

ATriantafyllopoulos avatar Jul 24 '19 14:07 ATriantafyllopoulos

OK, I see. In this case it would indeed be easiest to remove F.standardize to solve the issue.

Could you maybe compile a list of the functionals that would be affected by this? If it affects only a few of them, we could think about removing those.

hagenw avatar Jul 24 '19 15:07 hagenw