torchaudio mobile?
🚀 Feature
torchaudio should work on Android and iOS platforms
Motivation
Mobile apps need a fast way to preprocess audio (e.g. generating spectrograms).
Pitch
I would love to see a lightweight interface that provides access to all torchaudio functions from Java and Objective-C. Heavy logic like FFT should be optimized for performance (e.g. pushed to the GPU whenever available).
Alternatives
TensorFlow Lite is moving in this direction (MFCC and most other signal-processing ops are already whitelisted).
Additional context
Transforms and functionals are now jitable. Have you tried exporting your model using JIT and then importing it on mobile? See the mobile page. Is that what you are trying to do?
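For concreteness, a minimal sketch of that workflow (the transform parameters here are placeholders, not a recommendation):

import torch
import torchaudio

# Transforms are jitable, so a single transform can be scripted and saved...
transform = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_fft=1024, hop_length=256)
scripted = torch.jit.script(transform)
scripted.save("melspec.pt")

# ...and loaded back (on mobile via the TorchScript runtime, or in Python for a sanity check).
reloaded = torch.jit.load("melspec.pt")
spec = reloaded(torch.randn(1, 16000))  # one second of dummy audio at 16 kHz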
Wow, this sounds great! I had not noticed this in the docs and assumed that heavy operations like FFT were not supported yet. I'll test it out over the next couple of days and share my results here.
@vincentqb I just traced the simplest possible module (with just a single torchaudio.transforms.MelSpectrogram) and attempted to run it in the demo app (github.com/pytorch/android-demo-app with org.pytorch:pytorch_android:1.4.0.1). Unfortunately, the forward method crashes on Android:
fft: ATen not compiled with MKL support
File "code/__torch__/torch/nn/modules/module/___torch_mangle_5.py", line 37
_16 = ops.prim.NumToTensor(torch.size(input1, 2))
input2 = torch.view(input1, [int(_15), int(_16)])
spec_f = torch.stft(input2, 1024, 143, 1024, window, False, True)
~~~~~~~~~~ <--- HERE
_17 = ops.prim.NumToTensor(torch.size(spec_f, 1))
_18 = ops.prim.NumToTensor(torch.size(spec_f, 2))
at org.pytorch.NativePeer.forward(Native Method)
at org.pytorch.Module.forward(Module.java:37)
at org.pytorch.helloworld.MainActivity.onCreate(MainActivity.java:74)
Can you please confirm whether the PyTorch Mobile runtime is indeed supposed to support the STFT transform?
Indeed, fft is not currently supported on PyTorch Mobile, as mentioned here.
Ouch :( Do you think this will be resolved soon? If not, I would highly recommend improving the PyTorch Mobile documentation to specify which transforms are supported and which are not. Also, I hope this feature request will stay open until all torchaudio features work on mobile.
@dreiss @supriyar -- do you have some information to share about the functions that are supported within mobile, or the ones that would welcome contributions from the community?
I'm working on an iOS project and hitting this error:
fft: ATen not compiled with MKL support
Although Apple's vDSP framework could do this kind of processing, I would like to stick with the torch solution for simplicity and consistency. I hope we can get fft support on mobile, since it's a very basic building block in audio processing and is needed in many speech-related models.
We enable every forward CPU op on mobile. I think the issue here is that this particular op doesn't have a portable implementation. We would be interested in a PR adding one. Another option would be to do the FFT before feeding the data to your model.
Is there a PR we could link to indicating which operations we would welcome contributions for?
I don't have a list because we haven't tested out every niche op, but if there is any op that doesn't have a non-Intel implementation, I'd be open to seeing a PR.
I don't have the time to PR this personally, but you can do a naive implementation of rfft with matmuls, e.g.:
(This code is translated from TensorFlow Magenta at https://github.com/tensorflow/magenta/blob/cf80d19fc0c2e935821f284ebb64a8885f793717/magenta/music/melspec_input.py#L64-L90; I removed some padding code I didn't need.)
import math
from typing import Tuple

import torch
from torch import Tensor


def _dft_matrix(dft_length):
    # type: (int) -> Tuple[Tensor, Tensor]
    """Build the real and imaginary parts of the DFT matrix, keeping only positive frequencies."""
    omega = 2 * math.pi / float(dft_length)
    # Outer product of the index vectors gives the (row * col) phase exponents.
    indices = torch.arange(dft_length, dtype=torch.float)
    sum_components = torch.ger(indices, indices)
    keep_values = dft_length // 2 + 1
    real_part = torch.cos(omega * sum_components)[:keep_values, :].transpose(0, 1)
    imag_part = -torch.sin(omega * sum_components)[:keep_values, :].transpose(0, 1)
    return real_part, imag_part


def _naive_rdft(signal_tensor, fft_length):
    # type: (Tensor, int) -> Tensor
    """Implement the real-input Fourier transform by matmul."""
    # We are right-multiplying by the DFT matrix, and we are keeping
    # only the first half ("positive frequencies").
    # So discard the second half of rows, but transpose the array for
    # right-multiplication.
    # The DFT matrix is symmetric, so we could have done it more
    # directly, but this reflects our intention better.
    real_dft_tensor, imag_dft_tensor = _dft_matrix(fft_length)
    # Frames must already be padded/truncated to fft_length (padding code removed).
    assert signal_tensor.shape[-1] == fft_length
    result_real_part = torch.bmm(signal_tensor, real_dft_tensor.unsqueeze(0))
    result_imag_part = torch.bmm(signal_tensor, imag_dft_tensor.unsqueeze(0))
    # Stack to match torch.rfft's [..., freq, 2] real/imag layout.
    return torch.stack([result_real_part, result_imag_part], dim=3)


# windowed_frames: a [1, T, 512] tensor of windowed audio frames
out_check = _naive_rdft(windowed_frames, 512)
out = torch.rfft(windowed_frames, 1)
assert torch.allclose(out_check, out, atol=1e-5)
When I try to import torchaudio into my Android project's app build.gradle file like so:
implementation 'org.pytorch:pytorch_android_torchaudio:1.5.0'
I get this error: Could not find org.pytorch:pytorch_android_torchaudio:1.5.0.
Is this the same issue as what's reported above?
@dstrube1 There is no Android package dedicated to torchaudio. You build your model or pipeline in Python, dump it as a TorchScript file, then load it from your app and run it with the TorchScript runtime. Please refer to the following:
https://pytorch.org/mobile/home/
https://pytorch.org/tutorials/beginner/Intro_to_TorchScript_tutorial.html
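In practice, that means the app's build.gradle pulls in only the core runtime, since the torchaudio logic ships inside the TorchScript file. A sketch (the version number is illustrative; pin it to your setup):

// app/build.gradle: only the core PyTorch Android runtime is required
implementation 'org.pytorch:pytorch_android:1.5.0'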
Hi @jeffxtang, it seems that we can run fft directly with PyTorch from version 1.10, thanks to the PocketFFT support added in this commit: https://github.com/pytorch/pytorch/commit/4036820506693b71a96b9e20989bfe286acccd89
Can you confirm? Thanks!
Thanks for your question. Yes, it's confirmed. We're working on an update of the demos to simplify the iOS and Android code.
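A minimal sketch of that 1.10+ path, assuming a PocketFFT-enabled build; the module and its parameters below are illustrative, not taken from the demos (note that torch.rfft has since been removed in favor of the torch.fft module):

import torch

class Spec(torch.nn.Module):
    """Magnitude spectrogram via torch.stft, which runs on mobile CPUs once FFT support landed."""

    def __init__(self, n_fft: int = 1024, hop_length: int = 256):
        super().__init__()
        self.n_fft = n_fft
        self.hop_length = hop_length
        self.register_buffer("window", torch.hann_window(n_fft))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        spec = torch.stft(x, self.n_fft, self.hop_length, window=self.window, return_complex=True)
        return spec.abs()

scripted = torch.jit.script(Spec())
scripted._save_for_lite_interpreter("spec.ptl")  # loadable by the mobile lite interpreter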
Hi, just following up on whether torchaudio is available on React Native now? @mthrok @vincentqb @dkashkin Trying to do something like this:

audio, sample_rate = librosa.load(audio_filepath, sr=None)
# audio is a 1xN numpy array of floats, and sample_rate is an int

# Convert the audio numpy array to a spectrogram image (a numpy array or pytorch tensor)
clip = torch.Tensor(audio)
# clip is a pytorch tensor with the same dimensions as audio, 1xN
spec = torchaudio.transforms.MelSpectrogram(sample_rate=sample_rate, n_fft=n_fft, win_length=window_length, hop_length=hop_length, n_mels=n_mels)(clip)
# spec is a spectrogram image (a pytorch tensor) with dimensions 3x128x15000
All on a React Native app :)
@raedle may know if torchaudio is available on React Native.
Btw, both the iOS and Android streaming ASR demo apps using PyTorch 1.12 and torchaudio 0.12 were updated in July.
@jeffxtang it depends. Generally, if it is possible in native Android and iOS, then it is theoretically possible in React Native.
I did a quick check on the PyTorch Android/iOS demo apps but couldn't see any dependency on torchaudio in the apps. How is the audio converted to tensors?
Yeah, I couldn't find the part of the code where we take a stream of audio from React Native and convert it to tensors, or the part that converts the raw audio stream into mel spectrograms. Anybody got a clue?
@henryleemr PlayTorch doesn't support audio streaming. Do you have an end-to-end example for either iOS or Android without React Native?
I might be open to looking into this for PlayTorch, which would enable others to use audio streaming in the future.
@raedle This script uses torchaudio to convert the audio and generate the model used in the iOS & Android apps. So internally the model uses torchaudio's transforms.MelSpectrogram to do the audio conversion. @henryleemr
Ah, cool! So if I were to, say, use a ResNet on the spectrogram (which is just an image), I could do something like this?
import torch
import torch.nn as nn
import torchaudio
import torchvision.models as models


class ResNet(nn.Module):
    def __init__(self, dataset, pretrained=True):
        super(ResNet, self).__init__()
        num_classes = 4
        self.transform = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_fft=400, n_mels=80, hop_length=160)
        self.model = models.resnet50(pretrained=pretrained)
        self.model.fc = nn.Linear(2048, num_classes)

    def forward(self, raw_audio_tensor):
        spectrogram = self.transform(raw_audio_tensor)
        output = self.model(spectrogram)
        return output
Correct me if I'm wrong: once we've exported the model trained with this class into a .ptl model file, we can just use the usual React Native methods to load the model and pass raw audio streams into it, and the mel spectrogram transformation logic will be applied to the raw audio without needing any React Native or JavaScript mel spectrogram functions?
Based on how the model that implements the melspec transform is used in the iOS & Android apps, yes that's correct.
@hwangjeff @mthrok can you please double-check?
Yeah, the script looks good to me.
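For reference, a hedged sketch of the export step for the class above, assuming it scripts cleanly (the file name is illustrative):

import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

model = ResNet(dataset=None, pretrained=True).eval()
scripted = torch.jit.script(model)
optimized = optimize_for_mobile(scripted)
optimized._save_for_lite_interpreter("audio_resnet.ptl")  # the .ptl file the app loads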
@jeffxtang Hello, any update in 2024? ;-) We think consistency is critical: we trained on PC with torchaudio's feature extractors (MFCC, etc.), so we need the same library to do the feature extraction on other platforms like Android and iOS. Most likely, due to the performance gap, we will need to convert the PyTorch models to ONNX models during the deployment phase, so keeping the feature extraction consistent is critical to making the model run consistently across all platforms. Thanks a lot!