
Loading audio file with torchaudio fails (memory crash)

Open • noorbraik opened this issue 5 months ago • 1 comment

🐛 Describe the bug

When I try to load a 43-second .wav file, memory consumption keeps increasing until the session crashes. I have about 12 GB of RAM. This is the code I am running:

```python
from transformers import ClapProcessor, ClapModel
import torchaudio
import torch

# Setup device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load CLAP model and processor
model = ClapModel.from_pretrained("laion/clap-htsat-unfused").to(device)
processor = ClapProcessor.from_pretrained("laion/clap-htsat-unfused")

# Load audio file; torchaudio.load returns a (channels, frames) tensor
audio, sr = torchaudio.load("/content/temp_audio_6169.wav")

# Resample to 48kHz if needed
if sr != 48000:
    audio = torchaudio.transforms.Resample(sr, 48000)(audio)

# Convert to mono. Averaging over dim 0 yields a 1-D (frames,) tensor for
# both mono and stereo input. Averaging only when audio.shape[0] > 1 would
# leave a mono file as (1, frames), and the slice below would then index the
# channel dimension and truncate nothing.
audio = audio.mean(dim=0)

# Limit to 10 seconds (CLAP expects max 480000 samples at 48kHz)
audio = audio[:480000]

# Process audio
inputs = processor(audios=audio, sampling_rate=48000, return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}  # Move to the selected device

# Extract audio embedding
with torch.no_grad():
    audio_embedding = model.get_audio_features(**inputs)
print("🔊 Audio embedding shape:", audio_embedding.shape)
```
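A minimal sketch of the shape pitfall in isolation, assuming only PyTorch. `torchaudio.load` returns a `(channels, frames)` tensor, so if the mono conversion runs only when `audio.shape[0] > 1`, a single-channel recording stays 2-D and `audio[:480000]` slices the channel dimension rather than samples, leaving all 43 seconds in place. The helper `to_mono_clip` and the constants `TARGET_SR` / `MAX_SAMPLES` are hypothetical names chosen for illustration. Whether a 2-D waveform reaching `ClapProcessor` is what exhausts memory here is an assumption worth checking, not something confirmed in this issue.

```python
import torch

TARGET_SR = 48_000            # hypothetical constant: CLAP's expected sample rate
MAX_SAMPLES = 10 * TARGET_SR  # hypothetical constant: 480000 samples = 10 s

def to_mono_clip(waveform: torch.Tensor, max_samples: int = MAX_SAMPLES) -> torch.Tensor:
    """Hypothetical helper: return a 1-D mono waveform of at most max_samples.

    Expects the (channels, frames) layout that torchaudio.load returns.
    Averaging over dim 0 gives (frames,) for mono and multi-channel input alike.
    """
    if waveform.dim() != 2:
        raise ValueError(f"expected (channels, frames), got {tuple(waveform.shape)}")
    mono = waveform.mean(dim=0)   # (frames,)
    return mono[:max_samples]     # now slices samples, not channels

# The pitfall: a 43 s mono clip at 48 kHz has shape (1, 2064000).
mono_43s = torch.zeros(1, 43 * TARGET_SR)
print(mono_43s[:MAX_SAMPLES].shape)   # slices channels: still [1, 2064000]
print(to_mono_clip(mono_43s).shape)   # truncated to [480000]
```

Printing `audio.shape` just before the `processor(...)` call is a cheap check: a 1-D shape of at most 480000 samples is what the truncation comment intends, while a 2-D shape means the mono step was skipped.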

Audio file: temp_audio_6169.zip

noorbraik • May 25 '25 07:05