Conditioned-Source-Separation-LaSAFT
Memory footprint in Google Colab
Thank you so much for this great model! Wonderful job! I just have a small question about the memory required for the separation. The model seems to use a lot of memory, which forces me to split the audio of a full song into short chunks (> 1 min / 1 min 30) in Google Colab (the free version, since the Pro version is not available to European users) and to resample the audio from hi-res (96000 Hz) to low-res (44100 Hz).
The current Jupyter notebook only shows the process on very short samples (a YouTube video). I've slightly modified the code to allow using audio from Google Drive, but it seems to be limited to low-resolution / short-duration audio files unless a splitting/merging audio subprocess is added. The same RAM footprint limitation was solved in Spleeter (Deezer) by a similar method, with some constraints (zero padding to remove from the audio) (issue here: https://github.com/deezer/spleeter/issues/391#issuecomment-652202433).
Has someone already done this?
Hi MaxC2, thanks for the feedback. As you mentioned, you have to resample the input file into a 44100 Hz audio file. I'll add some code for auto-resampling later.
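Until then, a minimal sketch of that pre-resampling step (the file path is just a placeholder and model is a loaded pretrained model, as in the notebook; librosa resamples on load when sr is given):

```python
import librosa

# librosa resamples on load; 44100 Hz is the rate the pretrained models expect
audio, rate = librosa.load('input.flac', sr=44100, mono=False)

# audio is (channels, samples) for a stereo file; separate_track expects (samples, channels)
# (a mono file would need to be duplicated into two channels first)
separated_vocals = model.separate_track(audio.T, 'vocals')
```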
However, you don't have to split and merge the audio manually. When you call the separate_track function of a pretrained model, like
separated = model.separate_track(track.audio, 'vocals')
it automatically splits the given track into several sub-audios (each sub-audio has the same number of samples, and the last one is zero-padded), separates the target source from each sub-audio, and merges all the separated outputs to make a final audio file.
Below is the code for this.
```python
def separate_track(self, input_signal, target) -> torch.Tensor:
    import numpy as np
    self.eval()
    with torch.no_grad():
        # SingleTrackSet splits the input into fixed-length, zero-padded sub-audios
        db = SingleTrackSet(input_signal, self.hop_length, self.num_frame)
        assert target in db.source_names
        separated = []
        input_condition = np.array(db.source_names.index(target))
        input_condition = torch.tensor(input_condition, dtype=torch.long, device=self.device).view(1)

        # separate each sub-audio and trim the overlapping margins
        for item in db:
            separated.append(self.separate(item.unsqueeze(0).to(self.device), input_condition)[0]
                             [self.trim_length:-self.trim_length].detach().cpu().numpy())

    # merge all separated sub-audios into one signal and write it to disk
    separated = np.concatenate(separated, axis=0)
    import soundfile
    soundfile.write('temp.wav', separated, 44100)
    return soundfile.read('temp.wav')[0]
```
The PyTorch dataset SingleTrackSet splits the given track automatically, in an on-the-fly manner. After iterating over every sub-audio, separate_track merges all the outputs with separated = np.concatenate(separated, axis=0).
Thank you.
Yes, it's probably because I attempted to load 96 kHz audio with librosa (sr=96000) before calling separate_track and got kicked out of Google Colab for running out of RAM. I have retried with a 44.1 kHz file cut at ~1 min 30. So now I will test with a full song resampled to the right sample rate. Thank you very much for your support, and once again, well done on your great model!
OK, I've done some tests. The problem comes from the use of the embedded audio player display(Audio(audio, rate=rate)), which seems to duplicate the audio in some manner and uses a lot of RAM. So for a big audio file (e.g. more than 10 minutes) you always get kicked out of Google Colab for exceeding the RAM limits.
The trick is to skip the embedded audio preview and call the separate_track process directly.
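For example (a rough sketch rather than the notebook's exact code; the output file name is a placeholder), the separated array can be written straight to disk instead of being wrapped in display(Audio(...)):

```python
import soundfile

# separate and write directly to disk, without building the in-notebook audio widget
separated = model.separate_track(audio.T, 'vocals')
soundfile.write('vocals.wav', separated, 44100)
```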
In draft form, to use my audio stored in Google Drive, I've written two new cells. The first one is the usual Google Drive mount:
```python
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)
```
The second one loads any audio file, resamples it (with the best filter quality), and converts it to stereo if needed. Each processed temp.wav is renamed and written to a destination subfolder in Google Drive (separated in my case) to make downloading easier (as a zip file).
```python
import os
import shutil
import numpy as np
import librosa
import resampy

gcolab_root = '/content/Conditioned-Source-Separation-LaSAFT/'
gdrive_root = '/content/gdrive/My Drive/'
destination_folder = 'separated'
default_sample_rate = 44100
sources = ['vocals', 'drums', 'bass', 'other']

def load_audio(audio_path):
    # load with the original sample rate and channel layout
    audio, rate = librosa.load(audio_path, sr=None, mono=False)
    if rate != default_sample_rate:
        audio = resampy.resample(audio, rate, default_sample_rate, filter='kaiser_best')
    is_mono = audio.ndim == 1
    if is_mono:
        # duplicate the mono channel so the model receives a stereo array
        audio = np.asfortranarray(np.array([audio, audio]))
    return audio, rate, is_mono

def separate_all_sources(audio, gdrive_path):
    for src in sources:
        print("separate '%s'" % src)
        model.separate_track(audio.T, src)
        # separate_track writes temp.wav in the project root; copy it to Google Drive
        shutil.copy(os.path.join(gcolab_root, 'temp.wav'),
                    os.path.join(gdrive_path, src + '.wav'))

# prepare the Google Drive destination folder
path = os.path.join(gdrive_root, destination_folder)
try:
    os.makedirs(path, exist_ok=True)
except OSError as error:
    print("Directory '%s' can not be created" % path)

print('load audio source')
audio_file = os.path.join(gdrive_root, 'audio/stairway/center.flac')
audio, rate, is_mono = load_audio(audio_file)
separate_all_sources(audio, path)
print('finished')
```
I still need to add a fallback to the original audio format:
- back to mono
- back to the original sample rate
(a rough sketch of what I mean is below). But for the moment this process works great on my audio files (> 10 min, 96 kHz / 24-bit) without any offline preprocessing.
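A minimal sketch of that fallback, assuming the rate and is_mono values returned by load_audio above are kept around (the function name and output path are just placeholders):

```python
import resampy
import soundfile

def restore_format(separated, rate, is_mono, output_path='restored.wav'):
    # separated is the (samples, 2) array written by separate_track at 44100 Hz
    if is_mono:
        # average the two channels back down to mono
        separated = separated.mean(axis=1)
    if rate != 44100:
        # resample back to the original rate; axis=0 keeps channels intact for stereo input
        separated = resampy.resample(separated, 44100, rate, filter='kaiser_best', axis=0)
    soundfile.write(output_path, separated, rate)
```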
Maybe the idea would be to add an extra method in the Python code that does not write temp.wav in the root project folder, but instead writes a named .wav (vocals / drums / bass / other) into a temporary project subfolder (separated, for example), and then zips that folder and offers a download link in Google Colab after the separation (see the sketch below).
That could help potential users who do not have a Google Drive account.
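A rough sketch of that zip-and-download step, assuming the separated .wav files have been written into a local separated/ folder (shutil.make_archive and google.colab.files.download are standard calls):

```python
import shutil
from google.colab import files

# zip the folder containing the vocals/drums/bass/other .wav files ...
archive_path = shutil.make_archive('separated', 'zip', 'separated')

# ... and offer it as a direct browser download, so no Google Drive account is needed
files.download(archive_path)
```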
For me, it's OK as it is. Thank you very much.
Thank you for sharing your experience. I'll update the code to reflect what you've recommended, sooner or later 👍