music_source_sepearation_SH_net

How to separate a track

Open LennyPenny opened this issue 6 years ago • 6 comments

Hey, thanks for the awesome work! So I got the checkpoints and added their path to the config, but how do I actually do separation on a new wav file? The eval_dsd_100.py file only seems to iterate over the dsd100 dataset.

Is there a function I'm missing that I could use? Maybe you have some extra code for this that is missing from the repo?

LennyPenny avatar Nov 04 '18 21:11 LennyPenny

Hi Lenny, you can adapt the evaluation code to run separation on your own wav files. Use the librosa.load function to load the wav files; most of the evaluation code can be reused without changes. It should not be too difficult to modify the code for your own task. Feel free to ask if you have further questions.

sungheonpark avatar Nov 07 '18 11:11 sungheonpark

Okay I got it working now! It's really good:) Maybe I will make a PR with an easy to use command line script.

Do you have any pointers as to how I could turn this into a real-time separator? Like for a radio stream or something.

LennyPenny avatar Nov 07 '18 23:11 LennyPenny

I don't have much experience with handling live streaming data. You may be able to find example Python code for live streaming on the Internet.

sungheonpark avatar Nov 15 '18 09:11 sungheonpark

Another question before I dive into this too deeply: Does it run the neural network for each sample of input (i.e. 44.1k times a second for a 44.1 kHz audio file), or is the whole file given to the neural network at once?

If the former is true, then getting this to work with live streaming data should be easy.

LennyPenny avatar Nov 29 '18 21:11 LennyPenny

When the spectrogram is fed into the network, it is divided into smaller chunks. The spectrogram of a single file has a size of 512 x (length of the spectrogram). The input size of the CNN is 512x64, so the spectrogram is cropped to fit the network. During training, the input spectrogram is cropped to 512x64 at a random position; during testing, the spectrogram is divided sequentially into 512x64 chunks that are fed into the network one after another.
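The chunking described above can be sketched in plain NumPy as follows. This is only an illustration under stated assumptions: it assumes the spectrogram already has 512 frequency bins, it zero-pads the last chunk (the repo may handle the tail differently), and the function names `split_into_chunks`, `join_chunks`, and `random_crop` are hypothetical.

```python
import numpy as np

FREQ_BINS = 512  # network input height, as stated in this thread
FRAMES = 64      # network input width, as stated in this thread


def split_into_chunks(spec, frames=FRAMES):
    """Split a (freq, time) array into consecutive (freq, frames) chunks.

    The tail is zero-padded so the last chunk is full (an assumption).
    """
    n_freq, n_time = spec.shape
    n_chunks = max(1, -(-n_time // frames))  # ceiling division
    padded = np.zeros((n_freq, n_chunks * frames), dtype=spec.dtype)
    padded[:, :n_time] = spec
    # (freq, chunks, frames) -> (chunks, freq, frames)
    return padded.reshape(n_freq, n_chunks, frames).transpose(1, 0, 2)


def join_chunks(chunks, n_time):
    """Inverse of split_into_chunks: concatenate along time and drop the padding."""
    n_chunks, n_freq, frames = chunks.shape
    full = chunks.transpose(1, 0, 2).reshape(n_freq, n_chunks * frames)
    return full[:, :n_time]


def random_crop(spec, rng, frames=FRAMES):
    """Training-style crop: one random (freq, frames) window; needs time >= frames."""
    start = rng.integers(0, spec.shape[1] - frames + 1)
    return spec[:, start:start + frames]
```

At test time, each chunk from `split_into_chunks` would be passed through the network and the outputs stitched back with `join_chunks`. Since each chunk covers 64 spectrogram frames rather than a single audio sample, a streaming setup would need to buffer at least that many frames before each forward pass.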

sungheonpark avatar Nov 30 '18 04:11 sungheonpark

@LennyPenny Can you please tell me how you actually changed the code to separate the tracks in a test .wav file?

pratikshaya avatar May 02 '20 05:05 pratikshaya