spleeter-web Support additional source separation models

Summary of other models

Model	Supported?	Paper	Source code	Vocals (SDR)	Drums (SDR)	Bass (SDR)	Other (SDR)	Avg (SDR)	Notes
Spleeter	Yes	Link	Yes	6.55	5.93	5.10	4.24	5.46
Demucs	Yes	Link	Yes	6.29	6.08	5.83	4.12	5.58
Conv-Tasnet	Yes	Link	Yes	6.81	6.08	5.66	4.37	5.73	Worse perceived quality than Demucs
X-UMX	Yes	Link	Yes	5.53	6.33	4.54	6.50	5.73	Slow CPU separation
D3Net	Yes	Link	Yes	7.24	7.01	5.25	4.53	6.01	Slow CPU separation
MMDenseLSTM	No	Link	Yes	6.6	6.43	5.16	4.15	5.59	No pretrained models
Meta-TasNet	No	Link	Yes	6.4	5.91	5.58	4.19	5.52	Issues with higher frequencies (sum of sources does not equal original) (https://github.com/pfnet-research/meta-tasnet/issues/4)
Nachmani et al.	No	Link	No	6.92	6.15	5.88	4.32	5.82
LaSAFT	No	Link	Yes	7.33	5.68	5.63	4.87	5.88	Looks promising! Sum of sources does not equal original (https://github.com/ws-choi/Conditioned-Source-Separation-LaSAFT/issues/3#issuecomment-750373635)
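For reference, the Avg (SDR) column appears to be the unweighted mean of the four per-stem scores. The sketch below recomputes it from the table values; the `STEM_SDR` dictionary and the `average_sdr` helper are illustrative names and not part of Spleeter Web.

```python
# Per-stem SDR values (dB) copied from the summary table above,
# in the order: vocals, drums, bass, other.
STEM_SDR = {
    "Spleeter": (6.55, 5.93, 5.10, 4.24),
    "Demucs": (6.29, 6.08, 5.83, 4.12),
    "Conv-Tasnet": (6.81, 6.08, 5.66, 4.37),
    "X-UMX": (5.53, 6.33, 4.54, 6.50),
    "D3Net": (7.24, 7.01, 5.25, 4.53),
    "MMDenseLSTM": (6.6, 6.43, 5.16, 4.15),
    "Meta-TasNet": (6.4, 5.91, 5.58, 4.19),
    "Nachmani et al.": (6.92, 6.15, 5.88, 4.32),
    "LaSAFT": (7.33, 5.68, 5.63, 4.87),
}


def average_sdr(stems):
    """Return the unweighted mean of the per-stem SDR values."""
    return sum(stems) / len(stems)


# Print each model's mean SDR, highest first.
for model, stems in sorted(
    STEM_SDR.items(), key=lambda item: average_sdr(item[1]), reverse=True
):
    print(f"{model:<16} {average_sdr(stems):.2f}")
```

The recomputed means agree with the Avg column to within rounding.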

Oct 11 '20 21:10 JeffreyCA

I will prioritize adding the following models:

[x] Demucs (https://github.com/JeffreyCA/spleeter-web/pull/47)
[x] Tasnet (https://github.com/JeffreyCA/spleeter-web/pull/47)
[x] X-UMX
[ ] D3Net

Dec 22 '20 03:12 JeffreyCA

Hi Jeffrey! I recommend postponing adding LaSAFT features. We are going to reorganize the code structure to align with the camera-ready version of our paper, which was accepted to ICASSP 2021, and that might cause conflicts. We'll also upload checkpoints of models trained at a larger scale (n_fft of 4096; currently we only support 2048). We will finish refactoring by March. Thank you.

Feb 01 '21 02:02 ws-choi

Hi @ws-choi, thanks for the update! I meant to comment earlier but I intend to only support models where the separated sources closely add up to the original source. Will your changes help with this?
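To make that criterion concrete, here is a rough sketch of one way to check it: sum the separated stems sample by sample and measure how much energy is left over relative to the original mixture. The function names, the mono list-of-floats representation, and the -40 dB threshold are all assumptions for illustration, not how Spleeter Web or LaSAFT actually do it.

```python
import math


def reconstruction_error_db(mixture, stems):
    """Energy of (mixture - sum of stems) relative to the mixture, in dB.

    More negative means the stems add back up to the mixture more closely.
    Returns -inf when the reconstruction is exact.
    """
    summed = [sum(samples) for samples in zip(*stems)]
    residual_energy = sum((m - s) ** 2 for m, s in zip(mixture, summed))
    mixture_energy = sum(m ** 2 for m in mixture)
    if mixture_energy == 0.0:
        raise ValueError("mixture is silent; relative error is undefined")
    if residual_energy == 0.0:
        return float("-inf")
    return 10.0 * math.log10(residual_energy / mixture_energy)


def stems_add_up(mixture, stems, threshold_db=-40.0):
    """True if the residual is below the (arbitrary, assumed) threshold."""
    return reconstruction_error_db(mixture, stems) <= threshold_db


# Toy mono signals: the first pair of stems sums exactly to the mixture,
# the second pair loses half of the accompaniment.
mixture = [0.5, -0.25, 0.75, 0.0]
vocals = [0.25, -0.25, 0.5, 0.0]
accompaniment = [0.25, 0.0, 0.25, 0.0]
lossy_accompaniment = [0.125, 0.0, 0.125, 0.0]

print(stems_add_up(mixture, [vocals, accompaniment]))
print(stems_add_up(mixture, [vocals, lossy_accompaniment]))
```

A real check would run on the decoded audio of every channel, and the acceptable threshold would have to be chosen by listening tests rather than fixed up front.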

I'm not very familiar with these conferences, so this is my first time hearing about ICASSP, and I see it's being hosted in Toronto this year (although virtually)! Do you know what other conferences there are in this research field?

Feb 01 '21 03:02 JeffreyCA

Will your changes help with this? => This update will not support it, but future updates might. Since I would have to change the overall training structure for it, I need more time. I'll let you know if LaSAFT-Net provides such features :)

What other conferences are there related to this research field? => International Society for Music Information Retrieval (ISMIR) is the most relevant conference.

Other ML conferences such as NeurIPS, ICLR, ICML, AAAI, IJCAI, IJCNN and ECAI, or signal processing conferences such as ICASSP and Interspeech, might also include state-of-the-art papers in this domain.

Feb 01 '21 03:02 ws-choi

Awesome, thanks!

Feb 01 '21 03:02 JeffreyCA

D3Net support is coming very soon!

Jun 21 '21 06:06 JeffreyCA

Would love to see LaSAFT!

Oct 30 '21 08:10 jacksongoode

@JeffreyCA, could you please add support for the kuielab MDX-Net models, for both Leaderboard A and Leaderboard B? Their best model scored an SDR of 9.00 for vocal separation, compared to the Hybrid Demucs model, which scored an SDR of 8.13.

Model comparison: https://paperswithcode.com/sota/music-source-separation-on-musdb18

That list is a good reference, as it includes open-source models with better scores than those in the "Summary of other models" table in your original comment.

Could you also update the Demucs installer to include their hybrid model? I've used both before and will try to help out, but I can't guarantee that I can get it integrated.

Apr 26 '22 05:04 Ma5onic

Thanks for the suggestion, I'll check that out. The latest Spleeter Web already supports Demucs v3, which is the Hybrid version. I'm always open to contributions 🙂

Apr 26 '22 06:04 JeffreyCA

Awesome! I'll look at the way you deploy your containers and try to follow the same structure.

Here is a presentation that breaks down how it works: https://ws-choi.github.io/personal/presentations/slide/2021-08-21-aicrowd

The README has been updated since I forked it: https://github.com/kuielab/mdx-net-submission/commit/80f59830c81f29b4f594a5d7b7253108e373fbc8

They finally added notes on adding custom models, and it seems that someone has already trained an improved version using the UVR dataset:
https://github.com/kuielab/mdx-net-submission/commit/3dc5581de0a4d1b114358b09e64a2cc29a5eecc8
https://github.com/Anjok07/ultimatevocalremovergui/releases/tag/MDX-Net-B

The model achieved a 9.708 SDR score on AIcrowd's private test set.

Apr 26 '22 06:04 Ma5onic

It's also a bit more complex, as it requires Demucs v2 and Spleeter Web uses v3.

Apr 30 '22 20:04 JeffreyCA

I'll try to get an isolated container working for the default kuielab code, then I'll see if it works with Demucs v3 by changing the requirements.txt to the latest demucs pip release. (I highly doubt that it'll be that easy, but I'll try nonetheless.) I do have hope, however, because the Demucs v3 README mentions that model a couple of times and makes direct comparisons to it:

When trained only on MusDB HQ, Hybrid Demucs achieved a SDR of 7.33 on the MDX test set, and 8.11 dB with 200 extra training tracks. It is particularly efficient for drums and bass extraction, although KUIELAB-MDX-Net performs better for vocals and other accompaniments.

May 10 '22 20:05 Ma5onic

After further investigation, I found that mdx-net uses the Demucs v2 code but downloads the Demucs v3 model. It can be installed without conflict by using Anaconda/Miniconda. I also just realized that @ws-choi is one of the main contributors to that project (mdx-net).

Oct 02 '22 17:10 Ma5onic

Can we have the Spleeter model with Piano (5 stems instead of 4)?

Dec 22 '22 17:12 dts350z

@dts350z, I think that @JeffreyCA implemented the changes that you asked for. See pull request https://github.com/JeffreyCA/spleeter-web/pull/458. The 5-stem Spleeter model got merged to the main branch 😄

Jan 15 '23 22:01 Ma5onic