ComfyUI-DeepExtract Some details

Thanks for this node. It is simple to use, the model is small, and it is usable.

Some details:

Please mention in the README.md where to download the model: https://github.com/facefusion/facefusion-assets/releases/download/models-3.0.0/kim_vocal_2.onnx
Allow downloading it somewhere under ComfyUI/models/, perhaps ComfyUI/models/audio. Separating code from data is important.
Pinning dependencies to fixed versions is a very bad idea; it will break something sooner or later. Copy the current requirements.txt to something like requirements_strict.txt, then remove the versions for the audio tools (librosa and soundfile) from requirements.txt. I have soundfile 0.13.1 and librosa 0.11.0, and they work without problems. Just mention that people should use requirements_strict.txt if something goes wrong, and ask them to report it so you can adjust the code.
Did you try converting the ONNX model to PyTorch and saving it as safetensors? If this can be achieved, you can remove all the (nasty) ONNX dependencies.
Your setup.py tries to install everything for CUDA 11.8; I'm using 12.6 and everything works. I didn't even try to run it, because it could simply try to install GBs of things that I already have. This looks quite dangerous: run_pip("torch", "torchvision", "torchaudio", "--extra-index-url", "https://download.pytorch.org/whl/cu118"), and I can't imagine a ComfyUI setup that doesn't have torch installed. If you take a look at ComfyUI's requirements.txt file you'll see these torch libs; trying to install them again is asking for trouble.
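
A minimal sketch of how an install script could skip packages that are already importable, instead of reinstalling torch unconditionally. The helper names `is_installed` and `pip_install_missing` are hypothetical and not part of the node's code; `dry_run` defaults to True here so that nothing is installed unless explicitly requested.

```python
import importlib.util
import subprocess
import sys


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def pip_install_missing(packages: dict[str, str], dry_run: bool = True) -> list[str]:
    """Install only the packages whose import name cannot be found.

    `packages` maps an import name to the name pip should install.
    Returns the pip names that were missing.
    """
    missing = [pip_name for module, pip_name in packages.items()
               if not is_installed(module)]
    if missing and not dry_run:
        # Use the interpreter that runs ComfyUI, not whatever "pip" is on PATH.
        subprocess.check_call([sys.executable, "-m", "pip", "install", *missing])
    return missing
```

With something like this, torch, torchvision and torchaudio would be left alone on any setup where ComfyUI already runs, whatever CUDA version they were built for.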

In a nutshell: try to make the model loadable by PyTorch, and this will reduce your requirements.txt to:

librosa
soundfile

Both are widely used by audio nodes. Then you can just explain how to download the model and add an auto-downloader. With this, the node can be installed quickly and safely.
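
A rough sketch of such an auto-downloader, assuming the model is stored under an `audio` subfolder of the ComfyUI models directory. The function names and the `fetch` parameter are hypothetical; inside ComfyUI the models directory would normally come from `folder_paths.models_dir`, but here it is passed in explicitly so the sketch stays self-contained.

```python
import os
import urllib.request

MODEL_URL = ("https://github.com/facefusion/facefusion-assets/releases/"
             "download/models-3.0.0/kim_vocal_2.onnx")


def model_path(models_dir: str, subdir: str = "audio", url: str = MODEL_URL) -> str:
    """Where the model should live, e.g. ComfyUI/models/audio/kim_vocal_2.onnx."""
    return os.path.join(models_dir, subdir, os.path.basename(url))


def ensure_model(models_dir: str, url: str = MODEL_URL,
                 fetch=urllib.request.urlretrieve) -> str:
    """Download the model only if it is not already on disk; return its path."""
    path = model_path(models_dir, url=url)
    if not os.path.exists(path):
        os.makedirs(os.path.dirname(path), exist_ok=True)
        tmp = path + ".part"
        fetch(url, tmp)        # download to a temporary name first
        os.replace(tmp, path)  # so an interrupted download is never mistaken for the model
    return path
```

Downloading to a `.part` file and renaming it afterwards is a design choice, not a requirement: it keeps a half-finished file from being loaded as a valid model on the next run.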

May 23 '25 16:05 set-soft

I played a little bit with the code:

I found you can use other MDX-Net models; there are several here: https://huggingface.co/seanghay/uvr_models/tree/main
I also found that it is much better to use an "instrument" model instead of using voice-instrument. But then the outputs become confusing; they should be named something like "main audio" and "complement audio".
I added tqdm progress to the download; tqdm is in the dependencies but was not used.
I added the option to choose the model
Removed some dead code
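
For illustration only, a model chooser could be built on a helper like the one below, which lists the model files found in a directory. `list_models` is a hypothetical name and this is not the code from the fork; a ComfyUI node would typically return such a list from its `INPUT_TYPES` class method so that it shows up as a drop-down.

```python
import os


def list_models(models_dir: str, extension: str = ".onnx") -> list[str]:
    """Return the sorted file names in models_dir that end with `extension`."""
    if not os.path.isdir(models_dir):
        return []  # nothing downloaded yet
    return sorted(
        name for name in os.listdir(models_dir)
        if name.lower().endswith(extension)
    )
```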

Do you still maintain this code?

May 24 '25 15:05 set-soft

Just in case you are interested, here is my fork: https://github.com/set-soft/ComfyUI-DeepExtract

May 24 '25 17:05 set-soft

Hi there! 🙌 Thank you so much for taking the time to check out my node and for your kind feedback. Your comments are truly valuable and motivating.

Thanks to users like you who engage and contribute, this whole process becomes even more enjoyable. When I have some free time, I’ll work on creating a more advanced and functional version of the node. I’ll definitely take your suggestions into account.

May 26 '25 13:05 abdozmantar