Some details
Thanks for this node, simple to use, small model and usable.
Some details:
- Please mention in the `README.md` where to download the model: https://github.com/facefusion/facefusion-assets/releases/download/models-3.0.0/kim_vocal_2.onnx
- Allow downloading it somewhere in `ComfyUI/models/`, perhaps `ComfyUI/models/audio`. Separating code from data is important.
- Pinning dependencies to fixed versions is a very bad idea; it will break something sooner or later. Copy the current `requirements.txt` to something like `requirements_strict.txt`, then remove the versions for the audio tools (librosa and soundfile) from `requirements.txt`. I have soundfile 0.13.1 and librosa 0.11.0, and they work without problems. Just mention using `requirements_strict.txt` if something goes wrong, and ask users to report it so you can adjust the code.
- Did you try converting the ONNX model to PyTorch and saving it as safetensors? If this can be achieved, you can remove all the (nasty) ONNX dependencies.
- Your `setup.py` tries to install everything for CUDA 11.8; I'm using 12.6 and everything works. I didn't even try to run it, because it could simply try to install GBs of things that I already have. This looks quite dangerous: `run_pip("torch", "torchvision", "torchaudio", "--extra-index-url", "https://download.pytorch.org/whl/cu118")`, and I can't imagine a ComfyUI setup that doesn't have torch installed. If you take a look at ComfyUI's `requirements.txt` file you'll see these torch libs; trying to install them is asking for trouble.
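To make the last point concrete, here is a minimal sketch of an install step that only calls pip for packages that are actually missing, so an existing torch build is never touched. The helper names `missing_packages` and `install_missing` are hypothetical, not part of the node, and the sketch assumes each package's import name matches its pip name (true for torch, librosa and soundfile).

```python
import importlib.util
import subprocess
import sys


def missing_packages(names):
    """Return the top-level import names from `names` that are not installed."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def install_missing(names, extra_args=()):
    """Run pip only for absent packages; return the list that was requested.

    Hypothetical helper: assumes import name == pip name for every entry.
    """
    todo = missing_packages(names)
    if todo:
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", *todo, *extra_args]
        )
    return todo
```

In a ComfyUI environment, `install_missing(["torch", "librosa", "soundfile"])` would skip torch because it is already importable, and would never need the CUDA-specific index URL.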
In a nutshell: try to make the model loadable by PyTorch, and this will reduce your `requirements.txt` to:
librosa
soundfile
Both are widely used by audio nodes. Then you can just explain how to download the model and add an auto-downloader. With this, the node can be installed quickly and safely.
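A rough sketch of such an auto-downloader follows, using only the standard library plus tqdm when it happens to be installed. The function names, the `models/audio` destination and the `.part` temporary suffix are assumptions for illustration, not the node's actual code; the URL is the one quoted above.

```python
import os
import urllib.request

MODEL_URL = (
    "https://github.com/facefusion/facefusion-assets/releases/"
    "download/models-3.0.0/kim_vocal_2.onnx"
)


def model_path(comfy_root, url=MODEL_URL):
    """Destination path: <comfy_root>/models/audio/<file name from URL>."""
    return os.path.join(comfy_root, "models", "audio", os.path.basename(url))


def download_model(comfy_root, url=MODEL_URL, chunk_size=1 << 20):
    """Download the model once; return its path (skips if already present)."""
    dest = model_path(comfy_root, url)
    if os.path.isfile(dest):
        return dest
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    try:
        from tqdm import tqdm  # optional progress bar
    except ImportError:
        tqdm = None
    tmp = dest + ".part"  # write to a temp name so partial files are never used
    with urllib.request.urlopen(url) as resp, open(tmp, "wb") as out:
        total = int(resp.headers.get("Content-Length", 0)) or None
        bar = None
        if tqdm is not None:
            bar = tqdm(total=total, unit="B", unit_scale=True,
                       desc=os.path.basename(dest))
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            if bar is not None:
                bar.update(len(chunk))
        if bar is not None:
            bar.close()
    os.replace(tmp, dest)
    return dest
```

Writing to a temporary file and renaming it at the end means an interrupted download is retried on the next run instead of leaving a truncated model in place.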
I played a little bit with the code:
- I found you can use other MDX-Net models; there are various here: https://huggingface.co/seanghay/uvr_models/tree/main
- I also found that it is much better to use an "instrument" model instead of using voice-instrument. But then the output names become confusing; they should be something like "main audio" and "complement audio".
- I added a tqdm progress bar to the download; tqdm is in the dependencies but was not used.
- I added the option to choose the model
- Removed some dead code
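As an illustration of the model-selection idea, a node could build its list of choices by scanning the model folder. This is only a sketch: `list_models` is a hypothetical helper and the `.onnx` filter is an assumption, not the code from either repository.

```python
import os


def list_models(model_dir, extensions=(".onnx",)):
    """Return the sorted model file names found in `model_dir`.

    `extensions` must be a tuple of lowercase suffixes. A missing folder
    yields an empty list, so the caller can fall back to a download.
    """
    if not os.path.isdir(model_dir):
        return []
    return sorted(
        name for name in os.listdir(model_dir)
        if name.lower().endswith(extensions)
    )
```

The returned names could then feed a dropdown in the node, with the two outputs labelled neutrally ("main audio" and "complement audio") so they stay meaningful whichever model is picked.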
Do you still maintain this code?
Just in case you are interested, here is my fork: https://github.com/set-soft/ComfyUI-DeepExtract
Hi there! 🙌 Thank you so much for taking the time to check out my node and for your kind feedback. Your comments are truly valuable and motivating.
Thanks to users like you who engage and contribute, this whole process becomes even more enjoyable. When I have some free time, I’ll work on creating a more advanced and functional version of the node. I’ll definitely take your suggestions into account.