Karaoke separation: Integrate demucs or similar
I've played around a bit with the demucs separator, and am baffled by how well it automatically separates vocals from the rest of an audio file (or splits the file into drums, bass, other and vocals). Sure, it's not the same as having the original recording's separate parts, but it's a usable approximation.
I'm currently experimenting with using this in combination with performous, and am opening this issue for two purposes:
- To document that it can be done and how, possibly with snippets of invocations, and
- to explore how this might be used inside performous -- rough ideas are:
  - songs could be made into karaoke versions,
  - the extracted drum stem could be useful for creating drum tracks, and
  - if there is (or gets created, e.g. using alass) any auto-alignment of lyrics to songs, the extracted vocal stem would be helpful in creating that alignment.
It might well be that there are competing algorithms to demucs that outperform it, but findings likely apply in a similar fashion to others.
One thing to be aware of is the time factor: splitting a song (on my laptop, where I don't use GPU acceleration) takes roughly twice as long as the song takes to play, and is not really streaming-friendly (there is some chunking, but only about 4 chunks per song).
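As a starting point for the "snippets of invocations" mentioned above, here is a minimal Python sketch that wraps the demucs command line. The helper names `demucs_command` and `separate` are hypothetical and made up for this illustration. It assumes demucs is installed and on the PATH, and that the installed version accepts the `--two-stems`, `-o` and `-d` options; check `demucs --help` for your version.

```python
import shutil
import subprocess
from pathlib import Path


def demucs_command(song: Path, out_dir: Path = Path("separated"),
                   karaoke: bool = True) -> list[str]:
    """Build the demucs command line for one song (hypothetical helper)."""
    # -o sets the output folder; -d cpu forces CPU processing (no GPU).
    cmd = ["demucs", "-o", str(out_dir), "-d", "cpu"]
    if karaoke:
        # Only split into vocals and "everything else" instead of four stems.
        cmd += ["--two-stems", "vocals"]
    cmd.append(str(song))
    return cmd


def separate(song: Path, **kwargs) -> None:
    """Run demucs on one song; raises if demucs is missing or fails."""
    if shutil.which("demucs") is None:
        raise RuntimeError("demucs not found on PATH")
    subprocess.run(demucs_command(song, **kwargs), check=True)
```

Calling `separate(Path("song.mp3"))` should leave the stems in a subfolder of `separated/`. The exact layout (model name, track name, file names for the vocal and accompaniment stems) depends on the demucs version, so it is deliberately not hard-coded here.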
There's also a very nice project called spleeter that splits an mp3 track into separate instrumental tracks. It uses machine learning and has a great model for filtering out the voice or other instruments. It does its job very well.
I think @Baklap4 means this project https://github.com/deezer/spleeter
There are more tools out there, e.g.:
- https://github.com/NotVinay/karaokey
@chrysn I think you can do the "To document that and how to do it, possibly with snippets of invocations" part yourself on a (maybe new) wiki page. Examples and hints on how to work with demucs would be great. Other tools may follow.
For the second point (integrating support into performous) I think those tools should be used "offline":
- Create audio files without vocals (xyz.novocals.mp3)
- Use that file if present
- Add a configuration option (use files with vocals, prefer files without vocals, ask)
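The lookup described in the list above could be sketched roughly as follows. This is only an illustration in Python, not performous code: the function names and the preference strings are hypothetical, and the interactive "ask" option is left out because it would need UI support.

```python
from pathlib import Path

# Hypothetical preference values mirroring the configuration idea above.
PREFERENCES = ("with_vocals", "prefer_no_vocals")


def novocals_path(song: Path) -> Path:
    """Map xyz.mp3 to xyz.novocals.mp3 in the same folder."""
    return song.with_name(f"{song.stem}.novocals{song.suffix}")


def choose_audio(song: Path, preference: str = "prefer_no_vocals") -> Path:
    """Pick the file to play, falling back to the original song."""
    if preference not in PREFERENCES:
        raise ValueError(f"unknown preference: {preference}")
    candidate = novocals_path(song)
    # Use the vocal-free file only if the user wants it and it exists.
    if preference == "prefer_no_vocals" and candidate.is_file():
        return candidate
    return song
```

With this naming scheme the separation step stays fully offline: any tool can produce the `.novocals` file, and playback only has to check whether it exists.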
Started a wiki page for separation tools: Separation tools (wiki)
I have a different idea, but I'm not sure whether there would be an issue regarding IP or something.
Using machine learning to separate the vocals, or even recreate the instruments, from online services like YouTube Music and Spotify. I tested some ML projects that can run on a local machine, and some give very good results. This kind of plug-in would definitely help anyone sing any song they can think of on the fly.
I used to play this game in the early 2010s, and managing a huge library of songs, most of which I won't play most of the time, is quite a pain. And if I have a visitor and want to sing some of the newer songs, I first have to find a karaoke version and its music files, which kills the mood.