Check that singing matches lyrics by integrating speech recognition
Suggestion
Integrate speech recognition to verify that the lyrics are actually being sung. Speech recognition is not very reliable, so if roughly 50% of the words match, the user is probably singing fine. If not, there should be some penalty (a point loss, or a warning during gameplay or on the evaluation screen).
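A minimal sketch of the 50% rule, assuming a speech-to-text transcript of the performance is already available as plain text. It counts how many lyric words also appear somewhere in the transcript (ignoring order), then turns a shortfall below the threshold into a point penalty. The function names, the `threshold` of 0.5 and the `max_penalty` of 1000 points are illustrative assumptions, not anything Performous defines.

```python
import re


def words(text: str) -> list[str]:
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[\w']+", text.lower())


def word_match_ratio(lyrics: str, transcript: str) -> float:
    """Fraction of lyric words that also occur in the transcript (order ignored)."""
    lyric_words = words(lyrics)
    if not lyric_words:
        return 1.0  # nothing to sing, nothing to penalize
    heard = set(words(transcript))
    return sum(1 for w in lyric_words if w in heard) / len(lyric_words)


def lyrics_penalty(ratio: float, threshold: float = 0.5, max_penalty: int = 1000) -> int:
    """Hypothetical rule: no penalty at or above the threshold,
    otherwise a penalty that grows linearly as the ratio drops toward 0."""
    if ratio >= threshold:
        return 0
    return round(max_penalty * (threshold - ratio) / threshold)
```

For example, `word_match_ratio("Hello darkness, my old friend", "hello darkness my friend")` returns 0.8 and incurs no penalty, while a ratio of 0.25 would cost 500 of the hypothetical 1000 points. Ignoring word order keeps the check forgiving of recognizer mistakes; the trade-off is that it cannot tell whether the words were sung at the right time.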
Use case
As a karaoke newcomer playing Performous at a friend's house, I noticed I sometimes couldn't keep up with reading the lyrics, so I involuntarily resorted to a mixture of humming and incomprehensible babbling. This feels like cheating, but it's kind of hard for me to stop: I'd be punishing myself by moving my focus from the melody to the lyrics, since Performous currently only scores the melody.
Extra info/examples/attachments
As mentioned, I'm very new to Performous and karaoke in general, so bear with me in case this is a silly suggestion.
Speech recognition is pretty computationally heavy; I'm not sure this can be achieved in real time. Do you perhaps know any libraries that work well for speech recognition?
I do like the idea, and the outside-the-box thinking for solving the "humming along to the melody" problem!
This does not necessarily need to be done in real-time, or at least not in the immediate.
The app could record the player's voice and, at the end of the song, run speech-to-text on it.
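The record-now, transcribe-later idea could look roughly like this sketch. It assumes the game delivers mono float samples in [-1, 1] from its capture callback; during the song the samples are only appended to a buffer, and the expensive speech-to-text step runs once in `finish()`. The `DeferredTranscriber` class is hypothetical, and the recognizer is passed in as any callable that takes a WAV file path and returns text.

```python
import os
import struct
import tempfile
import wave
from typing import Callable, List


class DeferredTranscriber:
    """Buffers microphone samples during a song and transcribes them afterwards.

    `transcribe` is any callable mapping a WAV file path to text; which
    speech-to-text engine backs it is left open (an assumption of this sketch).
    """

    def __init__(self, transcribe: Callable[[str], str], sample_rate: int = 16000):
        self.transcribe = transcribe
        self.sample_rate = sample_rate
        self.samples: List[float] = []

    def feed(self, chunk: List[float]) -> None:
        """Called from the audio callback during gameplay; it only appends,
        so no recognition work happens while the song is playing."""
        self.samples.extend(chunk)

    def finish(self) -> str:
        """Write the buffer to a temporary mono 16-bit WAV file and transcribe it."""
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
            path = tmp.name
        try:
            with wave.open(path, "wb") as wav:
                wav.setnchannels(1)
                wav.setsampwidth(2)  # 16-bit PCM
                wav.setframerate(self.sample_rate)
                # Clamp each sample to [-1, 1], then scale to a signed 16-bit integer.
                pcm = (int(max(-1.0, min(1.0, s)) * 32767) for s in self.samples)
                wav.writeframes(b"".join(struct.pack("<h", v) for v in pcm))
            return self.transcribe(path)
        finally:
            os.remove(path)  # the recording is only needed for this one pass
```

With the openai-whisper Python package, for instance, the callable could be `lambda p: model.transcribe(p)["text"]` after `model = whisper.load_model("base")`; whether that runs fast enough on typical party hardware between songs has not been measured here.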
Then calculate the Levenshtein distance between the speech-to-text output and the lyrics as provided (which would need a bit of pre-processing). Alternatively, the vocals could be separated from the music at that point and run through speech-to-text as well, and then both versions diffed.
That would provide a value indicating how closely the singer matched the lyrics.
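The Levenshtein comparison can be sketched at word level as follows, assuming both the lyrics and the transcript are plain strings. The pre-processing is deliberately minimal (lowercase, strip punctuation, split into words); real lyrics files would likely need more, such as joining syllables back into words. The distance is normalized by the longer word sequence so the result reads as a similarity between 0.0 and 1.0. All names are illustrative.

```python
import re


def normalize(text: str) -> list[str]:
    """Minimal pre-processing: lowercase, drop punctuation, split into words."""
    return re.findall(r"[\w']+", text.lower())


def levenshtein(a: list[str], b: list[str]) -> int:
    """Edit distance between two token sequences (each insertion,
    deletion or substitution costs 1), using two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ta in enumerate(a, start=1):
        cur = [i]
        for j, tb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ta
                cur[j - 1] + 1,            # insert tb
                prev[j - 1] + (ta != tb),  # substitute, or match for free
            ))
        prev = cur
    return prev[-1]


def lyrics_similarity(lyrics: str, transcript: str) -> float:
    """1.0 means identical word sequences, 0.0 means nothing in common."""
    ref, hyp = normalize(lyrics), normalize(transcript)
    longest = max(len(ref), len(hyp))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(ref, hyp) / longest
```

For example, `lyrics_similarity("Hello darkness, my old friend", "hello darkness my friend")` gives 0.8 (one missing word out of five), while a transcript of pure humming such as `"hmm hmm mm"` scores 0.0. Because the distance is order-aware, singing the right words in the wrong order is also penalized.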
BTW, have a look at this (there is a K-Pop example): https://openai.com/blog/whisper/
Real-time scoring would be nicer, of course, but I'm not sure it is achievable right now.
Another possibility is to run a source splitter on the song (at load time, or offline) to isolate the vocals from the music, and to completely change the app's scoring engine so it matches against the vocal track. That would allow playing any song, regardless of whether it has a lyrics file, and would probably also alleviate the sing-along issue.