mumble
FLAC audio for communication?
Context
Employing mumble in contexts where bandwidth is not the limiting factor, but excellent audio quality is desired.
Description
In the client configuration, a choice of codecs should be selectable. The opus codec is good of course, but since bandwidth within the scale of a mono audio signal is not so much an issue anymore, other codecs and bitrates would be fine. The FLAC codec has been around for more than 20 years and is well-proven.
Mumble component
Client
OS-specific?
No
Additional information
No response
Additional codecs add additional complexity and I'm not sure if I see enough of a reason for such a codec to be introduced.
What exactly would be the applicability of such a codec?
Well, FLAC is lossless: it would guarantee that audio data is exactly how it should be (as long as the packets are not corrupted) when reaching the destination.
The idea makes so much sense that I'm wondering if anybody came up with it already.
As for complexity: we already have support for negotiation, so it's just a matter of adding the corresponding field and restoring the switching logic.
What exactly would be the applicability of such a codec?
Interviews via Mumble, which are to be recorded without artifacts resulting from compression. (This is my personal one, surely more are conceivable)
As for complexity: we already have support for negotiation, so it's just a matter of adding the corresponding field and restoring the switching logic.
There's more to it. In the limit, every single client could have their preferred codec leading to clients no longer being able to speak to each other on a single server. More (mutually exclusive) options automatically increase fracturing of interoperability.
Personally, I have not yet been convinced that the compression actually creates (relevant) audible artifacts. Thus, I'm wondering about potential use cases where such small (as I see it) differences actually matter enough to justify the effort associated with the implementation.
The Client could be enabled to serve a basic set of codecs (say FLAC, Vorbis, Opus). All of them should be supported in send/receive mode. The choice of codecs on a server could be specified/limited in the server config file.
The problem is that not all clients will support all codecs. That's when it becomes messy. And if client A doesn't support FLAC, then client B can't use that to send audio because then client A won't hear them because they can't decode.
I apologize for my thoughtlessness - I was actually thinking that there is just one Mumble client. Would it be possible to standardize a set of codecs, so other clients can adapt? (This would most probably require a longer-term process, but personally I don't think it's too difficult in technical terms.)
Well, I think that almost everyone does indeed only ever use the official Mumble client, but we still have to account for the possibility of external clients. But even within the framework of official clients, you have the issue of not everyone running the same version of the client. Thus, when introducing something like this, it will take a (long) while until the majority of users has the new features available.
For this reason, you'll always have to implement some server-side logic to find a codec that all (or at least most) clients that are currently connected support.
And then there would of course also be the necessity to give every user a choice to select codecs that they deem acceptable for them. E.g. someone with a low bandwidth probably doesn't want to use FLAC, even if their client might support it. Thus, if the server simply decided upon codec availability (by client-version) this can lead to unsatisfactory results.
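The server-side selection described above can be sketched roughly as follows. This is only an illustration, not Mumble's actual negotiation code: `negotiate_codec`, the codec names as plain strings, and the idea of an ordered server preference list are all assumptions made for the example.

```python
from typing import Iterable, Sequence


def negotiate_codec(
    server_allowed: Sequence[str],
    client_acceptable: Iterable[Iterable[str]],
    fallback: str = "opus",
) -> str:
    """Return the first server-preferred codec that every client accepts.

    server_allowed: codecs enabled in the (hypothetical) server config,
        ordered from most to least preferred.
    client_acceptable: one collection per connected client, listing the
        codecs that client supports AND that its user is willing to use.
    fallback: codec used when no allowed codec is accepted by everyone.
    """
    accepted_sets = [set(codecs) for codecs in client_acceptable]
    for codec in server_allowed:
        if all(codec in accepted for accepted in accepted_sets):
            return codec
    return fallback


# One client on a slow link has opted out of FLAC, so everyone uses Opus.
clients = [{"flac", "opus"}, {"opus"}, {"flac", "opus"}]
print(negotiate_codec(["flac", "opus"], clients))
```

Note that a single client that cannot or will not use FLAC pulls the whole server back to the common codec, which is the interoperability cost discussed in this thread.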
It's this tail of smaller complications that makes me a bit hesitant to add new codecs.
Yes, I thought of that, too.
One possible solution would be: An option to launch Mumble Client with some option ticked in the configuration "enable advanced features" or "experimental features", as long as the features are newly implemented. Same on server-side, e.g. config variable set to "true" or similar. If a client connects to a server which does not support those features, he could use defaults as fallback, with notification to client. In the server list, a flag symbol or an entry in the server information could clarify if the server supports advanced features.
Still, that doesn't solve the bandwidth issue, although even FLAC with a mono signal should be way less than 500 kbit/s.
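For a rough sense of the numbers, the uncompressed PCM bitrate is simply sample rate times bit depth times channel count. The sketch below computes that and then applies a compression ratio. The ratio of 0.6 is purely a placeholder assumption: real FLAC ratios depend heavily on the material, so whether the result lands below 500 kbit/s would have to be measured on actual speech recordings.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> float:
    """Bitrate of uncompressed PCM audio in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def estimated_lossless_kbps(
    sample_rate_hz: int,
    bits_per_sample: int,
    compression_ratio: float,
    channels: int = 1,
) -> float:
    """PCM bitrate scaled by an ASSUMED compressed/uncompressed size ratio."""
    return pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels) * compression_ratio


# Mono, 16 bit, at two common sample rates.
for rate in (44100, 48000):
    raw = pcm_bitrate_kbps(rate, 16)
    est = estimated_lossless_kbps(rate, 16, compression_ratio=0.6)  # 0.6 is a guess
    print(f"{rate} Hz: raw PCM {raw:.1f} kbit/s, estimated lossless {est:.1f} kbit/s")
```

Packet headers and transport overhead are not included here and would come on top.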
What is the minimum algorithmic delay that can be achieved with FLAC? I want to suggest that even if it could be implemented, the latency introduced by encoding and decoding a format intended for music archival will make having a human conversation hard, if not impossible. But I have not found any concrete numbers, since no one suggests FLAC for real-time audio.
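One lower bound can be estimated without any measurement: FLAC encodes audio in blocks, so the encoder has to collect a full block of samples before it can emit a frame. The buffering delay is therefore at least the block duration. The sketch below only computes that duration; the block sizes listed are illustrative choices, and the figures ignore encoder/decoder processing time, network jitter buffers, and anything Mumble itself adds.

```python
def block_delay_ms(block_size_samples: int, sample_rate_hz: int) -> float:
    """Time needed to collect one block of samples, in milliseconds."""
    return 1000.0 * block_size_samples / sample_rate_hz


SAMPLE_RATE_HZ = 48000  # assumed capture rate for this example

# Illustrative block sizes, from small to large.
for block_size in (256, 480, 1152, 4096):
    delay = block_delay_ms(block_size, SAMPLE_RATE_HZ)
    print(f"block of {block_size:>4} samples -> at least {delay:.2f} ms of buffering")
```

Smaller blocks reduce this delay but generally compress less efficiently, so the achievable latency is a trade-off rather than a fixed property of the format.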
FLAC is completely useless even for music; you can verify this by doing ABX testing, for example here: http://abx.digitalfeed.net/opus.html. Opus at 160 kbps is impossible to distinguish from the original by any human ear, using any audiophile equipment.
close this ticket and don't waste your time on such nonsense.
There is ample empirical evidence that the differences between lossy codecs and lossless audio or PCM are audible. While the difference may not be huge, it is there. (Easy to find via Google, see for example here; further research is available via AES or VdT, the German Tonmeister Association.)
The key point, by the way, is not to have audiophile quality in the actual conversation, which would most probably be a waste of bandwidth (although compared to what is provisioned for video it would be negligible IMHO). It is about further editing and processing of the recorded signal, where lossless encoding makes a huge difference compared to lossy formats.
close this ticket and don't waste your time on such nonsense.
While there are certainly more important things to focus on, it would be a very useful feature. Implementation should not be difficult.
@Rovanion I never heard of FLAC as developed for music archival, it's a lossless codec just like 7z or gzip are lossless compression formats. I can test the processing time/power difference with ffmpeg, but don't know if that's applicable to mumble.
Testing the shortest roundtrip latency induced by passing an audio stream through PCM->FLAC and FLAC->PCM via FFmpeg could give a hint at the size of the algorithmic delay. Unless you can find other numbers this would be a start at checking whether this is feasible or not.
It is mentioned on the Xiph.org site: https://www.xiph.org/flac/developers.html
This is the conclusion from your link.
5.1 Conclusion This study tested MP3 at 128 kbps against Opus at 64 kbps in music encoded comparisons to uncompressed WAV format at 24 bit, 44.1 kHz. Results showed MP3 outperforming Opus for all song scores combined as well as in the song 8 Out Of 10, and WAV outperformed MP3 in the song Without You. For the rest of each separate songs, no statistical significance was found, indicating that subjects could not hear the difference in these comparisons. All in all, this study can be used to show what impact the SoundCloud format and bit rate change had on perceptual quality in popular music
128 kbps MP3 is the edge that can really be distinguished from the original. Some people hear differences between 256 kbps and the original above the statistical error, but with mono Opus at 160 kbps that is not possible.
You can convert an opus 50 times to AAC and back and you still won't hear any difference with your ears in the ABx test.
The human voice in mono is an incredibly simple audio recording sample for any modern lossy audio codec. Using opus 60kbps you will already get a recording without artifacts suitable for any post-production processing at a professional level. using opus 160kbit/s you actually get lossless. You can process, convert the result obtained as many times as you like, this will not add any artifacts audible to the human ear. Just check it using this plugin https://www.foobar2000.org/components/view/foo_abx
The human brain is a strange thing; it can hear something that is not really there. Psychoacoustics can create a placebo-like effect, and then a lossy recording will always seem worse to you, but only if you know it's lossy.
Please don't ask to degrade Mumble by adding useless features. If, for some inexplicable reason, you need lossless for an interview, then simply record copies on both sides before transferring to Mumble.
@hhejkhalkfahjahsf 128kbps mp3 is drastically reduced and definitely not anything close to a lossless or 16bit 44.1kHz PCM audio file. The difference might be less significant in popular music, as referred to in the publication cited, but even for this use case it did show up.
Don't get me wrong: I am aware that modern lossy codecs can handle the human voice very well in a mono speech signal. Opus at 160 kbit/s certainly does, similar to what e.g. ProRes 422 does with video, where artifacts are not visible. For me as a classical recording producer, it is nevertheless important to have as much resolution/data for post-processing as possible, even when it is interview/voice content, and FLAC provides the simplest solution to achieve that. I don't mind you calling this feature useless - that depends on the use case - for me what counts is that any source of quality degradation is eliminated. I don't understand why Mumble would be "degraded" by such a feature, anyway - that is up to the developers to decide.
Recording on both sides is a good idea, but unfortunately not every interview partner will be able or willing to deal with that.
For actual compression error rates see https://www.essv.de/pdf/2016_229_236.pdf
I think this request is part of the more general request to add support for easy codec support extensions, which could then be used to support FLAC, Lyra (#5154) and potentially others.
In that context it also wouldn't be too important, how useful support of a given codec seems to be.