JDA
Why is OPUS framesize hardcoded to 20ms?
General Troubleshooting
- [x] I have checked for similar issues.
- [x] I have updated to the latest JDA version
- [x] I have checked the wiki and especially the FAQ section for similar questions.
Question
AudioSendHandler::provide20MsAudio, as the method name suggests, requests a 20 ms (in total) Opus packet, and DefaultSendSystem works on that hardcoded convention. The question is: why can't we send packets of varying duration? WebRTC allows 2.5 ms to 60 ms, and Discord (or any VoIP service) is built on it.
There is also no way to implement a custom IAudioSendSystem without doing the encryption and RTP encapsulation manually, which is cumbersome.
This is a problem for me because my packet provider already supplies Opus packets, and I am concerned about the generation loss from re-transcoding a lossy codec. Having to repacketize, or to buffer and re-encode when that is not possible, complicates the process. Does Discord actually impose this restriction?
Also, the provider uses direct ByteBuffers but JDA requires a non-direct one, which ultimately eliminates the performance benefit ByteBuffers are intended for.
Example Code
https://github.com/DV8FromTheWorld/JDA/blob/a35789442ba022c81ce9b5f62d5b6b9968e3895d/src/main/java/net/dv8tion/jda/api/audio/factory/DefaultSendSystem.java#L102-L106
What is so special about send delay being less than 60ms? What does it mean?
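One common reading of a threshold like this in a fixed-interval send loop, offered as an assumption about the general technique and not as a description of the linked lines: while the sender is only slightly late, the schedule keeps advancing on the ideal 20 ms grid so that jitter does not accumulate; once it falls further behind than the threshold (60 ms is three frames), the schedule is reset to the current time instead of bursting packets to catch up. The class and method names below are hypothetical.

```java
/** Hypothetical pacing helper; not JDA code. Times are in milliseconds. */
final class FramePacer {
    private final long frameMillis;   // nominal spacing between packets, e.g. 20
    private final long maxLagMillis;  // lag beyond which we resynchronize, e.g. 60
    private long lastFrameSent;       // scheduled send time of the previous packet

    FramePacer(long frameMillis, long maxLagMillis, long startMillis) {
        this.frameMillis = frameMillis;
        this.maxLagMillis = maxLagMillis;
        this.lastFrameSent = startMillis;
    }

    /** How long to sleep before sending the next packet; 0 if already late. */
    long sleepMillis(long now) {
        return Math.max(0, frameMillis - (now - lastFrameSent));
    }

    /** Advance the schedule after a packet was sent at time {@code now}. */
    void onFrameSent(long now) {
        if (now < lastFrameSent + maxLagMillis) {
            lastFrameSent += frameMillis; // slightly late: stay on the ideal grid
        } else {
            lastFrameSent = now;          // far behind: resync instead of bursting
        }
    }

    long lastFrameSent() {
        return lastFrameSent;
    }
}
```

Under this reading, supporting variable packet durations would mean advancing the schedule by each packet's own duration rather than by a constant.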
Discord's documentation, however, does not mention this frame-size limit (I don't know whether anything is assumed or left undocumented); it only specifies 48 kHz, 2-channel audio. Even so, because Opus can change its bandwidth and number of channels at any moment, the decoder is capable of resampling.
This code was written a long, long time ago; however, my understanding then and now is that Discord expects 20 millisecond frames of Opus. I just tested this in the client via the audio debug menu and, indeed, Discord is still sending 20 millisecond frames from the client itself.
Have you tested whether or not Discord can/will accept a frame size that is different from 20 milliseconds?
> This code was written a long, long time ago; however, my understanding then and now is that Discord expects 20 millisecond frames of Opus. I just tested this in the client via the audio debug menu and, indeed, Discord is still sending 20 millisecond frames from the client itself.
Well, that's because 20 ms is the default of an Opus encoder. The specification recommends using 20 ms.
> Have you tested whether or not Discord can/will accept a frame size that is different from 20 milliseconds?
If we presume that Discord uses the standard path (which it should), it should accept them. WebRTC supports frame sizes up to 60 ms and packet sizes up to 120 ms, ~~with some manageable limitations. JDA could force one frame/packet limitation and leave user to repacketize it.~~ However, I haven't tested this yet because of the difficulties I've already mentioned.
I have seen a C++ library for Discord that supports what I want. Their example does not obey this restriction.

    int samples = opus_packet_get_samples_per_frame(op.packet, 48000);
    v->voiceclient->send_audio_opus(op.packet, op.bytes, samples / 48);

Obviously, they can't be wrong in their examples. Maintaining this 20 ms limit is impossible unless the audio is re-encoded or the source is required to provide 20 ms frames.
Possible Fix
Carry the deduced duration (obtained with opus_packet_get_samples_per_frame, which might also validate whether the data is an Opus packet, for better debugging?) through IPacketProvider, and vary the send delay in DefaultSendSystem accordingly.
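To make the "deduced duration" concrete, here is a minimal Java sketch of reading it from the packet's first byte (the TOC byte described in RFC 6716, section 3.1), following the same arithmetic as libopus's opus_packet_get_samples_per_frame. The class and method names are hypothetical and not part of JDA. Every value of the first byte is a legal TOC, so this on its own does not prove the data is an Opus packet.

```java
/** Hypothetical helper that reads timing information from an Opus TOC byte. */
final class OpusToc {
    private OpusToc() {}

    /** Samples per frame at the given sample rate, derived from the TOC byte. */
    static int samplesPerFrame(byte[] packet, int sampleRate) {
        if (packet.length == 0) {
            throw new IllegalArgumentException("empty packet");
        }
        int toc = packet[0] & 0xFF;
        if ((toc & 0x80) != 0) {
            // CELT-only modes: 2.5, 5, 10 or 20 ms
            int shift = (toc >> 3) & 0x3;
            return (sampleRate << shift) / 400;
        } else if ((toc & 0x60) == 0x60) {
            // Hybrid modes: 10 or 20 ms
            return (toc & 0x08) != 0 ? sampleRate / 50 : sampleRate / 100;
        } else {
            // SILK-only modes: 10, 20, 40 or 60 ms
            int size = (toc >> 3) & 0x3;
            return size == 3 ? sampleRate * 60 / 1000 : (sampleRate << size) / 100;
        }
    }

    /** Number of frames in the packet, from the two low bits of the TOC byte. */
    static int frameCount(byte[] packet) {
        if (packet.length == 0) {
            throw new IllegalArgumentException("empty packet");
        }
        int code = packet[0] & 0x3;
        if (code == 0) {
            return 1;
        }
        if (code != 3) {
            return 2;
        }
        if (packet.length < 2) {
            throw new IllegalArgumentException("code 3 packet needs a frame-count byte");
        }
        return packet[1] & 0x3F;
    }

    /** Total packet duration in milliseconds. */
    static double durationMillis(byte[] packet, int sampleRate) {
        return 1000.0 * frameCount(packet) * samplesPerFrame(packet, sampleRate) / sampleRate;
    }
}
```

A send system could use a value like durationMillis(packet, 48000) as the delay before the next packet, assuming Discord accepts durations other than 20 ms at all.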
I tested this the other day, and it seems like Discord requires 20 ms packets.
> Also, the provider uses direct ByteBuffers but JDA requires a non-direct one, which ultimately eliminates the performance benefit ByteBuffers are intended for.
This is not entirely correct. The reason we use ByteBuffer is to allow reusing the same memory for multiple packets, thus reducing allocations where possible.
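As an illustration of the reuse idea, here is a minimal sketch that copies each packet from a direct buffer into one long-lived heap buffer, so that no new buffer is allocated per packet. It uses only java.nio; the class name is hypothetical, and whether a heap buffer is actually required at this point in JDA is taken from the question above, not verified here.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: one heap buffer reused for every outgoing packet. */
final class ReusablePacketBuffer {
    private ByteBuffer heap;

    ReusablePacketBuffer(int initialCapacity) {
        this.heap = ByteBuffer.allocate(initialCapacity);
    }

    /**
     * Copies the remaining bytes of {@code source} into the shared heap buffer
     * and returns it ready for reading. The returned buffer is only valid
     * until the next call, because the same memory is overwritten.
     */
    ByteBuffer copyFrom(ByteBuffer source) {
        int length = source.remaining();
        if (heap.capacity() < length) {
            heap = ByteBuffer.allocate(length); // grow only when a packet does not fit
        }
        heap.clear();
        heap.put(source.duplicate()); // duplicate() leaves the source position untouched
        heap.flip();
        return heap;
    }
}
```

The copy still costs one pass over the packet bytes, which is small for Opus packets; the saving is in avoiding a fresh allocation for every packet.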