discord-api-docs
aead_aes256_gcm encryption modes
Description
The documentation is missing information about the aead_aes256_gcm encryption modes. There was an issue opened about this before but it was closed as these modes were "experimental". The desktop client has been using these modes for a while now so I assume they aren't experimental anymore. Using AES256 encryption could speed up cryptography in bots significantly as many processors have acceleration for it.
Steps to Reproduce
Read the docs https://discord.com/developers/docs/topics/voice-connections#establishing-a-voice-udp-connection-encryption-modes
Expected Behavior
The table lists the aead_aes256_gcm based modes as well.
Current Behavior
It only lists the xsalsa20_poly1305 based modes.
Screenshots/Videos
No response
Client and System Information
N/A
More specifically this is the Ready payload that I receive:
{
"op": 2,
"d": {
"streams": [
{
"type": "video",
"ssrc": 256235,
"rtx_ssrc": 256236,
"rid": "",
"quality": 0,
"active": false
}
],
"ssrc": 256234,
"port": 50010,
"modes": [
"aead_aes256_gcm_rtpsize",
"aead_aes256_gcm",
"aead_xchacha20_poly1305_rtpsize",
"xsalsa20_poly1305_lite_rtpsize",
"xsalsa20_poly1305_lite",
"xsalsa20_poly1305_suffix",
"xsalsa20_poly1305"
],
"ip": "redacted",
"experiments": [
"fixed_keyframe_interval"
]
}
}
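As a minimal sketch of how a client might pick one of the modes advertised in a Ready payload like the one above: the preference order and the `choose_mode` helper below are assumptions made for illustration, not anything Discord prescribes.

```python
import json

# Hypothetical preference order, chosen for illustration only.
PREFERRED_MODES = (
    "aead_aes256_gcm_rtpsize",
    "aead_xchacha20_poly1305_rtpsize",
    "xsalsa20_poly1305_lite",
)


def choose_mode(ready_payload: str, preferred=PREFERRED_MODES) -> str:
    """Return the first preferred mode that the Ready (op 2) payload offers."""
    message = json.loads(ready_payload)
    if message.get("op") != 2:
        raise ValueError("not a Ready payload")
    offered = message["d"]["modes"]
    for mode in preferred:
        if mode in offered:
            return mode
    raise LookupError("no mutually supported encryption mode")
```

The server's ordering of `modes` is ignored here; the client's own preference list decides.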
If we could get documentation on all modes, that'd be great. The currently undocumented modes are:
- aead_aes256_gcm_rtpsize
- aead_aes256_gcm
- aead_xchacha20_poly1305_rtpsize
- xsalsa20_poly1305_lite_rtpsize
I began work around a week ago mapping these encryption modes and voice API v7. I have not yet had success with any of the aead-labeled modes (technically xsalsa20 is also an aead mode, though), and I assume Discord is passing something in the AAD field during encryption. Per RFC 7714 section 8.2, it looks like the AAD should be a repeat of the header we generate for the payload, but that didn't work for me during testing, so I could be entirely wrong. I was able to both encrypt and decrypt arbitrary data, and I assume the RTP header hasn't changed. Working under the assumption that the new ciphers use the same nonce generation as xsalsa20_poly1305_lite (xsalsa20_poly1305_lite_rtpsize does, by the way, and the RFC suggests the others should as well), the only field I haven't addressed is the optional AAD field. As far as I've been able to tell, _rtpsize doesn't mean anything, and I've not been able to find a reference to it in any RFC or cipher mode, so it may be an internal note.
Given that work on voice was on the back burner as of issue #2125, it's possible the work was never fully finalized, or that a public release wasn't a priority, since xsalsa20 support may be around for quite some time. The cipher used for VC encryption also doesn't matter much as long as xsalsa20 hasn't been cracked. Obsolete certainly, but not broken.
I've plans this week to reach out to someone on the voice team, if I can find one that's willing to speak, for clarification of what they're expecting to receive in the AAD field. If I'm correct, and this is what I am missing, I should have all aead modes working with my library once I know what's expected. I'm primarily interested in xchacha20_poly1305 encryption as it is the successor to the obsolete and aging xsalsa20_poly1305 cipher. I'd ask other library maintainers, but it doesn't seem like much is going on in the exploratory field, and I haven't had luck receiving answers to my other queries, so I've been on my own with my faithful team for the most part.
My plan is/was to address all features my library (and others) lack, primarily voice API v7, webRTC support (which requires no extra encryption), and support for the new ciphers, then update Discord's dated voice documentation with my findings. It does seem, though, that I've gotten the furthest in my endeavors, as I've been unable to find reference to these ciphers in a working state in any other place.
Perhaps it's a placebo effect, or I fixed a bug in my library while testing, but voice API v7 sounds great compared to my experience with v4. I always had an issue with artifacting in my audio stream, which does not exist in v7.
I'll update you with my findings when I have more to work with. I've also sent some of my findings to the userdoccers documentation.
So TL;DR, operationally xsalsa20_poly1305_lite_rtpsize is the same as xsalsa20_poly1305_lite, and I'm missing the final piece to get the other modes working. Reaching out directly this week for more info.
Hello,
Thanks for reaching out and for the detailed review. I'm a software engineer on the team responsible for these encryption modes.
We do intend to document these soon. We are currently phasing out some older modes and will likely not document deprecated modes. The two I expect we will document are aead_aes256_gcm_rtpsize and aead_xchacha20_poly1305_rtpsize. Because we know that changing encryption modes is likely to be significant work, we wanted to avoid asking devs to change modes repeatedly if it could be avoided, but it looks like we have likely settled on these modes for the foreseeable future. I don't have an exact date of when to expect updated documentation but it is something we are aware of. The existing modes will continue to function for now.
Regarding voice websocket versioning
Perhaps it's a placebo effect, or I fixed a bug in my library while testing, but voice API v7 sounds great compared to my experience with v4. Always had an issue with artifacting in my audio stream, which does not exist in v7.
The versioning is purely limited to the control plane and will not have any effect on the data plane/rtp streams. There is also actually very little difference between v4 and v7. I've briefly reviewed the changes and it looks like all changes relate either to video or to internal instrumentation. I suspect there is likely nothing relevant here for bot devs but we will review again when we release documentation.
You are amazing, thank you for communicating with us.
Thank you Brian for the prompt update. Is there any chance of getting a brief idea of what's expected in the aead payloads ahead of time, or is it a simple "wait for the documentation update"? My hope was to make an attempt at documenting the new modes in a PR (though it seems you guys may already be on that), but I'm missing something that the RFC doesn't clearly define. I'm thinking it's something in the AAD field, as I'm providing the header, nonce, and encrypted data, but I haven't figured out what it is. RFC 7714 suggested it can be the contents of the header a second time, but that didn't work in my testing.
I'm also curious if there are plans to document connecting with webRTC and supported video modes, if any. I know of a few meeting recording/sharing bots that would love the ability to decode video data if they knew what layout the data's in.
I expect it'll be a patience/"wait and see" response, but figured I'd ask.
Thanks again for the update. Let's hope things move along smoothly.
Hello, are there any updates on the timeline for this? Thank you.
https://git.kaydax.xyz/w/algos/src/branch/main/doc/crypt.md
Thanks @Zipdox2. My work here is complete. https://github.com/elderlabs/BetterDisco/commit/d988d6a8
EDIT: for clarity, and to answer my post above from a year ago, we were nearly there, missing one key detail: the nonce size. AES256-GCM supports a 12-byte nonce, whereas all other new modes support 24 bytes. All new modes follow the same nonce format as xsalsa20_poly1305_lite. xchacha20 and AES256-GCM take an additional data field (AAD), which is the full RTP header, as noted above from RFC 7714 section 8.2. I would expect a number of formats to be deprecated at some point in the future, particularly most of the xsalsa20 modes, for the reasons I noted above.
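To make the nonce and AAD handling described above concrete, here is a rough sketch using only the Python standard library. It assumes the 12-byte RTP header layout from the existing voice docs (0x80, 0x78, sequence, timestamp, SSRC) and the xsalsa20_poly1305_lite convention of a 32-bit counter appended to the packet and zero-padded to the cipher's nonce size. The big-endian counter, the helper names, and the `aead_encrypt` callable are assumptions for illustration; the actual cipher is left to whatever crypto library is in use.

```python
import struct

# Nonce sizes as described in the post above.
NONCE_SIZES = {
    "aead_aes256_gcm_rtpsize": 12,
    "aead_xchacha20_poly1305_rtpsize": 24,
}


def rtp_header(sequence: int, timestamp: int, ssrc: int) -> bytes:
    """Build the 12-byte RTP header (assumed unchanged from the older modes)."""
    return struct.pack(
        ">BBHII", 0x80, 0x78, sequence & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc
    )


def lite_nonce(counter: int, mode: str) -> tuple[bytes, bytes]:
    """Return (4-byte suffix sent on the wire, zero-padded nonce for the cipher)."""
    suffix = struct.pack(">I", counter & 0xFFFFFFFF)  # byte order is an assumption
    return suffix, suffix.ljust(NONCE_SIZES[mode], b"\x00")


def build_packet(aead_encrypt, mode, header, counter, opus_frame):
    """Encrypt one frame, passing the RTP header as the AAD.

    aead_encrypt is a hypothetical callable: (nonce, data, aad) -> ciphertext+tag.
    """
    suffix, nonce = lite_nonce(counter, mode)
    ciphertext = aead_encrypt(nonce, opus_frame, header)
    return header + ciphertext + suffix
```

For the AES mode, `AESGCM(key).encrypt` from the `cryptography` package takes `(nonce, data, associated_data)` in that order, so it could be passed as `aead_encrypt`. The xchacha20 mode would need a library that provides XChaCha20-Poly1305 with a 24-byte nonce.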
The documentation has been updated with all current encryption modes.