DTLS fragment exceeds MTU size if the certificate is large.
Despite mediasoup setting the DTLS MTU using SSL_set_mtu and DTLS_set_link_mtu, if the certificate is large, the UDP packet may still exceed the MTU limit.
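For reference, here is a minimal sketch (not mediasoup's actual code; ConfigureDtlsMtu is an illustrative name) of how those OpenSSL calls are typically combined so that OpenSSL fragments handshake messages to a fixed MTU instead of probing the link:

#include <openssl/ssl.h>

// Sketch only: pin the DTLS MTU to a fixed value.
void ConfigureDtlsMtu(SSL_CTX* ctx, SSL* ssl, long mtu)
{
  // Tell OpenSSL not to query the underlying transport for the path MTU.
  SSL_CTX_set_options(ctx, SSL_OP_NO_QUERY_MTU);

  // Use a fixed value instead (the issue report says mediasoup calls both of these).
  SSL_set_mtu(ssl, mtu);       // MTU available for DTLS records.
  DTLS_set_link_mtu(ssl, mtu); // Link MTU, from which the record size is derived.
}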
Reproduce the bug by:
git clone https://github.com/winlinvip/mediasoup-demo.git &&
cd mediasoup-demo && git checkout bug/dtls-for-large-cert
Note: We modified server.js to support the DTLS certificate, see https://github.com/winlinvip/mediasoup-demo/commit/5b6e87514ff1a7ad307184a5bc2a894b61944f34
Generate a large certificate:
mkdir -p server/certs &&
openssl req -x509 -sha256 -nodes -days 365 -newkey rsa:4096 \
-keyout server/certs/privkey.pem -out server/certs/fullchain.pem
Note: A 4096-bit certificate was used just for the purpose of reproducing the issue. Even if you use a 1024-bit certificate, you can still make it large by adding a lot of additional information.
Run server:
cd server
npm install && cp config.example.js config.js
npm start
Run app:
cd app
npm install --legacy-peer-deps
npm start
Packet number 4 in the picture below shows that the DTLS packet is 2008 bytes, which exceeds the UDP MTU.
Despite mediasoup setting the DTLS MTU using SSL_set_mtu and DTLS_set_link_mtu
And what else can we do if OpenSSL doesn't work as expected? There is no way for us to limit the DTLS message size other than calling that OpenSSL API.
This issue is caused by the use of BIO_get_mem_data. Instead, you can use the BIO callback function BIO_set_callback. We also encountered this problem and have submitted a detailed PR to fix this bug in SRS. If you're interested, you can see it for more information.
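For context, here is a rough, hypothetical sketch of the BIO-callback idea mentioned above (not the SRS or mediasoup implementation; OnDtlsDataWritten is a made-up name). The callback observes each individual BIO_write() that OpenSSL performs on the memory BIO, so the data of every write can be framed and sent separately instead of draining the whole BIO afterwards with BIO_get_mem_data(). Note that BIO_set_callback() is deprecated since OpenSSL 3.0 in favour of BIO_set_callback_ex():

#include <openssl/bio.h>
#include <cstdint>
#include <cstdio>

// Hypothetical handler: a real integration would hand the data to the
// transport so that each intercepted write goes out in its own UDP datagram.
static void OnDtlsDataWritten(const uint8_t* /*data*/, size_t len)
{
  std::printf("intercepted DTLS write of %zu bytes\n", len);
}

// BIO callback, invoked before and after every operation on the BIO. Only the
// "after a successful write" event is interesting here: argp points to the
// buffer OpenSSL just wrote and argi is its length.
static long DtlsBioCallback(BIO* /*bio*/, int oper, const char* argp, int argi, long /*argl*/, long ret)
{
  if (oper == (BIO_CB_WRITE | BIO_CB_RETURN) && ret > 0 && argp != nullptr)
  {
    OnDtlsDataWritten(reinterpret_cast<const uint8_t*>(argp), static_cast<size_t>(argi));
  }

  return ret;
}

// During transport setup, with sslBioToNetwork being the memory BIO that
// OpenSSL writes outgoing DTLS data into:
//   BIO_set_callback(sslBioToNetwork, DtlsBioCallback);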
Thanks for the info. BTW, are you willing to create a PR for mediasoup as well? If not, we'll do it, but not sure when; we have a lot of pending work in different PRs.
Although I would like to create a PR, I'm currently swamped with work. We're focusing on adding WHIP support to FFmpeg, and we have already made it work with Janus, pion, and SRS. However, I'm also interested in making it work with mediasoup.
Once I'm done with the FFmpeg WHIP patch and successfully make it work with mediasoup, I may have some spare time to create a PR for this issue.
Even though I'm the maintainer of SRS, an open-source WebRTC SFU server, I have strong admiration for mediasoup. 😄
Thanks a lot. As said, if you cannot I'll jump into this eventually.
@ibc I've been able to reproduce it using the initial comment. I can take a look at creating a PR, if no one else has had the time yet?
We have a lot of pending work to do, so I'd really appreciate it if you could write that PR. Thanks a lot.
Long time fixed, closing.
I fail to understand why this issue didn't auto-close. It was fixed by PR https://github.com/versatica/mediasoup/pull/1156 (that targets flatbuffers branch). Once flatbuffers branch was merged into master this issue should have been auto-closed... Anyway.
Once flatbuffers branch was merged into master this issue should have been auto-closed...
No, it doesn't inherit "Closing" property that way
So just when the PR targets main branch... ok
Reopening this issue since the PR that fixed it has been reverted: https://github.com/versatica/mediasoup/pull/1342
An easy way to check the issue without requiring Wireshark:
diff --git a/worker/src/RTC/DtlsTransport.cpp b/worker/src/RTC/DtlsTransport.cpp
index 3bb8a9657..52d1df301 100644
--- a/worker/src/RTC/DtlsTransport.cpp
+++ b/worker/src/RTC/DtlsTransport.cpp
@@ -1201,6 +1201,11 @@ namespace RTC
MS_DEBUG_DEV("%" PRIu64 " bytes of DTLS data ready to sent to the peer", read);
+ if (read > DtlsMtu)
+ {
+ MS_DUMP("--- WARNING: sending data with length %" PRIi64 " bytes > DTMS MTU %i", read, DtlsMtu);
+ }
+
// Notify the listener.
this->listener->OnDtlsTransportSendData(
this, reinterpret_cast<uint8_t*>(data), static_cast<size_t>(read));
diff --git a/worker/src/RTC/WebRtcTransport.cpp b/worker/src/RTC/WebRtcTransport.cpp
index a5deb26fe..73e867ebb 100644
--- a/worker/src/RTC/WebRtcTransport.cpp
+++ b/worker/src/RTC/WebRtcTransport.cpp
@@ -1366,6 +1366,11 @@ namespace RTC
return;
}
+ if (len > 1350)
+ {
+ MS_DUMP("--- WARNING: sending data with length %zu bytes > DTMS MTU 1350", len);
+ }
+
this->iceServer->GetSelectedTuple()->Send(data, len);
// Increase send transmission.
With a 16384-bit certificate it prints:
RTC::DtlsTransport::SendPendingOutgoingDtlsData() | --- WARNING: sending data with length 6987 bytes > DTMS MTU 1350
RTC::WebRtcTransport::OnDtlsTransportSendData() | --- WARNING: sending data with length 6987 bytes > DTMS MTU 1350
Something interesting:
- With the original (current v3 branch) DtlsTransport.cpp code I see:
RTC::DtlsTransport::SendPendingOutgoingDtlsData() | --- WARNING: sending data with length 6987 bytes > DTMS MTU 1350
RTC::WebRtcTransport::OnDtlsTransportSendData() | --- WARNING: sending data with length 6987 bytes > DTMS MTU 1350
- If I remove SSL_OP_NO_QUERY_MTU in SSL_CTX_set_options() then I see:
RTC::DtlsTransport::SendPendingOutgoingDtlsData() | --- WARNING: sending data with length 7587 bytes > DTMS MTU 1350
RTC::WebRtcTransport::OnDtlsTransportSendData() | --- WARNING: sending data with length 7587 bytes > DTMS MTU 1350
However I see something more interesting in Wireshark:
As you can see, the DTLS Server Hello packet is indeed fragmented into chunks that do not exceed 1350 bytes (the DtlsMtu size used in DtlsTransport.cpp). The problem is that all those fragments are sent together in a single UDP datagram, which goes out in a single IP packet. No IP fragmentation happens, so we end up with a single non-fragmented IP packet of 7015 bytes containing a single non-fragmented UDP packet of 6995 bytes, which in turn contains a DTLS Server Hello packet fragmented into many chunks.
So to be clear: what do we want here? How should it be?
In DtlsTransport, when it's time to send DTLS data to the client, we call SendPendingOutgoingDtlsData(), which does this:
char* data{ nullptr };
const int64_t read = BIO_get_mem_data(this->sslBioToNetwork, &data);
And, as exposed above, the problem is that in this scenario with a huge certificate, the obtained read value is way higher than DtlsMtu (1350 bytes). So is this already the problem?
In that method, we send all those bytes to this->listener->OnDtlsTransportSendData(), which basically sends them in a single big UDP datagram.
I've tried to split that big data into chunks of max DtlsMtu so separate UDP datagrams are sent:
DtlsTransport::SendPendingOutgoingDtlsData() | sending data with length 7562 bytes > DTMS MTU 1350 bytes, spliting it
DtlsTransport::SendPendingOutgoingDtlsData() | sending fragment [offset:0, len:1350]
DtlsTransport::SendPendingOutgoingDtlsData() | sending fragment [offset:1350, len:1350]
DtlsTransport::SendPendingOutgoingDtlsData() | sending fragment [offset:2700, len:1350]
DtlsTransport::SendPendingOutgoingDtlsData() | sending fragment [offset:4050, len:1350]
DtlsTransport::SendPendingOutgoingDtlsData() | sending fragment [offset:5400, len:1350]
DtlsTransport::SendPendingOutgoingDtlsData() | sending fragment [offset:6750, len:812]
But of course it doesn't work because there are no valid DTLS packets in those UDP packets XD.
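For reference, the naive split looked roughly like the sketch below (a reconstruction, not the actual patch). It slices the buffer into fixed 1350-byte chunks, and that is precisely why it fails: the cut points fall in the middle of DTLS records, so the resulting UDP datagrams do not start with valid DTLS record headers.

#include <algorithm>
#include <cstddef>
#include <cstdint>

constexpr size_t DtlsMtu{ 1350u };

// Naive (broken) approach: send the pending BIO data in fixed DtlsMtu-sized
// slices. The slice boundaries do not coincide with DTLS record boundaries,
// so the peer cannot parse the resulting datagrams.
template<typename SendFn>
void NaiveSplitAndSend(const uint8_t* data, size_t len, SendFn send)
{
  for (size_t offset{ 0u }; offset < len; offset += DtlsMtu)
  {
    const size_t fragmentLen = std::min(DtlsMtu, len - offset);

    send(data + offset, fragmentLen);
  }
}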
So AFAIU the point here:
- read = BIO_get_mem_data(...) reports 7562 bytes of DTLS data to be sent.
- We know that those 7562 bytes contain N fragments of a DTLS packet (in this case Server Hello).
- But we don't know how long each fragment is, so we cannot read and send them separately. Is this the problem?
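For illustration only (this is not the fix that was attempted or adopted): the bytes returned by BIO_get_mem_data() are a concatenation of complete DTLS records, and each record carries its own length in a fixed 13-byte header (content type 1 byte, version 2 bytes, epoch 2 bytes, sequence number 6 bytes, length 2 bytes, per RFC 6347), so in principle the per-record boundaries can be recovered by walking those headers:

#include <cstddef>
#include <cstdint>
#include <cstdio>

// Walk the DTLS records contained in a buffer and print their boundaries.
// Each record starts with a 13-byte header whose last 2 bytes hold the
// big-endian payload length.
void DumpDtlsRecordBoundaries(const uint8_t* data, size_t len)
{
  size_t offset{ 0u };

  while (offset + 13 <= len)
  {
    const size_t payloadLen = (static_cast<size_t>(data[offset + 11]) << 8) | data[offset + 12];
    const size_t recordLen  = 13 + payloadLen;

    if (offset + recordLen > len)
      break; // Truncated record, stop.

    std::printf("DTLS record at offset %zu, total length %zu bytes\n", offset, recordLen);

    offset += recordLen;
  }
}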
I'm discarding PR https://github.com/versatica/mediasoup/pull/1343 because its solution is too risky and is based on undocumented OpenSSL behavior that may break in future versions of OpenSSL; plus, there is a scenario in which the logic implemented in that PR would be wrong anyway.
For future work on this issue (such as writing a custom BIO/buffer that OpenSSL writes into, so we can frame each written message), let's take into account the feedback, ideas and discussion in PR https://github.com/versatica/mediasoup/pull/1343.
Some of the useful feedback (about a potential reliable solution) in that closed PR is this one:
The previous fix intercepted the write operations, and when implemented carefully can be made correct. It just needs to clear the buffer after the intercepted write.
An alternative solution is to implement our own BIO, which OpenSSL will then call 'write' on. I did find some recommendations on the internet not to implement your own, but it merits some investigation. A big advantage would be that we lose a memory copy: we can take the data directly from the buffer OpenSSL uses and send it out over the network. Performance improvements will probably be slim, but it may save some space in the processor cache. It also doesn't use anything in a way it was not designed for.
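A minimal sketch of that alternative, assuming OpenSSL >= 1.1.0 and using only the public BIO_METHOD API (the names and the logging are illustrative; a real integration would forward the data to the transport instead of printing):

#include <openssl/bio.h>
#include <cstdio>

// Custom write BIO: OpenSSL calls DtlsBioWrite() once per write, so the data
// can be sent out directly without an intermediate memory copy.
static int DtlsBioWrite(BIO* /*bio*/, const char* /*data*/, int len)
{
  // A real implementation would send the buffer over the network here.
  std::printf("custom BIO write of %d bytes\n", len);

  return len; // Report everything as written.
}

static long DtlsBioCtrl(BIO* /*bio*/, int cmd, long /*num*/, void* /*ptr*/)
{
  // OpenSSL flushes the write BIO after each flight; BIO_CTRL_FLUSH must
  // succeed for the handshake to proceed.
  return (cmd == BIO_CTRL_FLUSH) ? 1 : 0;
}

static int DtlsBioCreate(BIO* bio)
{
  BIO_set_init(bio, 1);

  return 1;
}

BIO* CreateDtlsNetworkBio()
{
  static BIO_METHOD* method{ nullptr };

  if (!method)
  {
    method = BIO_meth_new(BIO_get_new_index() | BIO_TYPE_SOURCE_SINK, "dtls-network-bio-sketch");
    BIO_meth_set_write(method, DtlsBioWrite);
    BIO_meth_set_ctrl(method, DtlsBioCtrl);
    BIO_meth_set_create(method, DtlsBioCreate);
  }

  return BIO_new(method);
}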
@gkrol-katmai, I may start working on a reliable solution, not sure when. If you wish, please subscribe to this issue so you get notified.
@ibc All right! I'm subscribed, and feel free to tag me whenever you want.