amazon-kinesis-video-streams-webrtc-sdk-c
[BUG] Consistent problems in connecting to the streams
Logging: Please see the attached logs from different devices.
Describe the bug: After tens or hundreds of connections, our devices randomly enter a state in which no new stream connection can be established. The issue resolves only after restarting the software.
SDK version number: release v1.7.1
To Reproduce: We have approximately 20 devices in the field, and approximately 20 different people connect to them around 100 times per day. The issue occurs randomly for different viewers and different devices and does not resolve on its own.
Expected behavior: We expect reliable connections.
Device:
- OS: Ubuntu 18.04 (NVIDIA JetPack 32.5.2)
- Browser: N/A (C SDK)
Viewers:
- OS: Windows 10
- Browser: Chrome, latest version (the issue happens in different Chrome versions and also with Firefox)
We are seeing the same issue on our field devices after taking the latest tag, 1.7.0.
It looks like this issue has not been active for a long time. If the issue is not resolved, please add an update to the ticket, else it will be automatically resolved in a few days.
@tuncalie,
Based on a cursory look at the log (device1-6wdm.txt), I see error 0x5a000020 when candidate pair selection ultimately fails. This indicates that one of the TURN states is timing out (the timeout is 5 seconds by default), and based on the logs, it looks like obtaining the credentials is what times out. I see this repeatedly, and when it happens, the ICE agent appears to fail as well. Is the application set to tear down and retry? Is there any information you could provide about how your application deviates from the sample? That could be helpful in debugging this. In the meantime, I will need to dig deeper into the log and code to understand what could be going on. I will get back to you on this.
I see the same. I think the immediate issue is that UDP is blocked by the firewall, but I think there is an underlying bug as well. Essentially, the TURN_STATE_GET_CREDENTIALS state is only exited upon either obtaining credentials or reaching a timeout (5 seconds):
src/source/Ice/TurnConnection.c:

    STATUS turnConnectionStepState(PTurnConnection pTurnConnection)
    {
        ...
            case TURN_STATE_GET_CREDENTIALS:

                if (pTurnConnection->credentialObtained) {
                    DLOGV("Updated turn allocation request credential after receiving 401");

                    // update turn allocation packet with credentials
                    CHK_STATUS(freeStunPacket(&pTurnConnection->pTurnPacket));
                    CHK_STATUS(turnConnectionGetLongTermKey(pTurnConnection->turnServer.username, pTurnConnection->turnRealm,
                                                            pTurnConnection->turnServer.credential, pTurnConnection->longTermKey,
                                                            SIZEOF(pTurnConnection->longTermKey)));
                    CHK_STATUS(turnConnectionPackageTurnAllocationRequest(pTurnConnection->turnServer.username, pTurnConnection->turnRealm,
                                                                          pTurnConnection->turnNonce, pTurnConnection->nonceLen,
                                                                          DEFAULT_TURN_ALLOCATION_LIFETIME_SECONDS, &pTurnConnection->pTurnPacket));

                    pTurnConnection->state = TURN_STATE_ALLOCATION;
                    pTurnConnection->stateTimeoutTime = currentTime + DEFAULT_TURN_ALLOCATION_TIMEOUT;
                } else {
                    CHK(currentTime < pTurnConnection->stateTimeoutTime, STATUS_TURN_CONNECTION_STATE_TRANSITION_TIMEOUT);
                }
                break;

pTurnConnection->credentialObtained is only set to TRUE upon receiving a "401 Unauthorized" STUN packet in response to a "Create Allocation" request (simply indicating that the Create Allocation request should be retried with credentials):
src/source/Ice/TurnConnection.c:

    STATUS turnConnectionHandleStunError(PTurnConnection pTurnConnection, PBYTE pBuffer, UINT32 bufferLen)
    {
        ...
        switch (pStunAttributeErrorCode->errorCode) {
            case STUN_ERROR_UNAUTHORIZED:
                ...
                pTurnConnection->credentialObtained = TRUE;

The fact that the "401 Unauthorized" message is never received implies UDP is blocked.
However, the underlying question is: why is TCP TURN not used?
This is a very old issue. We encourage you to check if this is still an issue in the latest release and if you find that this is still a problem, please feel free to open a new one.
Closing this since there is no update on this for a while. Feel free to reopen if there is more information the team could use to debug.