libdatachannel

No Audio Output

Open Laky-64 opened this issue 1 year ago • 18 comments

I've been working on migrating pytgcalls (whose core is currently written in Node.js) to C++. As part of this migration, I want to ensure support for PCM16LE audio data.

I've noticed that on the Node.js side, we use PCM16LE for audio, but on the C++ side, the code currently seems to use the Opus codec. As a result, when I tried streaming audio through libdatachannel, I encountered issues with audio not being properly transmitted on Telegram.

I'm reaching out to see if it's possible to add support for PCM16LE audio data in ntgcalls with libdatachannel. I see that you're using libdatachannel on the C++ side, and I believe that adding support for PCM16LE could help resolve the audio issues I'm facing.

Is there any chance you could guide me on how to modify the C++ code to handle PCM16LE audio properly? I'm not sure how to convert the PCM16LE data to a suitable format for libdatachannel transmission.

Here are the relevant code sections for reference:

C++ Code: link to MediaStreamTrack.cpp

Node.js Code (Old Code): link to stream.ts

If you could provide some guidance or suggestions, I'd greatly appreciate it. I'm here to provide any additional details you may need.

Thank you so much for your time and support!

Laky-64 avatar Aug 05 '23 18:08 Laky-64

I think this is a misunderstanding. The Node.js code you link passes audio as PCM to node-webrtc (wrtc), but it will be encoded internally to Opus, which is the standard audio codec for WebRTC. Actually, the SDP code expects WebRTC to always negotiate the Opus codec for audio as it is hardcoded there. The remote peer will only ever see incoming Opus-encoded audio.

libwebrtc, which you import via the node-webrtc wrapper, handles encoding and decoding internally for you. libdatachannel is lower level and implements only the network part: encoding and decoding must be done externally, and audio and video must be sent in encoded format. This makes the library much lighter, and in some cases it avoids the costly transcoding that happens when you decode media to feed it into libwebrtc, which then re-encodes it.
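To illustrate what "encoding must be done externally" implies for PCM16LE input, here is a minimal sketch of the preparation step only: decoding little-endian bytes into 16-bit samples and cutting them into 20 ms frames. The function names (`decodePcm16le`, `samplesPerFrame`, `splitIntoFrames`) are hypothetical and not part of libdatachannel or ntgcalls. Each resulting frame would still have to be passed through an Opus encoder (for example libopus, not shown here) before being sent on the track.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode little-endian 16-bit PCM bytes into signed samples,
// independent of the host byte order.
std::vector<int16_t> decodePcm16le(const std::vector<uint8_t>& bytes) {
    assert(bytes.size() % 2 == 0); // each sample is two bytes
    std::vector<int16_t> samples;
    samples.reserve(bytes.size() / 2);
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        const uint16_t raw = static_cast<uint16_t>(bytes[i] | (bytes[i + 1] << 8));
        samples.push_back(static_cast<int16_t>(raw));
    }
    return samples;
}

// Samples per channel in one frame, e.g. 48000 Hz * 20 ms = 960.
inline std::size_t samplesPerFrame(std::size_t sampleRateHz, std::size_t frameMs) {
    return sampleRateHz * frameMs / 1000;
}

// Split interleaved samples into fixed-size frames.
// A trailing partial frame is dropped in this sketch.
std::vector<std::vector<int16_t>> splitIntoFrames(const std::vector<int16_t>& samples,
                                                   std::size_t sampleRateHz,
                                                   std::size_t channels,
                                                   std::size_t frameMs) {
    const std::size_t frameLen = samplesPerFrame(sampleRateHz, frameMs) * channels;
    assert(frameLen > 0);
    std::vector<std::vector<int16_t>> frames;
    for (std::size_t off = 0; off + frameLen <= samples.size(); off += frameLen) {
        const auto first = samples.begin() + static_cast<std::ptrdiff_t>(off);
        const auto last = first + static_cast<std::ptrdiff_t>(frameLen);
        frames.emplace_back(first, last);
    }
    return frames;
}
```

With 48 kHz mono input and 20 ms frames, every frame holds 960 samples, which is a frame size an Opus encoder accepts.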

paullouisageneau avatar Aug 05 '23 21:08 paullouisageneau

So @paullouisageneau, basically audio sent to Telegram with libwebrtc (Node.js) arrives as Opus, right?

If yes, then I don't understand why it can't transmit audio. I tried it against the browser, but nothing arrives on Telegram. Maybe I can't build the remote description by hand?

The SDP-building part:

https://github.com/pytgcalls/ntgcalls/blob/master/src/webrtc/SdpBuilder.cpp

This is what I need to send to Telegram (the setup must always be active):

{
  "fingerprints": [
    {
      "fingerprint": "16:B0:C7:56:25:54:69:7C:79:A4:B0:3D:77:56:83:0B:A4:57:E8:2D:6D:9E:D8:1A:7F:FF:D1:91:B4:BE:99:06",
      "hash": "sha-256",
      "setup": "active"
    }
  ],
  "pwd": "yGS9FiEPpp4/CvloaKimzC",
  "ssrc": 2,
  "ufrag": "XUKA"
}

What I get after sending these parameters:

{
  "transport": {
    "candidates": [
      {
        "generation": "0",
        "component": "1",
        "protocol": "udp",
        "port": "32001",
        "ip": "2001:67c:4e8:f102:6:0:285:202",
        "foundation": "1",
        "id": "6c95c57726a8e7f803cf6ccba",
        "priority": "2130706431",
        "type": "host",
        "network": "0"
      },
      {
        "generation": "0",
        "component": "1",
        "protocol": "udp",
        "port": "32001",
        "ip": "91.108.9.98",
        "foundation": "2",
        "id": "e1b543f26a8e7f8072eedc9c",
        "priority": "2130706431",
        "type": "host",
        "network": "0"
      }
    ],
    "xmlns": "urn:xmpp:jingle:transports:ice-udp:1",
    "ufrag": "20t2i1h6od6f5q",
    "rtcp-mux": true,
    "pwd": "3ei2ejc1nu0mu0rg520cpdcvu9",
    "fingerprints": [
      {
        "fingerprint": "23:87:55:CD:39:35:F2:13:62:B8:1B:4F:21:B6:17:A3:C5:A6:84:39:59:96:98:DB:4A:A0:73:35:95:B1:68:6E",
        "setup": "passive",
        "hash": "sha-256"
      }
    ]
  },
  "audio": {
    "payload-types": [
      {
        "id": 111,
        "name": "opus",
        "clockrate": 48000,
        "channels": 2,
        "parameters": {
          "minptime": 10,
          "useinbandfec": 1
        },
        "rtcp-fbs": [
          {
            "type": "transport-cc"
          }
        ]
      },
      {
        "id": 126,
        "name": "telephone-event",
        "clockrate": 8000,
        "channels": 1
      }
    ],
    "rtp-hdrexts": [
      {
        "id": 1,
        "uri": "urn:ietf:params:rtp-hdrext:ssrc-audio-level"
      },
      {
        "id": 2,
        "uri": "http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time"
      },
      {
        "id": 3,
        "uri": "http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01"
      }
    ]
  }
}

Laky-64 avatar Aug 06 '23 13:08 Laky-64

> So @paullouisageneau, basically audio sent to Telegram with libwebrtc (Node.js) arrives as Opus, right?

Yes, in both cases audio is sent encoded as Opus.

> If yes, then I don't understand why it can't transmit audio. I tried it against the browser, but nothing arrives on Telegram. Maybe I can't build the remote description by hand?

Thank you for the example messages. Could you please also share the SDP offer from libdatachannel and the reconstructed answer?

paullouisageneau avatar Aug 07 '23 21:08 paullouisageneau

First of all, thanks @paullouisageneau for answering. Here are the SDP offer and the reconstructed answer.

SDP Offer

v=0
o=rtc 2841695458 0 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE audio3963038161
a=group:LS audio3963038161
a=msid-semantic:WMS *
a=setup:actpass
a=ice-ufrag:+/VY
a=ice-pwd:O8lXCUo5IlCVeO55VNkrCw
a=ice-options:ice2,trickle
a=fingerprint:sha-256 93:DE:62:79:42:99:B3:37:31:3B:89:FC:04:85:AE:02:62:14:85:F1:18:80:F5:8E:F9:41:0A:DB:9D:2F:94:70
m=audio 51709 UDP/TLS/RTP/SAVPF 111
c=IN IP4 192.168.193.1
a=mid:audio3963038161
a=sendonly
a=ssrc:3963038161 cname:audio3963038161
a=ssrc:3963038161 msid:stream3963038161 audio3963038161
a=msid:stream3963038161 audio3963038161
a=rtcp-mux
a=rtpmap:111 opus/48000/2
a=fmtp:111 minptime=10;maxaveragebitrate=96000;stereo=1;sprop-stereo=1;useinbandfec=1
a=candidate:1 1 UDP 2122317823 192.168.193.1 51709 typ host
a=candidate:2 1 UDP 2122317567 192.168.65.1 51709 typ host
a=candidate:3 1 UDP 2122317311 172.21.96.1 51709 typ host
a=candidate:4 1 UDP 2122317055 172.30.128.1 51709 typ host
a=candidate:5 1 UDP 2122316799 192.168.1.11 51709 typ host
a=end-of-candidates

Reconstructed answer

v=0
o=- 1691446044242 2 IN IP4 0.0.0.0
s=-
t=0 0
a=group:BUNDLE 0
a=ice-lite
m=audio 1 RTP/SAVPF 111 126
c=IN IP4 0.0.0.0
a=mid:0
a=ice-ufrag:7dukk1h790kpqj
a=ice-pwd:v86n75u7fmbf6rt9ss2nllf50
a=fingerprint:sha-256 0B:BD:38:32:D5:18:C4:AD:2A:93:23:97:A5:59:AC:89:1D:6C:A0:E5:00:AC:12:3F:20:E7:6C:D4:B6:0C:EC:65
a=setup:passive
a=candidate:1 1 udp 2130706431 2001:67c:4e8:f102:4:0:285:4 32002 typ host generation 0
a=candidate:2 1 udp 2130706431 91.108.9.68 32002 typ host generation 0
a=rtpmap:111 opus/48000/2
a=rtpmap:126 telephone-event/8000
a=fmtp:111 minptime=10; useinbandfec=1; usedtx=1
a=rtcp:1 IN IP4 0.0.0.0
a=rtcp-mux
a=rtcp-fb:111 transport-cc
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=recvonly
a=ssrc-group:FID 3963038161
a=ssrc:3963038161 cname:stream3963038161
a=ssrc:3963038161 msid:stream3963038161 audio3963038161
a=ssrc:3963038161 mslabel:audio3963038161
a=ssrc:3963038161 label:audio3963038161

Laky-64 avatar Aug 07 '23 22:08 Laky-64

The mid is audio3963038161 in the offer while it is 0 in the answer. You should change the mid to 0 when creating the track.
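A mismatch like this can be caught before the answer is applied. The following is a minimal sketch, with hypothetical helper names (`extractMids`, `midsMatch`) that are not part of libdatachannel, which collects the `a=mid:` values from two SDP strings and compares them.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Collect the value of every "a=mid:" line in an SDP blob, in order.
std::vector<std::string> extractMids(const std::string& sdp) {
    std::vector<std::string> mids;
    std::istringstream in(sdp);
    std::string line;
    const std::string prefix = "a=mid:";
    while (std::getline(in, line)) {
        // Tolerate CRLF line endings.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.compare(0, prefix.size(), prefix) == 0)
            mids.push_back(line.substr(prefix.size()));
    }
    return mids;
}

// True when offer and answer list the same mids in the same order.
bool midsMatch(const std::string& offer, const std::string& answer) {
    return extractMids(offer) == extractMids(answer);
}
```

For the offer and answer posted above, `midsMatch` would return false, since the offer carries `audio3963038161` and the answer carries `0`.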

paullouisageneau avatar Aug 08 '23 06:08 paullouisageneau

@paullouisageneau I tried that and it is still not working: https://github.com/pytgcalls/ntgcalls/blob/master/src/webrtc/MediaStreamTrack.cpp#L13

Laky-64 avatar Aug 08 '23 11:08 Laky-64

Have you checked samples are read and sent properly? If so, could you please provide the verbose log?

paullouisageneau avatar Aug 08 '23 20:08 paullouisageneau

The only way to check is through the verbose log. Here it is, @paullouisageneau:

Verbose log (sorry, it is too long to paste inline):

https://nekobin.com/payazuneza.yml

Laky-64 avatar Aug 08 '23 20:08 Laky-64

@paullouisageneau Any news?

Laky-64 avatar Aug 14 '23 14:08 Laky-64

I don't see anything obviously wrong, but the lack of proper negotiation could be the issue. For instance, libdatachannel does not currently support those extensions, yet they are unilaterally added to the answer. If Telegram assumes they must be enabled, it's going to be a problem.

Is there any debug output on Telegram side?

paullouisageneau avatar Aug 17 '23 07:08 paullouisageneau

Here I am again, months later. As you can see from the questions I asked, I ended up dissatisfied with the resource consumption of Google's WebRTC, so I am retracing my steps. There I have no problems sending audio, and the code is fully accessible and open source. In any case, I find myself once again with the problem of the audio not working.

If it is useful, I can give you the output of Google's WebRTC showing how it makes the connection (in this case everything works perfectly, but with excessive resource consumption at large scale):

https://pastebin.com/6wmkZ5LH

Laky-64 avatar Apr 22 '24 22:04 Laky-64

I'd need more context (linked info and code are not accessible anymore). Is there any debug output on Telegram side to know why it drops incoming audio?

paullouisageneau avatar Apr 24 '24 15:04 paullouisageneau

> I'd need more context (linked info and code are not accessible anymore). Is there any debug output on Telegram side to know why it drops incoming audio?

Nope, unfortunately no log is available on the Telegram side, only the one emitted by WebRTC on my side. As for the code, here is the new link: https://github.com/pytgcalls/ntgcalls/tree/ntgcalls-x/ntgcalls_x

Laky-64 avatar Apr 24 '24 21:04 Laky-64

I can't see any issue with the SDP negotiation, however there is no way Opus frames are correctly sent. Have you checked what is sent? The default frame size is 20 ms, meaning there are 50 frames per second, while the file parser is set up to send 48000 per second because of a confusion between the audio sampling rate and the number of sample files (each containing a frame) per second.
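The arithmetic behind the 50 versus 48000 distinction can be written out as a small sketch. `FrameTiming` and `opusTiming` are hypothetical names used only for illustration. The sketch assumes a whole number of milliseconds per frame and the 48000 Hz RTP clock rate that appears in the `a=rtpmap:111 opus/48000/2` line above.

```cpp
#include <cassert>
#include <cstdint>

struct FrameTiming {
    uint32_t framesPerSecond;  // how many encoded frames are sent each second
    uint64_t frameDurationUs;  // spacing between two frames, in microseconds
    uint32_t rtpTicksPerFrame; // RTP timestamp increment per frame
};

// Derive the per-frame timing from the RTP clock rate and the frame length.
inline FrameTiming opusTiming(uint32_t clockRateHz, uint32_t frameMs) {
    assert(frameMs > 0);
    FrameTiming t;
    t.framesPerSecond = 1000 / frameMs;
    t.frameDurationUs = static_cast<uint64_t>(frameMs) * 1000;
    t.rtpTicksPerFrame = clockRateHz / 1000 * frameMs;
    return t;
}
```

For `opusTiming(48000, 20)` this gives 50 frames per second, 20000 microseconds between frames, and an RTP timestamp step of 960. The value 48000 is the number of audio samples per second, not the number of frames.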

paullouisageneau avatar Apr 25 '24 20:04 paullouisageneau

The fact is that I have currently copied and pasted the very example you provided here: https://github.com/paullouisageneau/libdatachannel/blob/master/examples/streamer/

Laky-64 avatar Apr 26 '24 14:04 Laky-64

Sure, but there are a couple changes, for instance the streamer example sets the file parser to 50 samples per second (files contain 20ms frames): https://github.com/paullouisageneau/libdatachannel/blob/fd6cf712207e0bcce5b966879695675a93107464/examples/streamer/opusfileparser.hpp#L16

while it is set to 48000 here, which won't work.

paullouisageneau avatar Apr 26 '24 14:04 paullouisageneau

Applied, but it did not fix the problem: https://github.com/pytgcalls/ntgcalls/commit/d35b4b2fba7d4c6f562be7c1ae7ac06a2ef09905

Laky-64 avatar Apr 28 '24 23:04 Laky-64

I've had a look into file_parser.cpp, and there are a lot of changes compared to the original example. There is an obvious issue: startTime and currentTime are computed as milliseconds, while they are supposed to be microseconds. This most probably breaks everything, as nothing will be sent as a result. I don't know if this is the only issue. You should really debug your code by checking that samples are correctly sent every 20 ms.
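The effect of mixing up the two units can be shown with a short sketch. The helpers `nowMicros` and `framesDue` are hypothetical and do not reproduce the code in file_parser.cpp or in the streamer example. They only assume that the elapsed time is compared against a frame duration expressed in microseconds.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Monotonic "now" in microseconds. steady_clock is not affected by
// wall-clock adjustments.
inline uint64_t nowMicros() {
    using namespace std::chrono;
    return static_cast<uint64_t>(
        duration_cast<microseconds>(steady_clock::now().time_since_epoch()).count());
}

// Number of frames that should have been sent by currentUs.
// All three arguments are expected to be in microseconds.
inline uint64_t framesDue(uint64_t startUs, uint64_t currentUs, uint64_t frameDurationUs) {
    assert(frameDurationUs > 0);
    if (currentUs <= startUs) return 0;
    return (currentUs - startUs) / frameDurationUs;
}
```

After one second, `framesDue(0, 1000000, 20000)` yields 50 frames. If the timestamps are in milliseconds instead, the same second looks like `framesDue(0, 1000, 20000)`, which yields 0, so under this assumption the sender would believe no frame is due yet.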

Also, synchronization for the created thread is incorrect. The thread should lock the mutex at the beginning of its function, and you should not detach the thread as you must wait for it in the destructor. I don't think this breaks playback but it might lead to crashes.
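The synchronization pattern described here can be sketched with the standard library alone. `Ticker` is a hypothetical class, not the ntgcalls or libdatachannel code: its thread function takes the mutex at the start, releases it only while waiting, and the destructor signals the thread and joins it instead of detaching.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

class Ticker {
public:
    explicit Ticker(std::chrono::microseconds period)
        : period_(period), thread_([this] { run(); }) {}

    ~Ticker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        thread_.join(); // wait for the worker instead of detaching it
    }

    uint64_t ticks() {
        std::lock_guard<std::mutex> lock(mutex_);
        return ticks_;
    }

private:
    void run() {
        // Lock at the beginning; wait_for releases the mutex while sleeping.
        std::unique_lock<std::mutex> lock(mutex_);
        while (!stopping_) {
            ++ticks_; // a real sender would emit one frame here
            cv_.wait_for(lock, period_, [this] { return stopping_; });
        }
    }

    std::chrono::microseconds period_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stopping_ = false;
    uint64_t ticks_ = 0;
    std::thread thread_; // declared last so the other members exist before it starts
};
```

Because the destructor joins, the worker can never touch members of an object that has already been destroyed, which is the kind of crash a detached thread risks.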

paullouisageneau avatar Apr 30 '24 14:04 paullouisageneau