node-ytdl-core [Feature request] YouTube Live Stream Audio Only

Hello. As we all know, when requesting 'audio only' for a live stream, node-ytdl gives us a URL that returns a 5-second chunk of audio, playable directly in the browser. The problem is that we have to make a new request roughly every 5 seconds to get new audio. I tried a mechanism that calls Axios in a loop with a circular buffer: each response is pushed to the buffer and then written to a WriteStream (a local file). The result was unreliable. I could play the file in VLC but not in other players, because the container headers were written along with every chunk. During playback I also hear duplicated audio, e.g. the first 15 s are the first 5-second response repeated. This makes me think YouTube returns the same audio if the previous response was not fully buffered or played, or if the next request comes too early; for example, two requests made within 2 seconds get the same response. Anyway, let's see what we can accomplish with the help of anyone interested in playing audio from a live stream, because we all want to save some bandwidth and listen to music while working, without watching the video. Here is part of my implementation:

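const axios = require('axios');
const fs = require('fs');
// Assumption: CircularBuffer comes from a circular-buffer package providing push(), deq() and size()
const CircularBuffer = require('circular-buffer');

const writeStream = fs.createWriteStream('live-audio.mp4'); // hypothetical output path
const sampleRate = 48000; // audioSampleRate of the format listed below
const duration = 120; // seconds
const bufferSize = sampleRate * duration;
const buf = new CircularBuffer(bufferSize);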

// Fetch one chunk; returns undefined if the request fails.
async function callAxios(url) {
    try {
        const response = await axios({ method: 'get', url: url, responseType: 'arraybuffer' });
        return response.data;
    } catch (err) {
        console.log(err);
    }
}

async function loopThrough(url) {
    let streaming = true;

    // Keep requesting the same URL until a request fails
    do {
        const data = await callAxios(url);
        if (data) {
            buf.push(data);
            console.log(buf.size());
            // Take a chunk back out of the buffer and write it to the file
            writeStream.write(buf.deq());
        } else {
            streaming = false;
            writeStream.end();
        }
    } while (streaming);
}

mimeType: 'audio/mp4; codecs="mp4a.40.2"',
  qualityLabel: null,
  bitrate: 144000,
  audioBitrate: 128,
  itag: 140,
  url: 'https://r2---sn-vbxgv-cxtz.googlevideo.com/videoplayback?expire=1621879552&ei=oJarYL2YI_K1xN8P9aCe-A8&ip=MYIP&id=KvRVky0r7YM.3&itag=140&source=yt_live_broadcast&requiressl=yes&mh=mz&mm=44%2C29&mn=sn-vbxgv-cxtz%2Csn-25glene6&ms=lva%2Crdu&mv=m&mvi=2&pl=24&initcwndbps=222500&vprv=1&live=1&hang=1&noclen=1&mime=audio%2Fmp4&ns=Yl-cSuGcwe5MjklshWtfjmsF&gir=yes&mt=1621857627&fvip=4&keepalive=yes&fexp=24001373%2C24007246&beids=9466587&c=WEB&n=3-RD52cbC59tTArA&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cvprv%2Clive%2Chang%2Cnoclen%2Cmime%2Cns%2Cgir&sig=AOq0QJ8wRQIgbPiNUb8oGioz2IucWT_lVzPGUucvx49z-CgsReJUCXYCIQDCS3pueUwS2i8LMVeWhzQ-SWPlQO0u6odkPgSOAz-4pA%3D%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=AG3C_xAwRQIhAOhfxTRYA7MruZNdjdvfyjgnO-sm2VkQ69aOQkilx7zbAiBZG4LESU2pZScRCExSTXIXNZ8BcZs2bzElAvl835lwIw%3D%3D&ratebypass=yes',
  lastModified: '1621825709929201',
  quality: 'tiny',
  projectionType: 'RECTANGULAR',
  targetDurationSec: 5,
  maxDvrDurationSec: 43200,
  highReplication: true,
  audioQuality: 'AUDIO_QUALITY_MEDIUM',
  audioSampleRate: '48000',
  audioChannels: 2,
  hasVideo: false,
  hasAudio: true,
  container: 'mp4',
  codecs: 'mp4a.40.2',
  videoCodec: null,
  audioCodec: 'mp4a.40.2',
  isLive: true,
  isHLS: false,
  isDashMPD: false

May 24 '21 12:05 joek85

I added a 5-second delay between requests:

async function delay(ms) {
    // return await for better async stack trace support in case of errors.
    return await new Promise(resolve => setTimeout(resolve, ms));
}

async function loopThrough(url) {
       ...
      await delay(5000)
}

Now I get continuous audio, but some parts are missing: they were skipped because the requests are not synchronized with the stream. So the basic idea almost works, but I don't think this is the best approach. Maybe Node streams would do better? The audio from each response needs to be buffered before the next request is made, and so on. A codec header also needs to be added; in this case the codec is AAC.
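One possible cause of the skipped audio is that a fixed 5-second sleep ignores the time the request itself takes, so the loop slowly falls behind the stream. Below is a minimal sketch of compensating for that, assuming targetDurationSec from the format info is the real segment length. nextDelayMs is a hypothetical helper, not part of node-ytdl-core, and this alone may not remove every gap.

```javascript
// Hypothetical helper: how long to sleep before the next request so that
// request time + sleep time adds up to one segment duration.
function nextDelayMs(targetDurationSec, requestStartedMs, requestFinishedMs) {
    const targetMs = targetDurationSec * 1000;
    const spentMs = requestFinishedMs - requestStartedMs;
    // Never return a negative delay if the request took longer than a segment
    return Math.max(0, targetMs - spentMs);
}

// A request that took 400 ms leaves 4600 ms to wait
console.log(nextDelayMs(5, 1000, 1400)); // 4600
```

In the loop above this would replace the constant in delay(5000), with the start and finish times taken from Date.now() around the request.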

May 25 '21 16:05 joek85

I'd expect some kind of header to track which range to get 🤔, not just a request every ~5 sec.

May 26 '21 07:05 TimeForANinja

If you make one request, dump the result to a file and open it, you can see some header information at the beginning, followed by the audio data.

May 26 '21 07:05 joek85

requestOptions.headers = Object.assign({}, requestOptions.headers, {
        Range: `bytes=${0}-${''}`,
    });

const ondata = chunk => {
        downloaded += chunk.length;
        stream.emit('progress', chunk.length, downloaded, contentLength);
        console.log(chunk.length);
        console.log(downloaded);
        console.log(contentLength);
    };

results:

16384
16384
NaN
16384
32768
NaN
16384
49152
NaN
16384
65536
NaN
15926
81462
NaN
end

It seems to work. Now I'm trying to add a callback on the end event that makes another request.

May 26 '21 09:05 joek85

const stream = createStream();
    doStream(stream, url).on('end', ()=> {
        doStream(stream, url)
    })

function doStream(stream, url) {

    const requestOptions = Object.assign({}, [], {
        maxReconnects: 6,
        maxRetries: 3,
        backoff: { inc: 500, max: 10000 },
    });
    requestOptions.headers = Object.assign({}, requestOptions.headers, {
        Range: `bytes=${0}-${''}`,
    });
    let req = miniget(url, requestOptions);
    let contentLength, downloaded = 0;

    const ondata = (chunk) => {
        downloaded += chunk.length;
        stream.emit('progress', chunk.length, downloaded, contentLength);
        console.log(chunk.length);
        console.log(downloaded);

        writeStream.write(chunk)
    };
    // req.on('response', res => {
    //     if (stream.destroyed) { return; }
    //     contentLength = contentLength || parseInt(res.headers['content-length']);
    //     stream.emit('response', res);
    //     console.log(res)
    // });

    req.on('data', ondata);
    req.on('prefinish', (data) => {

    });
    req.on('end', () => {
        if (stream.destroyed) { return; }
        stream.emit('end', '');
        // console.log(req)
    });
    req.on('drain', (drain) => {
        // console.log(drain)
    });
    req.on('unpipe', (unpipe) => {
        // console.log(unpipe)
    });
    return stream
}

Same result: the audio repeats.

May 26 '21 14:05 joek85

[binary MP4 header, non-printable bytes omitted; readable box names in order: ftyp (dash, iso6, mp41), moov, mvhd, mvex, trex, trak, tkhd, mdia, mdhd, hdlr (soun, SoundHandler), minf, dinf, dref, url, stbl, stsd, mp4a, esds, stts, stsc, stco, stsz, smhd, emsg (http://youtube.com/streaming/metadata/segment/102015)]
Sequence-Number: 7457566
Ingestion-Walltime-Us: 1622114400732901
Ingestion-Uncertainty-Us: 112
Capture-Walltime-Us: 1622114400529764
Stream-Duration-Us: 37287320969666
Max-Dvr-Duration-Us: 14400000000
Target-Duration-Us: 5000000
First-Frame-Time-Us: 1622114409631562
First-Frame-Uncertainty-Us: 112
Finalized-Sequence-Number: 7457565
Finalized-Media-End-Timestamp-Us: 37287320985666
Finalized-Media-Lmt-Us: 1622090390317185
Finalized-Media-Lmt-Us: 1622090390317186
Finalized-Media-Lmt-Us: 1622090390317187
Finalized-Media-Lmt-Us: 1622090390317188
Finalized-Media-Lmt-Us: 1622090390317189
Finalized-Media-Lmt-Us: 1622090390317190
Finalized-Media-Lmt-Us: 1622090390317191
Finalized-Media-Lmt-Us: 1622090390317192
Finalized-Media-Lmt-Us: 1622090390317193
Finalized-Xformat: 133
Finalized-Xformat: 134
Finalized-Xformat: 135
Finalized-Xformat: 136
Finalized-Xformat: 139
Finalized-Xformat: 140
Finalized-Xformat: 140:slate=user
Finalized-Xformat: 160
Finalized-Xformat: 247:slate=user
Current-Segment-Xformat: 140
Encoding-Alias: L1_BA
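The text part of this dump is a list of "Key: Value" lines, some of them repeated. Here is a small sketch of turning it into a lookup table, assuming the text has already been separated from the binary bytes around it. parseSegmentMetadata is a hypothetical name, and the meaning of the individual fields is not documented in this thread.

```javascript
// Parse "Key: Value" lines into a Map of key -> array of values.
// Repeated keys (e.g. Finalized-Xformat) keep all of their values.
function parseSegmentMetadata(text) {
    const fields = new Map();
    for (const line of text.split(/\r?\n/)) {
        const match = /^([A-Za-z0-9-]+):\s*(.*)$/.exec(line.trim());
        if (!match) continue; // skip lines that are not "Key: Value"
        const [, key, value] = match;
        if (!fields.has(key)) fields.set(key, []);
        fields.get(key).push(value);
    }
    return fields;
}

const sample = [
    'Sequence-Number: 7457566',
    'Target-Duration-Us: 5000000',
    'Finalized-Xformat: 140',
    'Finalized-Xformat: 140:slate=user',
].join('\n');
const meta = parseSegmentMetadata(sample);
console.log(meta.get('Sequence-Number')[0]); // 7457566
```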

The response arrives in 5 chunks of about 16384 bytes each (the last one is smaller). The first chunk starts with the MP4 header information, such as moov and stts. We need to parse this header information to find the 'end' offset. Any suggestions?
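One way to find where things end is to walk the top-level MP4 boxes: each box starts with a 4-byte big-endian size and a 4-byte type, so the sizes give the byte offsets directly. This is a sketch that assumes the whole response has been collected into one Buffer. listTopLevelBoxes is a hypothetical helper, and whether these offsets help with requesting the next segment is not established here.

```javascript
// Walk the top-level ISO BMFF boxes in a Buffer and return their
// type, start offset and total size (header included).
function listTopLevelBoxes(buf) {
    const boxes = [];
    let offset = 0;
    while (offset + 8 <= buf.length) {
        let size = buf.readUInt32BE(offset);
        const type = buf.toString('latin1', offset + 4, offset + 8);
        let headerSize = 8;
        if (size === 1) {
            // size 1 means a 64-bit size follows the type field
            if (offset + 16 > buf.length) break;
            size = Number(buf.readBigUInt64BE(offset + 8));
            headerSize = 16;
        } else if (size === 0) {
            // size 0 means the box runs to the end of the data
            size = buf.length - offset;
        }
        if (size < headerSize) break; // corrupt header, stop walking
        boxes.push({ type, offset, size });
        offset += size;
    }
    return boxes;
}
```

For example, listTopLevelBoxes(Buffer.concat(chunks)) on one collected response lists the boxes in order. The ftyp and moov boxes are the part that only needs to be written to the file once.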

May 27 '21 11:05 joek85

[
  'Last-Modified',
  'Fri, 28 May 2021 12:17:52 GMT',
  'Content-Type',
  'audio/mp4',
  'Date',
  'Fri, 28 May 2021 12:18:44 GMT',
  'Expires',
  'Fri, 01 Jan 1990 00:00:00 GMT',
  'Cache-Control',
  'no-cache, must-revalidate',
  'Accept-Ranges',
  'bytes',
  'Content-Length',
  '81064',
  'Connection',
  'keep-alive',
  'Alt-Svc',
  'h3-29=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"',
  'X-Walltime-Ms',
  '1622204324579',
  'X-Bandwidth-Est',
  '131830985',
  'X-Bandwidth-Est-Comp',
  '17349397',
  'X-Bandwidth-Est2',
  '17349397',
  'X-Bandwidth-App-Limited',
  'false',
  'X-Bandwidth-Est-App-Limited',
  'false',
  'X-Bandwidth-Est3',
  '14048912',
  'Pragma',
  'no-cache',
  'X-Sequence-Num',
  '7475549',
  'X-Segment-Lmt',
  '1622204272988277',
  'X-Head-Time-Sec',
  '37377234',
  'X-Head-Time-Millis',
  '37377234071',
  'X-Head-Seqnum',
  '7475549',
  'Vary',
  'Origin',
  'Cross-Origin-Resource-Policy',
  'cross-origin',
  'X-Content-Type-Options',
  'nosniff',
  'Server',
  'gvs 1.0'
]

I tried increasing 'X-Sequence-Num' by 1 on each subsequent request. It gives me the same response, but after about 7 requests I get new audio. Maybe 'X-Head-Time-Millis' also needs to be increased.
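Whatever the right way to ask for the next segment turns out to be, the 'X-Sequence-Num' response header can at least be used to detect repeated or skipped segments before writing them to the file. Here is a sketch that assumes the headers arrive as the flat name/value array shown above. headersFromPairs and classifySegment are hypothetical helpers.

```javascript
// Convert a flat [name, value, name, value, ...] array into an object
// with lower-cased header names.
function headersFromPairs(rawHeaders) {
    const headers = {};
    for (let i = 0; i + 1 < rawHeaders.length; i += 2) {
        headers[rawHeaders[i].toLowerCase()] = rawHeaders[i + 1];
    }
    return headers;
}

// Compare a response's sequence number with the last one written.
// lastSeq is null before the first segment.
function classifySegment(lastSeq, headers) {
    const seq = parseInt(headers['x-sequence-num'], 10);
    if (Number.isNaN(seq)) return { seq: lastSeq, status: 'unknown' };
    if (lastSeq === null) return { seq, status: 'new' };
    if (seq <= lastSeq) return { seq: lastSeq, status: 'duplicate' };
    return { seq, status: seq === lastSeq + 1 ? 'new' : 'gap' };
}

const h = headersFromPairs(['X-Sequence-Num', '7475549', 'X-Head-Seqnum', '7475549']);
console.log(classifySegment(7475549, h).status); // duplicate
```

Only writing chunks whose status is 'new' would avoid the repeated audio; a 'gap' means at least one segment was missed between requests.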

May 28 '21 12:05 joek85

[ftyp] size=8+16
  major_brand = dash
  minor_version = 0
  compatible_brand = iso6
  compatible_brand = mp41
[moov] size=8+560
  [mvhd] size=12+96
    timescale = 48000
    duration = 0
    duration(ms) = 0
  [mvex] size=8+32
    [trex] size=12+20
      track id = 1
      default sample description index = 1
      default sample duration = 0
      default sample size = 0
      default sample flags = 0
  [trak] size=8+404
    [tkhd] size=12+80, flags=3
      enabled = 1
      id = 1
      duration = 0
      width = 0.000000
      height = 0.000000
    [mdia] size=8+304
      [mdhd] size=12+20
        timescale = 48000
        duration = 0
        duration(ms) = 0
        language = und
      [hdlr] size=12+33
        handler_type = soun
        handler_name = SoundHandler
      [minf] size=8+219
        [dinf] size=8+28
          [dref] size=12+16
            [url ] size=12+0, flags=1
              location = [local to file]
        [stbl] size=8+159
          [stsd] size=12+79
            entry_count = 1
            [mp4a] size=8+67
              data_reference_index = 1
              channel_count = 2
              sample_size = 16
              sample_rate = 48000
              [esds] size=12+27
                [ESDescriptor] size=2+25
                  es_id = 1
                  stream_priority = 0
                  [DecoderConfig] size=2+17
                    stream_type = 5
                    object_type = 64
                    up_stream = 0
                    buffer_size = 0
                    max_bitrate = 0
                    avg_bitrate = 0
                    DecoderSpecificInfo = 11 90 
                  [Descriptor:06] size=2+1
          [stts] size=12+4
            entry_count = 0
          [stsc] size=12+4
            entry_count = 0
          [stco] size=12+4
            entry_count = 0
          [stsz] size=12+8
            sample_size = 0
            sample_count = 0
        [smhd] size=12+4
          balance = 0
[emsg] size=8+1139
[sidx] size=12+32
  reference_ID = 1
  timescale = 48000
  earliest_presentation_time = 0
  first_offset = 0
[moof] size=8+1028
  [mfhd] size=12+4
    sequence number = 7333
  [traf] size=8+1004
    [tfhd] size=12+16, flags=2002a
      track ID = 1
      sample description index = 1
      default sample duration = 1024
      default sample flags = 0
    [tfdt] size=12+8, version=1
      base media decode time = 1794422115536
    [trun] size=12+944, flags=201
      sample count = 234
      data offset = 1044
[mdat] size=8+78296

I parsed the MP4 headers, but found no byte range.
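The dump has no byte range, but it does carry timing: the trun box holds 234 samples, tfhd gives a default sample duration of 1024, and the timescale is 48000. A quick check of what that means for one segment (segmentDurationSec is a hypothetical helper):

```javascript
// Duration of one fragment in seconds, from the values in moof/traf.
function segmentDurationSec(sampleCount, sampleDuration, timescale) {
    return (sampleCount * sampleDuration) / timescale;
}

// 234 samples * 1024 ticks / 48000 ticks per second
console.log(segmentDurationSec(234, 1024, 48000)); // 4.992
```

So each response holds just under 5 seconds of audio, which matches targetDurationSec: 5, and the tfdt base media decode time gives the position of the fragment on the stream timeline in the same 48000 timescale.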

May 29 '21 09:05 joek85

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Jul 29 '21 03:07 stale[bot]

Hello again. After a lot of digging into the problem, I finally made it work: https://github.com/joek85/yt-live

Dec 22 '22 18:12 joek85
