node-ytdl-core

Is it true we have to combine the video and audio files using ffmpeg or the python or JS ffmpeg port?

Open nonopolarity opened this issue 1 year ago • 15 comments

Often we have to download the 1080p video and the audio separately, as two files. Is it true that we just have to use ffmpeg, or a Python or JS port of ffmpeg, to combine the two files into one .mp4 file? ytdl-core probably doesn't have this feature?

(example: https://zulko.github.io/moviepy/ https://github.com/ffmpegwasm/ffmpeg.wasm )

nonopolarity avatar May 01 '23 05:05 nonopolarity

YouTube separates the audio and video streams for higher-resolution videos.

You will have to use ffmpeg to combine these streams. Thankfully, this repo has an example: https://github.com/fent/node-ytdl-core/blob/master/example/ffmpeg.js

richardabear avatar May 01 '23 10:05 richardabear

Does ffmpeg combine the video and audio in just a few seconds? I could also use Final Cut to combine them, but that is basically a re-encode and takes a long time. VLC Player can also combine the video and audio, and it takes only a few seconds even if the video is an hour long.

nonopolarity avatar May 03 '23 05:05 nonopolarity

I think that will depend on your hardware and use case.

I have found that the performance of ffmpeg is quite impressive when it comes to muxing just the 2 streams. Also, it technically wouldn't be "re-encoding" as the documentation describes it, because with the "-c:v copy" flag you are just copying the video stream.

richardabear avatar May 03 '23 08:05 richardabear

What I am more concerned about is whether, doing it this way with ffmpeg,

  1. it involves re-encoding (which usually takes quite long: for a 10 minute video, 2 to 5 minutes), or
  2. it only involves putting the two files into one file (usually just copying two data chunks into one file, which is super fast: for a 10 minute video, about 2 seconds).

Which one is it?

nonopolarity avatar May 03 '23 10:05 nonopolarity

You only have to combine the video and audio files if you download video-only and audio-only streams.

If you don't care about downloading the absolute highest quality you can just download the highest quality stream that already contains audio and video with something like this:

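const ytdl = require("ytdl-core");

// Fetch the video metadata, including the list of available formats.
const info = await ytdl.getInfo(url, {});
// Pick the best format that already contains both audio and video.
const format = ytdl.chooseFormat(info.formats, {
  filter: "audioandvideo",
  quality: "highest",
});
// Download exactly that format by its itag.
ytdl.downloadFromInfo(info, {
  quality: format.itag
})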

christiangenco avatar May 04 '23 01:05 christiangenco

You only have to combine the video and audio files if you download video-only and audio-only streams.

If you don't care about downloading the absolute highest quality you can just download the highest quality stream that already contains audio and video with something like this

Right. In the past that often meant 360p, which is vastly different from 720p or 1080p.

nonopolarity avatar May 04 '23 07:05 nonopolarity

What I am more concerned about is whether, doing it this way with ffmpeg,

  1. it involves re-encoding (which usually takes quite long: for a 10 minute video, 2 to 5 minutes), or
  2. it only involves putting the two files into one file (usually just copying two data chunks into one file, which is super fast: for a 10 minute video, about 2 seconds).

Which one is it?

That completely depends on your use case.

In my use case I just use the second option (copy); I don't re-encode.

richardabear avatar May 04 '23 07:05 richardabear

What I am more concerned about is whether, doing it this way with ffmpeg,

  1. it involves re-encoding (which usually takes quite long: for a 10 minute video, 2 to 5 minutes), or
  2. it only involves putting the two files into one file (usually just copying two data chunks into one file, which is super fast: for a 10 minute video, about 2 seconds).

Which one is it?

The question is not about which one I would choose. The question is about how ffmpeg does it. Naturally, if a job can be done in 2 seconds, I don't want to spend 2 to 5 minutes on it.

nonopolarity avatar May 04 '23 07:05 nonopolarity

Pass the -c copy flag to the ffmpeg command and it won't re-encode.
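
For illustration, here is a minimal sketch of what that could look like from Node when the two streams have already been saved to disk. The function names and file paths are hypothetical, ffmpeg is assumed to be installed and on the PATH, and the copy only succeeds if the output container accepts both codecs.

```javascript
// Sketch: mux a video-only file and an audio-only file without re-encoding.
// Assumes an `ffmpeg` binary is available on PATH.
const { spawn } = require("child_process");

// Build the ffmpeg arguments. The paths are hypothetical placeholders.
function buildCopyArgs(videoPath, audioPath, outputPath) {
  return [
    "-i", videoPath,  // input 0: the video-only file
    "-i", audioPath,  // input 1: the audio-only file
    "-map", "0:v:0",  // take the first video stream from input 0
    "-map", "1:a:0",  // take the first audio stream from input 1
    "-c", "copy",     // copy both streams as-is, no re-encoding
    outputPath,
  ];
}

// Run ffmpeg and resolve once it exits successfully.
function mux(videoPath, audioPath, outputPath) {
  return new Promise((resolve, reject) => {
    // "-y" overwrites the output file if it already exists.
    const args = ["-y", ...buildCopyArgs(videoPath, audioPath, outputPath)];
    const proc = spawn("ffmpeg", args, { stdio: "inherit" });
    proc.on("error", reject);
    proc.on("close", (code) =>
      code === 0 ? resolve() : reject(new Error(`ffmpeg exited with code ${code}`))
    );
  });
}
```

Calling something like `mux("video.mp4", "audio.m4a", "out.mp4")` would then produce the combined file; since nothing is re-encoded, the work is mostly file I/O.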

richardabear avatar May 04 '23 07:05 richardabear

-c:v copy and -c:a copy will only work if you're merging two compatible streams (or if you're merging them into an mkv wrapper that basically supports streams of any type).

If your video is encoded with h264 (.mp4) your audio needs to be encoded with aac to copy both streams into a new .mp4 without re-encoding.

If your video is encoded with vp8 or vp9 (.webm) your audio needs to be encoded with either opus or vorbis to copy both streams into a new .webm without re-encoding.
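
The two rules above can be written down as a small lookup. This is only a sketch of the rules as stated in this comment, not a complete list of what each container accepts; the helper name and the codec prefixes (`avc1` for h264, `mp4a` for aac, `vp09` as an alternative spelling of vp9) are assumptions about how the codec strings are typically reported.

```javascript
// Sketch: can this video/audio codec pair be copied into the given container
// without re-encoding? Encodes only the rules described above.
const COPY_COMPATIBLE = {
  mp4: { video: ["avc1"], audio: ["mp4a"] },
  webm: { video: ["vp8", "vp9", "vp09"], audio: ["opus", "vorbis"] },
};

function canCopyInto(container, videoCodec, audioCodec) {
  // mkv is treated as accepting streams of essentially any type.
  if (container === "mkv") return true;

  const rule = COPY_COMPATIBLE[container];
  if (!rule) return false; // unknown container: be conservative

  // Codec strings often carry a profile suffix (e.g. "avc1.640028"),
  // so compare by prefix.
  const matches = (codec, prefixes) =>
    prefixes.some((prefix) => String(codec).toLowerCase().startsWith(prefix));

  return matches(videoCodec, rule.video) && matches(audioCodec, rule.audio);
}
```

If the check fails, the options are to re-encode one of the streams or to write to an mkv file instead.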

The technique the example ffmpeg.js script uses to merge audio and video is to always copy the video stream and always re-encode the audio (it includes -c:v copy but doesn't specify the audio encoding, which means ffmpeg will always re-encode the audio to a compatible format).

This isn't a terrible strategy because:

  1. It will produce a playable video every time.
  2. Re-encoding audio takes an order of magnitude less time than re-encoding video.
  3. It's simple. You don't need a first pass of ffprobe to check that the streams are compatible.

You could make sure you never re-encode by selecting compatible video and audio streams at download time.
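
One way to do that, sketched below, is to pick the video-only and audio-only formats from the same container. The helper name is hypothetical, and it relies on the assumption that formats sharing a container (both mp4, or both webm) also have codecs that can be copied together. The fields used (`container`, `hasVideo`, `hasAudio`, `height`, `audioBitrate`) are the ones ytdl-core reports on each entry of `info.formats`.

```javascript
// Sketch: choose the best video-only and audio-only formats that share a
// container, so both streams can be copied without re-encoding.
function pickCompatiblePair(formats, container) {
  const sameContainer = formats.filter((f) => f.container === container);

  // Highest resolution video-only format first.
  const videoOnly = sameContainer
    .filter((f) => f.hasVideo && !f.hasAudio)
    .sort((a, b) => (b.height || 0) - (a.height || 0));

  // Highest bitrate audio-only format first.
  const audioOnly = sameContainer
    .filter((f) => f.hasAudio && !f.hasVideo)
    .sort((a, b) => (b.audioBitrate || 0) - (a.audioBitrate || 0));

  if (videoOnly.length === 0 || audioOnly.length === 0) return null;
  return { video: videoOnly[0], audio: audioOnly[0] };
}

// Possible usage with ytdl-core (not executed here):
//   const info = await ytdl.getInfo(url);
//   const pair = pickCompatiblePair(info.formats, "mp4");
//   const videoStream = ytdl.downloadFromInfo(info, { format: pair.video });
//   const audioStream = ytdl.downloadFromInfo(info, { format: pair.audio });
```

The trade-off is that the best format overall may live in the other container, so this can mean giving up a little quality in exchange for a fast copy.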

christiangenco avatar May 04 '23 15:05 christiangenco

To add to @christiangenco's answer: in my experience, or at least as I understand it, YouTube takes the video file you upload and re-encodes it into those exact formats (h264/h265 for video and aac for audio). So with the ffmpeg method you can just use copy for both streams every time (at least in my experience).

richardabear avatar May 05 '23 01:05 richardabear

Yup 👆

The trouble is that YouTube also re-encodes your video into webm and opus, so when I ask node-ytdl-core for the best audio and the best video it often gives me two incompatible formats.

christiangenco avatar May 05 '23 15:05 christiangenco

I recommend avoiding opus for audio and using mp4a.40.2 instead if you are planning to use the .mp4 format.

MP4 players usually do not support 48 kHz audio, which is what opus uses.

Kinuseka avatar Jul 24 '23 13:07 Kinuseka

How can I merge video and audio to output an mp4?

My code:

        const audioStream = ytdl(URL as string, {
          filter: 'audioonly',
          quality: 'highestaudio',
        });

        const videoStream = ytdl(URL as string, {
          filter: (format) => format.hasVideo && (format.container === 'mp4' || format.container === 'webm'),
          quality: qualityOption,
        });
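
A rough sketch of one way to merge two such streams into an mp4, modeled on the pipe-based approach of the repo's ffmpeg example. It assumes ffmpeg is on the PATH; the function names and the choice of `-c:a aac` are assumptions. The video is copied and the audio is re-encoded to AAC so the result is more likely to play from an .mp4 file.

```javascript
// Sketch: feed two readable streams to ffmpeg over extra pipes and write an mp4.
// Assumes an `ffmpeg` binary is available on PATH.
const { spawn } = require("child_process");

// ffmpeg reads the audio from file descriptor 3 and the video from 4.
function pipeMergeArgs(outputPath) {
  return [
    "-loglevel", "error",
    "-i", "pipe:3",  // input 0: audio stream
    "-i", "pipe:4",  // input 1: video stream
    "-map", "0:a",   // audio from input 0
    "-map", "1:v",   // video from input 1
    "-c:v", "copy",  // copy the video stream, no re-encode
    "-c:a", "aac",   // re-encode the audio to AAC for the mp4 output
    "-y", outputPath,
  ];
}

function mergeToMp4(audioStream, videoStream, outputPath) {
  return new Promise((resolve, reject) => {
    const ffmpeg = spawn("ffmpeg", pipeMergeArgs(outputPath), {
      // stdin, stdout, stderr are inherited; fds 3 and 4 are extra pipes.
      stdio: ["inherit", "inherit", "inherit", "pipe", "pipe"],
    });
    ffmpeg.on("error", reject);
    ffmpeg.on("close", (code) =>
      code === 0 ? resolve() : reject(new Error(`ffmpeg exited with code ${code}`))
    );
    audioStream.pipe(ffmpeg.stdio[3]);
    videoStream.pipe(ffmpeg.stdio[4]);
  });
}
```

With the streams above this could be called as `mergeToMp4(audioStream, videoStream, "out.mp4")`. If the audio is already AAC and the video is h264, replacing `-c:a aac` with `-c:a copy` would avoid the audio re-encode as well.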

luciano-repetti avatar Jan 03 '24 05:01 luciano-repetti

What I am more concerned about is whether, doing it this way with ffmpeg,

  1. it involves re-encoding (which usually takes quite long: for a 10 minute video, 2 to 5 minutes), or
  2. it only involves putting the two files into one file (usually just copying two data chunks into one file, which is super fast: for a 10 minute video, about 2 seconds).

Which one is it?

The question is not about which one I would choose. The question is about how ffmpeg does it. Naturally, if a job can be done in 2 seconds, I don't want to spend 2 to 5 minutes on it.

@nonopolarity Did you manage to resolve this issue? I have the same problem: I need the files merged quickly (in less than 5 seconds).

I tried playing the two files separately with JavaScript in the browser, but Safari limits the number of media elements that can play at once, and that ended up breaking my application.

micaelsgarcez avatar Jun 25 '24 21:06 micaelsgarcez