
How to clone a YouTube video element so it can play the same video?

Open WofWca opened this issue 4 years ago • 8 comments

As a solution to #1, we could create another video element that plays the same video, hide it, and keep its playback a moment ahead of the original video's .currentTime. That way we can effectively read the audio data that is yet to be played by the "main" video.

The problem is – I don't know how to clone a YouTube video element so it can play the same track.

This wouldn't work:

v = document.querySelector('video');
v2 = v.cloneNode(true);

Because

GET blob:https://www.youtube.com/79c3944d-0c45-412a-b5b2-c3d166d5c962 net::ERR_FILE_NOT_FOUND

So, how do I do it?

WofWca avatar Feb 15 '20 10:02 WofWca

I've managed to intercept the underlying MediaSource with this:

// Execute while the video is playing.
let capturedArgs;
URL._jumpcutterUnchangedCreateObjectURL = URL.createObjectURL;
URL.createObjectURL = function(...args) {
    const toReturn = URL._jumpcutterUnchangedCreateObjectURL(...args);
    capturedArgs = args;
    console.log('gotcha!', args, toReturn);
    return toReturn;
}

v = document.querySelector('video');
// Currently, removing `src` from a YouTube video makes it create a new one.
v.src = '';
setTimeout(() => {
    v = document.querySelector('video');
    v.play();
    originalMediaSource = capturedArgs[0];
}, 500);

But further attempts of attaching it to a new video did not go well:

v2 = document.querySelector('video').cloneNode(true);
v2.src = URL.createObjectURL(originalMediaSource);
setTimeout(() => {
    v2.play();
}, 100);

Uncaught (in promise) DOMException: The element has no supported sources.

WofWca avatar Feb 15 '20 10:02 WofWca

One option would be, although probably not really the nicest one, to create a new iFrame with src=window.location.href. Then, do the analysis inside the iFrame.
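A minimal sketch of that iframe idea, assuming the site allows being framed by a page on its own origin. createHiddenCloneFrame is a hypothetical helper name, and keeping the frame tiny and transparent (instead of display: none) is just one guess at how to hide it, not something that was tested here:

```javascript
// Sketch only: createHiddenCloneFrame is a hypothetical helper, not existing extension code.
// `doc` defaults to the page's document; it is a parameter so the helper can be
// exercised with a stand-in object outside a browser.
function createHiddenCloneFrame(pageUrl, doc = document) {
  const frame = doc.createElement('iframe');
  frame.src = pageUrl;
  // Assumption: keep the frame in the layout but out of sight.
  frame.style.position = 'fixed';
  frame.style.width = '1px';
  frame.style.height = '1px';
  frame.style.opacity = '0';
  frame.style.pointerEvents = 'none';
  doc.body.appendChild(frame);
  return frame;
}
```

In the page it would be called as createHiddenCloneFrame(window.location.href); the analysis code would then have to locate the matching media element inside the frame.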

zznidar avatar Nov 20 '21 14:11 zznidar

When I clone the element I get the same error, however on Firefox you get a more verbose error:

Security Error: Content at https://www.youtube.com/watch?v=videoid may not load data from blob:https://www.youtube.com/f65944cf-7692-48c0-adfa-bb19d9376eb8.

I'm not sure if the security error is real and related to CSP/CORS/etc or if it’s just a generic error that is shown.

Another thing on my mind is that YouTube could be streaming the media in, via a JavaScript media stream object, directly into the player, meaning the blob URL is just a placeholder. However, I'm not an expert on media streams, so more investigation will be necessary.


As an alternate solution: YouTube serves audio and video separately, so you are able to isolate the audio stream and play that in another element.

I found another add-on that converts YouTube into an audio only mode. Audio Only YouTube GitHub

It looks at the outgoing requests to find one with an audio mime type. It then takes that URL and strips off the URL parameters that provide DASH playback, and you end up with a full audio stream (mp3, m4a, ogg, webm, etc.).
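Roughly, the URL handling could look like the sketch below. The parameter names in ASSUMED_RANGE_PARAMS and the `mime` query parameter are assumptions for illustration, not something confirmed in this thread; the real names would have to be checked against actual requests:

```javascript
// Assumption: these are the query parameters that restrict a request to a DASH chunk.
const ASSUMED_RANGE_PARAMS = ['range', 'rn', 'rbuf'];

// Returns the URL with the given query parameters removed.
function stripQueryParams(requestUrl, paramNames = ASSUMED_RANGE_PARAMS) {
  const url = new URL(requestUrl);
  for (const name of paramNames) {
    url.searchParams.delete(name);
  }
  return url.toString();
}

// Assumption: the request URL carries the mime type in a `mime` query parameter.
function looksLikeAudioRequest(requestUrl) {
  const mime = new URL(requestUrl).searchParams.get('mime');
  return mime !== null && mime.startsWith('audio');
}
```

An extension would run looksLikeAudioRequest on each outgoing request URL it observes and pass the first match through stripQueryParams.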

I'm not entirely sure how caching/bandwidth increase/sync issues would affect this as I presume you would have to have another audio stream downloading at the same time, rather than using the current video.

Also other sites that use DASH will do this differently, so maybe not a catch-all solution.

mt025 avatar Nov 20 '21 22:11 mt025

@zznidar Damn, son. This is genius. Never thought of that. Although you're right, there are a couple of headaches this method gives:

  • How to find the "same" element in the iframe. You can't do it with currentSrc.
  • How to make sure the clone page is the same as the original and contains the "same" element. For example, this is a problem with infinite-scroll pages (like Instagram, Twitter).
  • Performance overhead.

Will need to think whether it makes sense to try to implement it now or to wait until we have a less workaround-y way.

WofWca avatar Nov 21 '21 20:11 WofWca

via a JavaScript media stream object

Not sure if you meant something more specific than MediaSource that I mentioned in my second comment.

so maybe not a catch-all solution

Yeah, it doesn't sound like it is. Although maybe there's some kind of a standard or a library that everyone uses for this.

For YouTube specifically I had an idea of recognizing the current video ID, then utilizing YouTube API to create another video with that ID. But the iframe idea sounds a lot better.
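For reference, recognizing the video ID could be as small as the sketch below. getYouTubeVideoId is a hypothetical helper; it only covers watch URLs with a `v` query parameter and youtu.be short links, and other URL shapes would need their own handling:

```javascript
// Sketch only: returns the video ID from a YouTube page URL, or null if none is found.
function getYouTubeVideoId(pageUrl) {
  const url = new URL(pageUrl);
  if (url.hostname === 'youtu.be') {
    // Short links carry the ID as the path: https://youtu.be/<id>
    return url.pathname.slice(1) || null;
  }
  // Watch pages carry it in the query string: /watch?v=<id>
  return url.searchParams.get('v');
}
```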

WofWca avatar Nov 21 '21 20:11 WofWca

@WofWca good points.

  • The same element could maybe be found by id/class of that element (or its parents), but only as long as they are always the same (i.e. they do not change on each load).
  • Probably impossible. For Twitter, we could set iFrame src to the direct link of the Tweet. But this is not a universal solution and may need to be updated frequently.

zznidar avatar Nov 22 '21 14:11 zznidar

There may be a better option, however. If you take a look at this StackOverflow answer for drawing waveform: https://stackoverflow.com/a/67265439 The waveform is drawn very quickly (took less than 10 seconds for a 28-minute audio file on my computer).

I haven't really dived too deeply into it (it has been on my todo-list for almost half a year now), but at a glance it seems to me that the size of the red columns corresponds to the loudness of the audio at that time.

Using this approach, it may be possible to pre-analyse the whole audio track and therefore skip the silent parts completely (and, furthermore, add a margin-before without any distortions).
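To make the pre-analysis idea concrete, here is a rough sketch that scans decoded samples for silent stretches. findSilentRanges, the threshold, and the minimum duration are all made-up illustrative values; a plain per-sample amplitude check is a crude stand-in for a real loudness measure:

```javascript
// Sketch only: returns [{ start, end }] ranges (in seconds) where every sample
// stays below `threshold` for at least `minDurationSec`.
function findSilentRanges(samples, sampleRate, threshold = 0.01, minDurationSec = 0.5) {
  const ranges = [];
  let start = -1; // index where the current silent run began, or -1 if none
  // Iterate one past the end so a silent run at the very end is closed too.
  for (let i = 0; i <= samples.length; i++) {
    const silent = i < samples.length && Math.abs(samples[i]) < threshold;
    if (silent && start === -1) {
      start = i;
    }
    if (!silent && start !== -1) {
      if ((i - start) / sampleRate >= minDurationSec) {
        ranges.push({ start: start / sampleRate, end: i / sampleRate });
      }
      start = -1;
    }
  }
  return ranges;
}
```

In a browser, `samples` would be something like audioBuffer.getChannelData(0) and `sampleRate` would be audioBuffer.sampleRate, where audioBuffer is the result of decodeAudioData.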

zznidar avatar Nov 22 '21 14:11 zznidar

the size of the red columns corresponds to the loudness

Yeah, but to be more precise it's just raw samples.

The problem with this approach, however, is that it uses decodeAudioData, which, as far as I've heard, requires the whole file to be downloaded. That is pretty bad for long online (as opposed to local) videos and streams: downloading would take long, and decoding would also take long and would probably load the CPU to 100% for the duration. Intuitively, it should also take a pretty huge amount of memory before it's done processing.

And it also demands that we know the media source and can download it independently of the main <video> element, which is close to (a little more than, even) what the cloning algorithm requires.

WofWca avatar Nov 22 '21 15:11 WofWca

I've managed to intercept the underlying MediaSource with this: [...] But further attempts of attaching it to a new video did not go well: [...]

MediaSources are not generally re-usable unless they were implemented with that use case in mind, which understandably doesn't appear to be the case with YouTube's implementation. What we can do however is to monkey-patch this functionality into the MediaSource we intercept. All that's needed is to intercept the methods which YouTube calls, and to selectively forward some of them to a second MediaSource which we then can freely attach to our own audio element and use to seek around all the audio which YouTube has already buffered. And there's nothing really specific to YouTube with this scheme; provided enough methods are monkey-patched, it should work with any MediaSource-based video/audio player.

For a POC, just addSourceBuffer and appendBuffer were enough to make it work on YouTube (for a proper solution one would of course also have to deal with cleanup of buffers though):

const createObjectURL = URL.createObjectURL;
URL.createObjectURL = function(...args) {
  console.log("createObjectURL", args[0])
  const realMediaSource = args[0];
  const addSourceBuffer = realMediaSource.addSourceBuffer;
  realMediaSource.addSourceBuffer = function(...args) {
    console.log("addSourceBuffer", ...args)
    const realSourceBuffer = addSourceBuffer.call(this, ...args)
    if (!args[0].startsWith("audio")) {
      return realSourceBuffer
    }

    const copyMediaSource = new MediaSource()

    let copyBuffer
    copyMediaSource.addEventListener("sourceopen", () => {
      copyBuffer = copyMediaSource.addSourceBuffer(args[0])
    })

    const copyAudio = document.createElement("audio")
    copyAudio.src = createObjectURL(copyMediaSource)
    copyAudio.play()
    console.log(copyAudio)

    const orgAppendBuffer = realSourceBuffer.appendBuffer
    realSourceBuffer.appendBuffer = function(...args) {
      console.log("appendBuffer", this, ...args)
      if (copyBuffer) {
        copyBuffer.appendBuffer(...args)
      }
      return orgAppendBuffer.call(this, ...args)
    }

    return realSourceBuffer
  }
  return createObjectURL(...args);
}
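One detail the POC above does not handle: SourceBuffer.appendBuffer throws an InvalidStateError when it is called while the buffer's `updating` flag is still true, so forwarding every chunk to the copy immediately can fail. A possible way around it is to queue the chunks for the copy. createAppendQueue is a hypothetical helper, not code from the extension:

```javascript
// Sketch only: returns an enqueue(data) function that appends chunks to
// `sourceBuffer` one at a time, waiting for 'updateend' between appends.
function createAppendQueue(sourceBuffer) {
  const pending = [];
  const flush = () => {
    if (sourceBuffer.updating || pending.length === 0) {
      return;
    }
    sourceBuffer.appendBuffer(pending.shift());
  };
  // 'updateend' fires after each append completes (or is aborted).
  sourceBuffer.addEventListener('updateend', flush);
  return (data) => {
    pending.push(data);
    flush();
  };
}
```

In the POC, `copyBuffer.appendBuffer(...args)` would become a call to the enqueue function created for copyBuffer in the "sourceopen" handler.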

Johni0702 avatar May 12 '23 22:05 Johni0702

We've been waiting for your arrival. Man oh man. So all we had to do was intercept one layer deeper?

Can this work when the video is already playing though?

I'll need to research the MediaSource thing.

Thanks a lot for descending to us!

WofWca avatar May 13 '23 14:05 WofWca

I did some reading, and looks like YouTube is using the DASH technique (but I'm not sure if it's 100% compatible). And the issue applies to all the DASH players as well, e.g. every website that uses the Shaka Player. It may be easier to test/develop this feature on the https://shaka-player-demo.appspot.com/demo/ page.

WofWca avatar May 16 '23 14:05 WofWca

I did some more research in regards to https://github.com/WofWca/jumpcutter/issues/2#issuecomment-1546408197, here are some things I found out. In short, my biggest concern is whether we can initialize our extension after the video has already started playing.

  • Looks like it's impossible to directly reuse the original MediaSource.sourceBuffers in the new MediaSource. There's just no such API. There is addSourceBuffer, but it only creates a new SourceBuffer.
  • So I guess we need to watch every call to appendBuffer of every SourceBuffer.
  • But I'm not sure if it's always possible to skip some initial calls to appendBuffer, i.e. not replicate every call to appendBuffer() to the clone MediaSource, and only start doing it after some time (e.g. after the extension has been loaded).
  • I'm also not sure if we can go without intercepting some initial originalMediaSource.addSourceBuffer() calls, i.e. if we can look at an already initialized MediaSource and based on its sourceBuffers determine with what arguments we need to call cloneMediaSource.addSourceBuffer().
  • Is there a way to get the MediaSource given just the HTMLMediaElement that is playing it? If it's using srcObject then sure, but what if it's v.src = URL.createObjectURL(internalMediaSource)?
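For the last point, one possible workaround (only if our code runs before the page calls createObjectURL) is to remember which object each blob URL was created for, then look it up by the element's src. installMediaSourceRegistry is a hypothetical helper, and the map keeps strong references forever, so a real implementation would need cleanup:

```javascript
// Sketch only: patches createObjectURL on `urlApi` (the global URL by default)
// and returns a lookup function from a media element to the object behind its src.
function installMediaSourceRegistry(urlApi = URL) {
  const registry = new Map(); // blob URL -> object it was created from
  const originalCreate = urlApi.createObjectURL;
  urlApi.createObjectURL = function (object) {
    const objectUrl = originalCreate.call(urlApi, object);
    registry.set(objectUrl, object);
    return objectUrl;
  };
  // Returns undefined for URLs created before the patch was installed.
  return (mediaElement) => registry.get(mediaElement.currentSrc || mediaElement.src);
}
```

It does not help with a MediaSource that was attached before the patch was installed, which is exactly the late-initialization concern above.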

WofWca avatar May 17 '23 13:05 WofWca

Aight my dudes, after 4 years in development, hopefully it was worth the wait. The prototype is working on YouTube!

https://github.com/WofWca/jumpcutter/assets/39462442/3070927e-52a7-4468-b5a2-ca349a806f9c

Here's the build that you can play around with (you'll have to "Load unpacked"): dist-chromium.zip. Built from 24157832b3d2c6c74f7d9399ad8adb814471393f.

WofWca avatar May 28 '23 15:05 WofWca

unless they were implemented with that use case in mind

@Johni0702 could you please clarify how this can be done?

WofWca avatar Jun 01 '23 07:06 WofWca

unless they were implemented with that use case in mind

@Johni0702 could you please clarify how this can be done?

That's not something you can do. That's something the website author would have to have done, since they are the ones who implemented the original MediaSource (and chances are they haven't done that because it's unlikely they need it and most naive implementations I can think of would probably not support it "accidentally"). The only way you can do that is by monkey patching that functionality into it.

I don't know if there's any way to do this post-hoc (and I doubt it) but that shouldn't really be much of an issue because I'd imagine there's a way for addons to run before the page code does (meaning you'd just have to duplicate all sources just in case; assuming the browser doesn't actually do any decoding until you try to play the source, that should hopefully not be an issue; if it turns out it does, then you'd just have to store all the data you need to do it later). So it'd only really ever be an issue when the addon is freshly installed (people would have to refresh the page to properly use it).

Johni0702 avatar Jun 01 '23 09:06 Johni0702

That's something the website author would have to have done

I mean, if they were to have done it, how would it work?

WofWca avatar Jun 01 '23 12:06 WofWca

That's something the website author would have to have done

I mean, if they were to have done it, how would it work?

It would "just work". You could just take the same MediaSource or even URL and use it for multiple video/audio elements.

Johni0702 avatar Jun 01 '23 12:06 Johni0702

I mean, how would the website maker implement it?

Sorry XD

WofWca avatar Jun 01 '23 12:06 WofWca

Thanks again for the input! It's extremely satisfying to implement it after all this time!

And let's not stop here. If someone has another approach in mind, please share.

WofWca avatar Jul 01 '23 14:07 WofWca