python-scraperlib

yt-dlp has new requirements for YouTube downloads

Open • benoit74 opened this issue 1 month ago • 10 comments

See https://github.com/yt-dlp/yt-dlp/issues/14404

benoit74 • Nov 03 '25 19:11

Nice.... :(

kelson42 • Nov 04 '25 09:11

Ah, it's the same one @Popolechien mentioned 6 weeks ago; I was afraid there were yet again new reqs.

rgaudin • Nov 04 '25 09:11

There are new reqs, but "luckily" they are well packaged and supposed to be very small, so we should have nothing to do besides installing the proper extra.

benoit74 • Nov 04 '25 21:11

so we should have nothing to do besides installing proper extra

This link is a bit clearer and more up to date: https://github.com/yt-dlp/yt-dlp/wiki/EJS

That is, besides yt-dlp[default] (which pulls in yt-dlp-ejs), you might also need to install a JavaScript runtime if you don't have one already. Deno works out of the box, but the other runtimes must be explicitly enabled through configuration.
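A quick pre-flight check for these two requirements could look like the minimal Python sketch below. It assumes the yt-dlp-ejs package is importable as `yt_dlp_ejs` (the name shown in yt-dlp's debug output) and treats a runtime as available when its executable is found on PATH; `missing_youtube_prereqs` is a hypothetical helper, and yt-dlp's own detection may apply further checks (e.g. runtime versions).

```python
import importlib.util
import shutil

# Deno works out of the box; other runtimes (node, bun, ...) also need
# extra yt-dlp configuration, so only deno is checked by default here.
DEFAULT_RUNTIMES = ("deno",)


def missing_youtube_prereqs(runtimes=DEFAULT_RUNTIMES):
    """Return human-readable problems; an empty list means both checks passed.

    Hypothetical helper, not part of yt-dlp or python-scraperlib.
    """
    problems = []
    # yt-dlp[default] pulls in yt-dlp-ejs (assumed module name: yt_dlp_ejs).
    if importlib.util.find_spec("yt_dlp_ejs") is None:
        problems.append("yt-dlp-ejs is not installed (install yt-dlp[default])")
    # Treat a runtime as available if its executable is found on PATH.
    if not any(shutil.which(name) for name in runtimes):
        problems.append(
            "no JavaScript runtime found on PATH (tried: " + ", ".join(runtimes) + ")"
        )
    return problems
```

A scraper could log each returned message as a warning at startup; yt-dlp's own verbose output remains the authoritative check, since only it knows which runtimes it actually enabled.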

chapmanjacobd • Nov 04 '25 21:11

Thank you @chapmanjacobd, it looks like we are indeed missing the JavaScript runtime in the linked PR.

Do you know where to find documentation on how to test whether yt-dlp has everything it needs to use the new challenge solver? Is there a reliable way to check this?

benoit74 • Nov 05 '25 08:11

If you run something like this:

yt-dlp -vvF https://m.youtube.com/watch?v=uFI5WpK2sgg

you should see these lines:

...
[debug] yt-dlp version ... from yt-dlp/yt-dlp-nightly-builds [ffb7b7f44] (pip)
...
[debug] Optional libraries: ..., yt_dlp_ejs-0.3.0
...
[debug] JS runtimes: deno-2.5.6
...
[debug] [youtube] [jsc] JS Challenge Providers: bun (unavailable), deno, node (unavailable), quickjs (unavailable)
...
[youtube] [jsc:deno] Solving JS challenges using deno
[debug] [youtube] [jsc:deno] Using challenge solver lib script v0.3.0 (source: python package, variant: minified)
[debug] [youtube] [jsc:deno] Using challenge solver core script v0.3.0 (source: python package, variant: minified)
[debug] [youtube] [jsc:deno] Running deno: deno run --ext=js --no-code-cache --no-prompt --no-remote --no-lock --node-modules-dir=none --no-config --no-npm --cached-only -
...

deno just needs to be somewhere on the PATH, similar to the ffmpeg dependency.
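The check described in this comment can be automated. Below is a minimal Python sketch that scans the verbose output for the yt_dlp_ejs library and the detected JS runtimes. It assumes the `[debug]` lines keep the format shown in the sample above (this is human-readable output, not a stable interface); `parse_jsc_support` is a hypothetical helper.

```python
import re


def parse_jsc_support(log: str) -> dict:
    """Extract EJS-related facts from `yt-dlp -vv` debug output.

    Assumes the line formats from the sample log; they may change
    between yt-dlp versions.
    """
    runtimes = []
    ejs_version = None
    for line in log.splitlines():
        if line.startswith("[debug] JS runtimes:"):
            value = line.split(":", 1)[1]
            # e.g. "deno-2.5.6"; treating "none" as "no runtime" is an assumption.
            runtimes = [
                item.strip()
                for item in value.split(",")
                if item.strip() and item.strip() != "none"
            ]
        elif line.startswith("[debug] Optional libraries:"):
            match = re.search(r"yt_dlp_ejs-([\w.]+)", line)
            if match:
                ejs_version = match.group(1)
    return {
        "js_runtimes": runtimes,
        "yt_dlp_ejs": ejs_version,
        "ok": bool(runtimes) and ejs_version is not None,
    }
```

To feed it real output, one could run the command above with subprocess.run(["yt-dlp", "-vvF", url], capture_output=True, text=True) and pass proc.stdout + proc.stderr, so the result does not depend on which stream the debug lines are written to.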

chapmanjacobd • Nov 05 '25 15:11

I've tested it and can confirm that installing both yt-dlp[default] and deno>=2.0,<3.0 makes YouTube downloads work correctly again, except that the deno Python dependency is not recommended and does not chmod +x the deno binary...
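For the missing execute bit, one workaround is to add it after installation. A minimal POSIX-only sketch: locating the binary inside the installed deno package is package-specific, so the path is an assumption left to the caller, and `ensure_executable` is a hypothetical helper.

```python
import os
import stat
from pathlib import Path

# Execute bits for user, group and others.
EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def ensure_executable(path) -> bool:
    """Add execute bits for all classes (roughly `chmod a+x`).

    Returns True if the mode was changed, False if it was already
    executable by user, group and others.
    """
    p = Path(path)
    mode = stat.S_IMODE(p.stat().st_mode)
    if mode & EXEC_BITS == EXEC_BITS:
        return False
    os.chmod(p, mode | EXEC_BITS)
    return True
```

Returning a boolean lets a caller log when it had to fix the permissions, which makes the packaging issue visible instead of silently patching it.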

I now think we should not add these to python-scraperlib, at least not for now.

deno is a 108 MB dependency as of 2.5.6, and the default extra of yt-dlp probably adds some size as well.

This is just too much for most scrapers, which care about neither yt-dlp nor YouTube. And since no code needs to be added to the scraperlib, there is little incentive to do so.

I recommend we wait for https://github.com/openzim/python-scraperlib/issues/244 to do this in a better way. Until then, scrapers that need YouTube downloads (only the youtube, ted and openedx scrapers, sic) should install yt-dlp[default] and deno>=2.0,<3.0 themselves. For now, I would just add a warning to the scraperlib README.

@rgaudin WDYT?

benoit74 • Nov 14 '25 09:11

OK, maybe a runtime warning when using the video module?

rgaudin • Nov 14 '25 09:11

yt-dlp already issues a warning when it downloads from YouTube but is missing the JavaScript runtime... this is far superior (because far more precise) to anything we could do in the scraperlib.

See the https://farm.openzim.org/pipeline/08152988-b346-4219-8c04-8c44a6fb8574/debug logs... which make it quite explicit that we need to release the youtube scraper again 🤣

benoit74 • Nov 14 '25 09:11

Perfect

rgaudin • Nov 14 '25 10:11