yt-dlp has new requirements for YouTube downloads
See https://github.com/yt-dlp/yt-dlp/issues/14404
Nice.... :(
Ah, it's the same one @Popolechien mentioned 6 weeks ago; I was afraid there were yet more new reqs.
There are new reqs, but "luckily" they are well packaged and supposed to be very small, so we should have nothing to do besides installing the proper extra.
> so we should have nothing to do besides installing proper extra
This link is a bit clearer and more up to date: https://github.com/yt-dlp/yt-dlp/wiki/EJS
That is, besides yt-dlp[default] (which pulls in yt-dlp-ejs), you might also need to install a JavaScript runtime if you don't have one already. Deno works out of the box, but the others require passing specific configuration to enable them.
Thank you @chapmanjacobd, looks like we are indeed missing the JavaScript runtime in the linked PR.
Do you know where one can find documentation on how to test whether yt-dlp has everything it needs to use the new challenge solver? Is there a reliable way to check this?
If you run something like this:
yt-dlp -vvF https://m.youtube.com/watch?v=uFI5WpK2sgg
you should see these lines:
...
[debug] yt-dlp version [email protected] from yt-dlp/yt-dlp-nightly-builds [ffb7b7f44] (pip)
...
[debug] Optional libraries: ..., yt_dlp_ejs-0.3.0
...
[debug] JS runtimes: deno-2.5.6
...
[debug] [youtube] [jsc] JS Challenge Providers: bun (unavailable), deno, node (unavailable), quickjs (unavailable)
...
[youtube] [jsc:deno] Solving JS challenges using deno
[debug] [youtube] [jsc:deno] Using challenge solver lib script v0.3.0 (source: python package, variant: minified)
[debug] [youtube] [jsc:deno] Using challenge solver core script v0.3.0 (source: python package, variant: minified)
[debug] [youtube] [jsc:deno] Running deno: deno run --ext=js --no-code-cache --no-prompt --no-remote --no-lock --node-modules-dir=none --no-config --no-npm --cached-only -
...
deno just needs to be in the PATH somewhere, similar to the ffmpeg dependency
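For a quick check before running yt-dlp at all, here is a minimal sketch in Python. It only tests what the log above implies: that a `deno` binary is on the PATH and that the solver package is importable. The function name `check_youtube_prereqs` is hypothetical, and the importable module name `yt_dlp_ejs` is an assumption taken from the "Optional libraries" debug line; the verbose yt-dlp output remains the authoritative check.

```python
import importlib.util
import shutil


def check_youtube_prereqs(runtime="deno", solver_module="yt_dlp_ejs"):
    """Report which prerequisites appear to be present (hypothetical helper).

    Assumption: the solver package is importable as `yt_dlp_ejs`, as the
    "Optional libraries" debug line suggests.
    """
    return {
        # True if the yt_dlp package can be found by the import system.
        "yt_dlp": importlib.util.find_spec("yt_dlp") is not None,
        # True if the challenge solver package can be found.
        solver_module: importlib.util.find_spec(solver_module) is not None,
        # Full path to the runtime binary, or None if it is not on the PATH.
        runtime: shutil.which(runtime),
    }


if __name__ == "__main__":
    for name, found in check_youtube_prereqs().items():
        print(f"{name}: {found if found else 'MISSING'}")
```

This does not verify the deno version or that the binary is actually executable by yt-dlp, so treat a passing result as a hint rather than a guarantee.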
I've tested and can confirm that installing both yt-dlp[default] and deno>=2.0,<3.0 allows YouTube videos to download correctly again, except that the deno Python dependency is not recommended and does not chmod +x the deno dependency...
I now think we should not add these to python-scraperlib, at least not for now.
deno is a 108M dependency as of 2.5.6, and the default extra of yt-dlp probably adds some size as well.
This is just too much added size for most scrapers, which care neither about yt-dlp nor about YouTube. And since there is no need to add any code to the scraperlib, there is not much incentive to do so.
I recommend we wait for https://github.com/openzim/python-scraperlib/issues/244 to do that in a better way. Until then, scrapers interested in YouTube downloads (only youtube, ted and openedx - sic - scrapers) should install yt-dlp[default] and deno>=2.0,<3.0 on their own. I would just add a warning in the scraperlib README for now.
@rgaudin WDYT?
OK, maybe a runtime warning when using the video module?
yt-dlp already issues a warning when it downloads from YouTube but is missing the JavaScript runtime ... this is far superior (because way more precise) to anything we could do in the scraperlib.
See https://farm.openzim.org/pipeline/08152988-b346-4219-8c04-8c44a6fb8574/debug logs ... which are quite explicit about the fact that we need to release youtube scraper again 🤣
Perfect