[Youtube] Got server error HTTP Error 403: Forbidden (latest master version)
Checklist
- [x] I'm reporting a broken site support issue
- [x] I've verified that I'm running youtube-dl version 2024.08.07(the latest master version)
- [x] I've checked that all provided URLs are alive and playable in a browser
- [x] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [x] I've searched the bugtracker for similar bug reports including closed ones
- [x] I've read bugs section in FAQ
Verbose log
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-v']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2024.08.07.local
[debug] Python 3.8.5 (CPython x86_64 64bit) - macOS-10.16-x86_64-i386-64bit - OpenSSL 1.1.1h 22 Sep 2020
[debug] exe versions: ffmpeg 7.0.1-tessus, ffprobe 6.1.1, rtmpdump 2.4
[debug] Proxy map: {'http': 'http://127.0.0.1:58309', 'https': 'http://127.0.0.1:58309'}
Usage: youtube-dl [OPTIONS] URL [URL...]
youtube-dl: error: You must provide at least one URL.
Type youtube-dl --help to see a list of all options.
Description
- The download runs normally for a short time, and then 403 errors appear:
youtube-dl 'https://www.youtube.com/watch?v=lLSkbZ3-EOs'
[download] Got server HTTP error: HTTP Error 403: Forbidden. Retrying fragment 41 (attempt 2 of 10)...
[download] Got server HTTP error: HTTP Error 403: Forbidden. Retrying fragment 41 (attempt 3 of 10)...
[download] Got server HTTP error: HTTP Error 403: Forbidden. Retrying fragment 41 (attempt 4 of 10)...
[download] Got server HTTP error: HTTP Error 403: Forbidden. Retrying fragment 41 (attempt 5 of 10)...
[download] Got server HTTP error: HTTP Error 403: Forbidden. Retrying fragment 41 (attempt 6 of 10)...
[download] Got server HTTP error: HTTP Error 403: Forbidden. Retrying fragment 41 (attempt 7 of 10)...
[download] Got server HTTP error: HTTP Error 403: Forbidden. Retrying fragment 41 (attempt 8 of 10)...
[download] Got server HTTP error: HTTP Error 403: Forbidden. Retrying fragment 41 (attempt 9 of 10)...
[download] Got server HTTP error: HTTP Error 403: Forbidden. Retrying fragment 41 (attempt 10 of 10)...
[download] Skipping fragment 41...
[download] Got server HTTP error: HTTP Error 403: Forbidden. Retrying fragment 42 (attempt 1 of 10)...
[download] Got server HTTP error: HTTP Error 403: Forbidden. Retrying fragment 42 (attempt 2 of 10)...
[download] Got server HTTP error: HTTP Error 403: Forbidden. Retrying fragment 42 (attempt 3 of 10)...
[download] Got server HTTP error: HTTP Error 403: Forbidden. Retrying fragment 42 (attempt 4 of 10)...
Indeed this seems to be a pathological video where almost all video formats fail on the first fragment and 299 may fail later, regardless of Python 2.7/3.5/3.9 and User-Agent settings.
yt-dlp 2024.08.06 still works, apparently. It has fancy networking that we can't easily replicate: maybe punt to curl for all requests?
I get that with every single video I try. Curiously enough, format '18' works all the time. Other formats that work are '136', '137', '248' and/or '160', but that depends on the video and is not always the case. Still, format '18' is the most reliable.
So
- poToken (I agree, this is always being detected today) randomly breaks the download rather than uniformly giving 403 as with the revised n-sig "throttling"?
- different clients get more or less functional links for formats with the same itag?
Can confirm that a lot of the video-only formats are just being 403-ed in the middle with their downloads, resulting in me getting files that stop after about 10-20 minutes into the video, but still have full sized audio.
By now I have written something into my scripts to just pick format 18 as long as a flag is set, because I foresee this issue happening again in the future once it is eventually fixed... >.>
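A flag-driven fallback like the one described above might look roughly like this. It is only a minimal sketch: the function name `pick_format` and the default selector `bestvideo+bestaudio/best` are illustrative assumptions, not taken from anyone's actual script; only the choice of format 18 comes from the reports in this thread.

```python
def pick_format(force_progressive, default="bestvideo+bestaudio/best"):
    """Return a youtube-dl -f selector string.

    force_progressive: when True, fall back to itag 18, which posters in
    this thread report as the most reliable format while the 403 problem lasts.
    default: selector used otherwise (an assumed, illustrative default).
    """
    return "18" if force_progressive else default


if __name__ == "__main__":
    # Only build and show the argument list; nothing is downloaded here.
    args = ["youtube-dl", "-f", pick_format(True),
            "https://www.youtube.com/watch?v=lLSkbZ3-EOs"]
    print(" ".join(args))
```

Keeping the selector in one function means the flag can simply be switched off again once the adaptive formats behave.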
So has anyone tried fetching fragments in fragments of <1MB? We already had a work-around to download in fragments to avoid throttling IIRC.
Otherwise:
- actually ignore the data with poToken and use the existing "punt to API" logic with a selected unafflicted client.
- extend to user-specified other, or selection of, clients similar to the yt-dlp extractor logic.
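For the "fragments of <1MB" idea, the byte-range arithmetic can be sketched as below. This is not the downloader's actual code: `byte_ranges`, `range_header` and the 512 KiB default are hypothetical names and values chosen for illustration, and whether requests of this size actually avoid the 403 is an open question.

```python
def byte_ranges(total_size, chunk_size=512 * 1024):
    """Yield inclusive (start, end) byte offsets that cover total_size bytes.

    chunk_size defaults to 512 KiB, an arbitrary value below 1 MB.
    """
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size > 0")
    start = 0
    while start < total_size:
        end = min(start + chunk_size, total_size) - 1
        yield start, end
        start = end + 1


def range_header(start, end):
    """Build the HTTP Range header for one inclusive byte range."""
    return {"Range": "bytes=%d-%d" % (start, end)}


if __name__ == "__main__":
    # 13106623 is the clen value of the format 135 link logged in this thread.
    for start, end in byte_ranges(13106623):
        print(range_header(start, end))
```

Each yielded pair would become one request against the same media link, with the responses concatenated in order.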
Apparently the latest fix worked for not even a day, which doesn't bode well. Personally I keep getting "giving up after 0 fragment retries" in my Python stuff.
From what I read in yesterday's thread, it seems like this will just not work out with fake JS interpretation if they try to combat this in the slightest. Like, that almost doesn't deserve the name attack vector, that's an attack landscape.
This change is significant. I checked an old, pre-Quantum Firefox and videos don't work any more, although 3 days ago they did.
Maybe the new player JS uses some G JS syntax extension (aka ECMA2021+) that hadn't been contemplated in those FF versions. Is there an error in the JS console?
It used to work as embedded or as mobile (when using a mobile user agent). Now all of them display this error:
An error occurred. Please try again later. (Playback ID: j-bZsC_YehYVyZZ8)
Learn More (https://support.google.com/youtube/?p=player_error1&hl=en)
Loading any video at https://www.youtube.com/embed/1234567890a:
Content Security Policy: Couldn't process unknown directive 'require-trusted-types-for' <unknown>
mutating the [[Prototype]] of an object will cause your code to run very slowly; instead create the object with the correct initial [[Prototype]] value using Object.create www-embed-player.js:26:77
InvalidStateError www-embed-player.js:1128:42
Then pressing play:
Error: WebGL: Error during native OpenGL init. base.js:11283:169
Error: WebGL: WebGL creation failed. base.js:11283:169
ED.
If it's of any help, despite what was said before, there are some videos that work.
First - this one doesn't, and gives following console log
Content Security Policy: Couldn't process unknown directive 'require-trusted-types-for' <unknown>
mutating the [[Prototype]] of an object will cause your code to run very slowly; instead create the object with the correct initial [[Prototype]] value using Object.create www-embed-player.js:26:77
InvalidStateError www-embed-player.js:1128:42
Empty string passed to getElementById(). zVhcVoOEv7o line 2 > eval:795:28944
Error: WebGL: getParameter: parameter: invalid enum value <enum 0x9246> base.js:11283:254
This one does, with this log:
Content Security Policy: Couldn't process unknown directive 'require-trusted-types-for' <unknown>
mutating the [[Prototype]] of an object will cause your code to run very slowly; instead create the object with the correct initial [[Prototype]] value using Object.create www-embed-player.js:26:77
InvalidStateError www-embed-player.js:1128:42
Empty string passed to getElementById(). KAR4fAX5T7Y line 2 > eval:4650:33869
So, after clicking 'play' it gives this error: Error: WebGL: getParameter: parameter: invalid enum value <enum 0x9246> base.js:11283:254.
No matter what chunk size I use, I'm seeing hard 403 errors at 1Meg, as others have reported: I'm able to download as many fragments as I want up to 1Meg and then get a 403.
I have experimented with generating cpns (nonces) and adding them to the format fragment URLs without any luck, as well as using rn (request numbers) in the URL query instead of byte ranges. I have also tried sleeping between fragments to mimic video playback, again without joy.
It feels like they've added a check somewhere which fails at the 1Meg mark but I haven't found anything yet where that might be.
Checking via the browser I can see that youtube is happily downloading /videoplayback fragment URLs above 1Meg without any issue...
But in the browser the media links have the pot parameter with its poToken challenge result, no? Which is what we can't haz.
In line with step 1 above, I'm gradually pulling stuff from the yt-dlp extractor, enough to download HLS with client ios, but plainly not yet enough to get unblocked links from tv or web_creator, eg with format 135. Should I be expecting that?
I'm not seeing a pot parameter in the query strings, I am seeing post data in the /videoplayback requests which is referred to in the source as playbackCookie
Edit: Looks like the playbackCookie / POST data is extracted from the bytes of the previous fragment response somehow
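To check a given media link for a pot parameter without reading the query string by eye, something like the following could be used. It is a rough sketch: `media_link_info` is a hypothetical helper, and the URL in the usage example is an abbreviated, made-up link that only reuses parameter names (`itag`, `c`, `expire`, `n`) seen in the /videoplayback links in this thread.

```python
from urllib.parse import parse_qs, urlsplit


def media_link_info(url):
    """Summarise a few query parameters of a /videoplayback link."""
    # keep_blank_values=True so that an empty "pot=" still counts as present.
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)

    def first(key):
        return query.get(key, [None])[0]

    return {
        "itag": first("itag"),
        "client": first("c"),
        "expire": first("expire"),
        "has_pot": "pot" in query,
    }


if __name__ == "__main__":
    # Abbreviated, illustrative link; not a working media URL.
    example = ("https://example.googlevideo.com/videoplayback"
               "?itag=135&c=WEB_CREATOR&expire=1723393706&n=_9t8_-AhZvi04A")
    print(media_link_info(example))
```

This only looks at the URL, so it says nothing about a token sent as POST data.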
This is the procedure that I am using in my own code.
Load https://www.youtube.com/embed/<id#> and find the base.js link. Do the usual to extract the sig and n-sig. Extract the signatureTimestamp for the next step.
Load https://www.youtube.com/youtubei/v1/player with the signatureTimestamp and TVHTML5_SIMPLY_EMBEDDED_PLAYER as the client name.
If the JSON response contains "formats" and/or "adaptiveFormats" then we're good. This covers most videos, including age-gated ones. The 403 problem occurs when we have to go to the next step. We can't use "www.youtube.com". We must use "m.youtube.com" with the user agent set to something like "Mozilla/5.0 (Android 14)", which is what I'm using.
Load https://m.youtube.com/watch?v=<id#> and extract the JSON structures that you would otherwise have gotten from the previous step.
And that's it. The extra step is only required for videos that disallow embedding.
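The second and third steps of the procedure above can be sketched as follows, without doing any network I/O. This is not the poster's code: the JSON field layout, the `clientVersion` value and the `embedUrl` value are assumptions modelled on how yt-dlp-style extractors usually call this endpoint, the `streamingData` wrapper is likewise assumed, and `build_player_request` and `has_formats` are hypothetical names.

```python
import json

PLAYER_API_URL = "https://www.youtube.com/youtubei/v1/player"


def build_player_request(video_id, signature_timestamp):
    """Build the JSON body to POST to PLAYER_API_URL."""
    return {
        "videoId": video_id,
        "context": {
            "client": {
                "clientName": "TVHTML5_SIMPLY_EMBEDDED_PLAYER",
                "clientVersion": "2.0",  # assumed value
            },
            # Assumed: embedded clients are normally given an embedding page.
            "thirdParty": {"embedUrl": "https://www.youtube.com/"},
        },
        "playbackContext": {
            "contentPlaybackContext": {
                # The signatureTimestamp extracted from base.js in step 1.
                "signatureTimestamp": signature_timestamp,
            },
        },
    }


def has_formats(player_response):
    """True if the response lists "formats" and/or "adaptiveFormats"."""
    streaming = player_response.get("streamingData") or {}
    return bool(streaming.get("formats") or streaming.get("adaptiveFormats"))


if __name__ == "__main__":
    # 19950 is a placeholder, not a real signatureTimestamp.
    print(json.dumps(build_player_request("lLSkbZ3-EOs", 19950), indent=2))
```

When `has_formats` returns False for the decoded response, the procedure falls through to the m.youtube.com step.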
Please don't bother to supply any "me too" reports unless the log shows some novelty that may help with rectification. Just "Like", or whatever, an existing similar report.
You can see how a poToken is being sent in POST data by the browser in the Invidious code that shows how to capture the value. But I understood from yt-dlp discussions that a pot query parameter was used in the media links associated with the pot-ified session.
@8ChanAnon's algorithm is what is currently done for age-gate videos, up to the last step with m.youtube.com which is new and interesting. What happens if you skip straight to that step?
Step 2 will only work if TVHTML5_SIMPLY_EMBEDDED_PLAYER is not pot-ified, and that seems to be in question.
Indeed, Android 14/FF 122 at m.youtube.com didn't list the poToken experiment IDs although yt-dlp has reported unsuccess with Android clients.
I don't think it's worth trying to get around the poToken, it will eventually be required in all clients.
I keep digging into base.js when I get some time trying to understand how the token is created, it does seem to be extracted from the bytes of at least the first video fragment as far as I can tell, but not all fragments?...
There's a Uint8Array which appears to be the fragment response data?... manipulated several times and then 82/84/68 bytes of that array are stored as playbackCookie which is then sent in the POST data
At least it would be good to have a program that is not not-youtube-dl while a long term solution to the twattery is being investigated.
@dirkf
yt-dlp 2024.08.06 still works, apparently. It has fancy networking that we can't easily replicate: maybe punt to curl for all requests?
On a lot of websites I want to download from, youtube-dl and curl don't even get the correct HTML: instead of the page I would get in my browser, they receive a version that has the captcha.
Instead I have a bash script that pre-downloads the non-captcha HTML via "https://github.com/lwthiker/curl-impersonate" (runs in a docker container, I use tag: 0.5.2-ff-alpine)
AFAIK lwthiker/curl-impersonate is the only HTTP client that completely impersonates an actual browser like Firefox. A lot of the problems I had with 403 errors in youtube-dl were captchas triggered by the HTTP client not being exactly like an official version of Firefox or Chrome :) (this might even be valid for fragment downloads)
Yes, but so far as captcha is generally understood (G/recaptcha, hcaptcha, Cloudflare challenge aka breaks the Web), that is not the problem. Even if it solved the poToken issue, a dependency running under Docker would not be an acceptable solution for the main functionality of the program, though it might be a PoC for a solution.
Yeah not relevant here but I did bookmark it for other things, looks like a decent tool :)
I only said that I run it in docker. Apparently it can be used as a library, see https://github.com/lwthiker/curl-impersonate?tab=readme-ov-file#Advanced-usage . Though I have not looked into that, as I run youtube-dl and almost all other apps via docker anyway :)
Fair enough, but even curl-impersonate is quite a beefy dependency that would not be supportable on the same range of targets as the current yt-dl.
Fair enough.
What about this: add an option to use it as an external downloader (--external-downloader; currently supports aria2c, avconv, axel, curl, ffmpeg, httpie, wget) and find some good presets for it (e.g. which specific browser it should impersonate)?
@PatrickJRed it still wouldn't help with this situation, but it's pretty simple to add another curl downloader; you should open another issue or make the PR yourself
Apparently the API fallback in the YT extractor wouldn't have worked for ages, if at all, because (unlike in the age-gate fallback) no sts was being sent. Then:
$ python -m youtube_dl -v -f 135 'lLSkbZ3-EOs'
[debug] System config: [u'--prefer-ffmpeg']
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'-f', u'135', u'lLSkbZ3-EOs']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: c5098961b
[debug] Python 2.7.18 (CPython i686 32bit) - Linux-4.4.0-210-generic-i686-with-Ubuntu-16.04-xenial - OpenSSL 1.1.1w 11 Sep 2023 - glibc 2.15
[debug] exe versions: avconv 4.3, avprobe 4.3, ffmpeg 4.3, ffprobe 4.3
[debug] Proxy map: {}
[youtube] lLSkbZ3-EOs: Downloading webpage
WARNING: [youtube] Ignoring initial response with broken formats (poToken experiment detected)
[youtube] lLSkbZ3-EOs: Downloading player 28fd7348
[youtube] lLSkbZ3-EOs: Downloading API (WEB_CREATOR-2.20240726.00.00) JSON
[debug] [youtube] Decrypted nsig WNiuqfCxMStm3Y-S5 => zjd8WoLzKO-kpg
[debug] [youtube] Decrypted nsig W0_Kqkc3K5-gAlx82 => _9t8_-AhZvi04A
[debug] Invoking downloader on u'https://rr5---sn-cu-aigss.googlevideo.com/videoplayback?sparams=expire%2Cei%2Cip%2Cid%2Caitags%2Csource%2Crequiressl%2Cxpc%2Cbui%2Cspc%2Cvprv%2Csvpuc%2Cmime%2Cns%2Crqh%2Cgir%2Cclen%2Cdur%2Clmt&ei=SpK4Zs79DaqMp-oP2rj0-Qs&ip=46.208.6.25&clen=13106623&spc=Mv1m9rGN544RnSCiFwx6ZtiSyBYnBV85XTHGPMke3rPeUihI_jsi&id=o-AAzf1sFMtrrt-18dNqW2pVet7n-fVaH_H64E4djwP--H&txp=5535434&svpuc=1&aitags=133%2C134%2C135%2C136%2C160%2C242%2C243%2C244%2C247%2C278%2C298%2C299%2C302%2C303%2C394%2C395%2C396%2C397%2C398%2C399&gir=yes&xpc=EgVo2aDSNQ%3D%3D&requiressl=yes&keepalive=yes&source=youtube&mv=m&sig=AJfQdSswRQIhALKafKC8aHa08g0RPY6BpWA3m1oYlGDfGaHRvQhr_5UuAiAN533PHmqD5BpJaEfqhH4DlmLUg-b1zL4-Sin1oCuEnw%3D%3D&pcm2cms=yes&dur=850.966&ns=qhp8TEY1pHqt6L7-RadKU7MQ&initcwndbps=1367500&vprv=1&lsig=AGtxev0wRQIhAJ-ZhbXFfnM7SBC5y4SDngAzL5uMdn_hcmeqhSu3fInIAiAHnzyPjeapdCqUpj7Nr2l3GwYhTBrA4q4N85HNsl1i4g%3D%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpcm2cms%2Cpl%2Cinitcwndbps&lmt=1723149328860953&c=WEB_CREATOR&sefc=1&bui=AQmm2eywWXlCZZ-0GdPj3i0R2tAb5WWt4NMpDpw7oUPM3vAdsNyDt7MuInx45YsZj7Ekmcd6Cy-NCGb1&mime=video%2Fmp4&fvip=4&rqh=1&itag=135&mm=31%2C29&mn=sn-cu-aigss%2Csn-cu-c9id&mh=jQ&n=_9t8_-AhZvi04A&mt=1723371671&expire=1723393706&pl=25&ms=au%2Crdu&mvi=5'
[dashsegments] Total fragments: 2
[download] Destination: 全球金融大动荡,日本加息背刺美国,中国躺赢?【汤山老王】-lLSkbZ3-EOs.mp4
[download] 100% of 12.50MiB in 00:10
$
I'm also experiencing this error, but when I changed the user-agent to:
"Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.6533.103 Mobile Safari/537.36" (mobile user agent of the latest chrome version) it worked. Here is the exact line:
youtube-dl --verbose -x --user-agent "Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.6533.103 Mobile Safari/537.36"
Can confirm this works for me (youtube-dl proxied through tor-browser-bundle, exit=de)
Yes, because specifying a mobile UA redirects to m.youtube.com as above. Mozilla/5.0 (Android 14; Mobile; rv:115.0) Gecko/115.0 Firefox/115.0 is a shorter UA that has the same effect.
However the ytInitialData in the mobile page is stringified JSON rather than actual JSON as in the desktop page, and this causes an additional API call (I suspect that no valid data is returned).
Yes, because specifying a mobile UA redirects to m.youtube.com as above. Mozilla/5.0 (Android 14; Mobile; rv:115.0) Gecko/115.0 Firefox/115.0 is a shorter UA that has the same effect.
thx for the shorter version
However the ytInitialData in the mobile page is stringified JSON rather than actual JSON as in the desktop page, and this causes an additional API call (I suspect that no valid data is returned).
Which API: to the yt-server, an internal json-thingy, or where? Can you point me to the appropriate section of code? I may be able to help. (I wrote some extractors for my servers, though they are in my private repo as they contain logins/secrets for my servers, so I won't share those of course. I also modified some of your functions/lib files, which I link in docker for specific calls, so I can dig around and try stuff :) )
Decoding the stringified JSON is no problem:
def _extract_yt_initial_variable(self, webpage, regex, video_id, name):
    result = self._search_json(
        regex, webpage, name, video_id, default={},
        contains_pattern=r'(?:\{[\s\S]+}|(?P<_q>"|\')(?:(?!(?P=_q))[\s\S])+(?P=_q))',
        end_pattern=r';\s*%s' % (self._YT_INITIAL_BOUNDARY_RE,),
        transform_source=lambda s: self._parse_json(
            s, video_id, transform_source=js_to_json, fatal=False) if s[:1] in '\'"' else s)
    return result
I'll put up a PR once I've checked the playlist extraction.
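As a standalone illustration of the "stringified JSON" case, the double decoding can be shown with nothing but the standard library. This is a simplified sketch, not the extractor code: `decode_initial_data` is a hypothetical helper, and it assumes the string literal is itself valid JSON, whereas the real page holds a JavaScript string literal that may need js_to_json first, as in the snippet above.

```python
import json


def decode_initial_data(literal):
    """Decode a value that is either a JSON object or a JSON string wrapping one."""
    value = json.loads(literal)
    if isinstance(value, str):
        # Mobile-style case: the first pass only yields the inner JSON text.
        value = json.loads(value)
    return value


if __name__ == "__main__":
    desktop_style = '{"contents": {}}'
    mobile_style = json.dumps(desktop_style)  # the same data, stringified
    print(decode_initial_data(desktop_style) == decode_initial_data(mobile_style))
```

Both inputs decode to the same dict, which is the property the extractor needs in order to treat desktop and mobile pages alike.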