youtube-dl [TikTok] Support Sigi-type pages, etc

Please follow the guide below

Before submitting a pull request make sure you have:

[ ] Searched the bugtracker for similar pull requests
[x] Read adding new extractor tutorial
[x] Read youtube-dl coding conventions and adjusted the code to meet them
[x] Covered the code with tests (note that PRs without tests will be REJECTED)
[x] Checked the code with flake8

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

[x] I am the original author of this code and I am willing to release it under Unlicense. Exception: this PR subsumes PR #30224, whose author also affirmed this.
[ ] I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

[x] Bug fix
[x] Improvement
[ ] New extractor
[ ] New feature

Description of your pull request and other information

TT switched (possibly partially) its framework from NextJS to Sigi, and the persisted state JSON sent in the page changed as a result. Instead of a <script> element with id __NEXT_DATA__, we get one with id sigi_persisted_state and JSON with a slightly different structure.

This PR deals with both types of page format, based on PR #30224 and this patch which gets more metadata.
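To illustrate handling both page formats, here is a minimal standalone sketch, not the extractor code from this PR. It looks for a <script> element with either id mentioned above and parses its body. The helper name extract_page_state, the format labels, and the assumption that the element body is plain JSON are all hypothetical.

```python
import json
import re

# Script element ids named in the description, mapped to a format label.
# The labels are illustrative only.
STATE_SCRIPT_IDS = (
    ('__NEXT_DATA__', 'next'),
    ('sigi_persisted_state', 'sigi'),
)


def extract_page_state(webpage):
    """Return (label, parsed_json) for the first state script found.

    Returns (None, None) when neither script element is present or
    when its body cannot be parsed.
    """
    for script_id, label in STATE_SCRIPT_IDS:
        mobj = re.search(
            r'<script\b[^>]*\bid\s*=\s*["\']%s["\'][^>]*>(?P<json>.*?)</script>'
            % re.escape(script_id),
            webpage, flags=re.DOTALL)
        if not mobj:
            continue
        try:
            return label, json.loads(mobj.group('json'))
        except ValueError:
            # Assumption: the element body is plain JSON; skip it if not.
            continue
    return None, None
```

A caller would branch on the returned label, since the two JSON structures differ slightly, and treat (None, None) as "unsupported page format".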

Also, extraction could fail with a timeout (Error 60 in Windows, SSLError('The read operation timed out',) in Linux) or connection reset (Error 54 in Windows) due to some weird blocking by whatever fronts TikTok's pages (Akamai, apparently). In order to download the page for parsing, some cookie has to be sent, and a way to get it is to make a previous request to the site. The extractor fetched https://www.tiktok.com/ before doing anything else. In yt-dlp, the code fetches the webpage itself twice, commenting that you get 403 otherwise. This PR copies that tactic, but instead of fetching the whole page (GET request) it just sends a HEAD request; if a page is actually returned, rather than an error with a Set-Cookie header, it doesn't actually have to be downloaded.
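The HEAD-then-GET tactic can be sketched with the Python 3 standard library as follows. This is an illustration under assumptions, not the code in this PR: the function names are hypothetical, and the real extractor goes through youtube-dl's own request machinery rather than a bare urllib opener.

```python
import http.cookiejar
import urllib.error
import urllib.request


def build_cookie_opener():
    """Create an opener that stores and resends cookies across requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def prime_cookies(opener, url, timeout=10):
    """Send a HEAD request so any Set-Cookie headers reach the cookie jar."""
    request = urllib.request.Request(url, method='HEAD')
    try:
        opener.open(request, timeout=timeout).close()
    except urllib.error.HTTPError:
        # An error status can still carry Set-Cookie; the cookie processor
        # normally sees the response before the error is raised.
        pass


def fetch_with_primed_cookies(opener, url, timeout=10):
    """HEAD first to collect cookies, then GET the page body."""
    prime_cookies(opener, url, timeout=timeout)
    response = opener.open(urllib.request.Request(url), timeout=timeout)
    try:
        return response.read()
    finally:
        response.close()
```

Typical use would be `opener, jar = build_cookie_opener()` followed by `fetch_with_primed_cookies(opener, page_url)`. Because HEAD responses have no body, the priming request costs little even when the server answers with a full page rather than an error.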

Probably resolves #28741.
Resolves #30251.
Resolves #30432.
Resolves #30439.
Resolves #30445.
Resolves #30454.
Resolves #30470.

Finally the non-working TikTokUserIE has been resurrected for accessing all the videos of a specific user.

Resolves #30174.

Jan 07 '22 13:01 dirkf

Patching hints, depending on your installation type (substitute PR number 30479 and file youtube_dl/extractor/tiktok.py as appropriate):

https://github.com/ytdl-org/youtube-dl/pull/30184#issuecomment-990859585
https://github.com/ytdl-org/youtube-dl/issues/29326#issuecomment-965418428
https://github.com/ytdl-org/youtube-dl/issues/29326#issuecomment-966349844
https://github.com/ytdl-org/youtube-dl/issues/29326#issuecomment-972929975
https://github.com/ytdl-org/youtube-dl/issues/29326#issuecomment-981108888

Jan 07 '22 15:01 dirkf

Hi! After your patch has worked for several days, I am now encountering new problems (with the "vanilla" youtube-dl as well): #30538

Patrick

Jan 18 '22 19:01 hessijames79

When will this be merged?

May 02 '22 15:05 afterdelight

As observed in https://github.com/yt-dlp/yt-dlp/issues/3776#issuecomment-1155586954, the user pages are currently redirecting to a captcha more or less whatever we do wrt cookies and UAs.

In a browser with JS disabled and UA set to Mozilla/5.0 after clearing cookies for TT, a request to a user page gets the captcha page, and then reloading with the provided cookies opens the desired page. This doesn't happen with the extractor even with a delay between the two fetches.

Jun 14 '22 20:06 dirkf

Looks like every issue is about this, when will this get merged?

Dec 26 '22 00:12 bvoq

Do we think this will see the light of day? :D Was hoping to be able to use it for a little fun project!

Thanks

Aug 04 '23 09:08 OwenMelbz

I think this is also outdated now. There is no sigi_persisted_state in the returned HTML.

Aug 04 '23 19:08 kashif-umair
