[TikTok] Support Sigi-type pages, etc

Open dirkf opened this issue 3 years ago • 5 comments

Please follow the guide below


Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • [x] I am the original author of this code and I am willing to release it under Unlicense. Exception: this PR subsumes PR #30224, whose author also affirmed this.
  • [ ] I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • [x] Bug fix
  • [x] Improvement
  • [ ] New extractor
  • [ ] New feature

Description of your pull request and other information

TT switched (possibly partially) its framework from NextJS to Sigi, and the persisted state JSON sent in the page changed as a result. Instead of a <script> element with id __NEXT_DATA__, we get one with id sigi_persisted_state and JSON with a slightly different structure.
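The dual-format handling described above can be sketched roughly as follows. This is a minimal illustration, not the extractor code from this PR: the function name `extract_page_state` and the regex are hypothetical, and the sketch assumes the JSON is the entire text content of the `<script>` element, which may not hold for every page variant.

```python
import json
import re

# Script element ids named in the description above. The surrounding markup
# (attribute order, quoting) is an assumption made for this sketch.
STATE_IDS = ('__NEXT_DATA__', 'sigi_persisted_state')


def extract_page_state(webpage):
    """Return (script_id, parsed_json) for the first known state script, or None.

    Hypothetical helper: tries each known id in turn and parses the element
    body as JSON, skipping elements whose body is not valid JSON.
    """
    for script_id in STATE_IDS:
        pattern = (
            r'(?s)<script\b[^>]*\bid\s*=\s*["\']%s["\'][^>]*>(.*?)</script>'
            % re.escape(script_id)
        )
        match = re.search(pattern, webpage)
        if not match:
            continue
        try:
            return script_id, json.loads(match.group(1))
        except ValueError:
            # Body was not plain JSON; fall through to the next candidate id.
            continue
    return None
```

Returning the id alongside the data lets the caller pick the right traversal for each structure, since the two JSON layouts differ slightly.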

This PR deals with both types of page format, based on PR #30224 and this patch which gets more metadata.

Also, extraction could fail with a timeout (Error 60 in Windows, SSLError('The read operation timed out',) in Linux) or a connection reset (Error 54 in Windows) due to blocking by whatever fronts TikTok's pages (Akamai, apparently). To download the page for parsing, a cookie has to be sent, and one way to get it is to make a previous request to the site. The extractor previously fetched https://www.tiktok.com/ before doing anything else. In yt-dlp, the code fetches the webpage itself twice, commenting that you get 403 otherwise. This PR copies that tactic, but instead of fetching the whole page (GET request) it just sends a HEAD request; if a page is actually returned, rather than an error with a Set-Cookie header, it doesn't actually have to be downloaded.
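As a rough standalone sketch of the HEAD-then-GET tactic, using only the Python 3 standard library rather than youtube-dl's own downloader helpers: the function names `build_cookie_opener` and `fetch_with_cookie_priming` are hypothetical, and whether the site sets the needed cookie on a HEAD response is taken from the description above, not verified here.

```python
import urllib.error
import urllib.request
from http.cookiejar import CookieJar


def build_cookie_opener():
    """Return (opener, jar); the opener stores and resends cookies via the jar."""
    jar = CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch_with_cookie_priming(url, opener, timeout=20):
    """Send a HEAD request to collect cookies, then GET the page body.

    Hypothetical helper for illustration only.
    """
    head = urllib.request.Request(url, method='HEAD')
    try:
        # Only the response headers (including any Set-Cookie) are wanted.
        opener.open(head, timeout=timeout).close()
    except urllib.error.HTTPError:
        # An error status may still have delivered a Set-Cookie header,
        # so the failure of the priming request is ignored.
        pass
    # Second request: the opener attaches whatever cookies the jar now holds.
    with opener.open(urllib.request.Request(url), timeout=timeout) as resp:
        return resp.read()
```

A caller would create the opener once with `build_cookie_opener()` and pass it to `fetch_with_cookie_priming(url, opener)`, so that the cookies from the HEAD request carry over to the GET.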

Probably resolves #28741. Resolves #30251, #30432, #30439, #30445, #30454, #30470.

Finally, the non-working TikTokUserIE has been resurrected for accessing all the videos of a specific user.

Resolves #30174.

Jan 07 '22 13:01 dirkf

Patching hints, depending on your installation type (substitute PR number 30479 and file youtube_dl/extractor/tiktok.py as appropriate):

  • https://github.com/ytdl-org/youtube-dl/pull/30184#issuecomment-990859585
  • https://github.com/ytdl-org/youtube-dl/issues/29326#issuecomment-965418428
  • https://github.com/ytdl-org/youtube-dl/issues/29326#issuecomment-966349844
  • https://github.com/ytdl-org/youtube-dl/issues/29326#issuecomment-972929975
  • https://github.com/ytdl-org/youtube-dl/issues/29326#issuecomment-981108888

Jan 07 '22 15:01 dirkf

Hi! After your patch has worked for several days, I am now encountering new problems (with the "vanilla" youtube-dl as well): #30538

Patrick

Jan 18 '22 19:01 hessijames79

When will this be merged?

May 02 '22 15:05 afterdelight

As observed in https://github.com/yt-dlp/yt-dlp/issues/3776#issuecomment-1155586954 the user pages are currently redirecting to a captcha more or less whatever we do wrt cookies and UAs.

In a browser with JS disabled and UA set to Mozilla/5.0 after clearing cookies for TT, a request to a user page gets the captcha page, and then reloading with the provided cookies opens the desired page. This doesn't happen with the extractor even with a delay between the two fetches.

Jun 14 '22 20:06 dirkf

Looks like every issue is about this. When will this get merged?

Dec 26 '22 00:12 bvoq

Do we think this will see the light of day? :D Was hoping to be able to use it for a little fun project!

Thanks

Aug 04 '23 09:08 OwenMelbz

I think this is also outdated now. There is no sigi_persisted_state in the returned HTML.

Aug 04 '23 19:08 kashif-umair