[Broken] Facebook private (friends only and private groups) error handling response is broken

Open someziggyman opened this issue 4 years ago • 10 comments

Checklist

  • [x] I'm reporting a broken site support
  • [x] I've verified that I'm running youtube-dlc version 2020.10.26
  • [x] I've checked that all provided URLs are alive and playable in a browser
  • [x] I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • [x] I've searched the bugtracker for similar issues including closed ones

Verbose log

./testdlc -v -F https://www.facebook.com/100002659934141/videos/3355692847862680/
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'-F', u'https://www.facebook.com/100002659934141/videos/3355692847862680/']
[debug] Loading archive file None
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dlc version 2020.10.25
[debug] Python version 2.7.16 (CPython) - Darwin-19.6.0-x86_64-i386-64bit
[debug] exe versions: none
[debug] Proxy map: {}
[facebook] 3355692847862680: Downloading webpage
[facebook] 3355692847862680: Downloading webpage
[facebook] 3355692847862680: Downloading webpage
ERROR: Cannot parse data; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dlc -U  to update. Be sure to call youtube-dlc with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "./testdlc/youtube_dlc/YoutubeDL.py", line 830, in extract_info
    ie_result = ie.extract(url)
  File "./testdlc/youtube_dlc/extractor/common.py", line 532, in extract
    ie_result = self._real_extract(url)
  File "./testdlc/youtube_dlc/extractor/facebook.py", line 484, in _real_extract
    video_id, fatal_if_no_video=True)
  File "./testdlc/youtube_dlc/extractor/facebook.py", line 380, in _extract_from_url
    raise ExtractorError('Cannot parse data')
ExtractorError: Cannot parse data; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dlc -U  to update. Be sure to call youtube-dlc with the --verbose flag and include its complete output.

Description

Private videos used to work fine until recently. I assume the server response from FB changed. As a result, ytdlc does not know how to handle it and no longer asks for cookies or a login, for example.

someziggyman avatar Oct 29 '20 09:10 someziggyman

does the video have a copy URL or similar ?? play the video and in the upper right hand corner click on the 3 dots ... see if there's a download URL / copy link that you can use to feed the video into youtube-dlc

october262 avatar Oct 29 '20 17:10 october262

You are right. If the video is owned by the user, you can now download it via FB's own UI. But what if it's a video shared in a private group? Not sure. Edited: got confirmation that private group videos do not have a "download video" option available, even if you are a member. Here's a test link, in case needed for private group video: https://www.facebook.com/1051184515/videos/10220691460290341/

ytdl used to work just fine with these vids and gave either "need to log in" or "cookies needed" type of error. Now it just crashes. Guess, it would be a good thing to bring back proper error handling for these cases.

someziggyman avatar Oct 29 '20 18:10 someziggyman

Hey @someziggyman and FYI @blackjack4494

I was able to resolve this issue by changing one regex. I was considering making a pull request but I don't really know how this change affects the rest of the extractor and other types of videos on FB.

Here's a test link, in case needed for private group video: https://www.facebook.com/1051184515/videos/10220691460290341/

Could you test if this change works for your case?

Find this section in extractor/facebook.py#L366 and replace the regex for the fb_dtsg param. You should also supply credentials or use a cookie header, of course.

            # Video info not in first request, do a secondary request using
            # tahoe player specific URL
            tahoe_data = self._download_webpage(
                self._VIDEO_PAGE_TAHOE_TEMPLATE % video_id, video_id,
                data=urlencode_postdata({
                    '__a': 1,
                    '__pc': self._search_regex(
                        r'pkg_cohort["\']\s*:\s*["\'](.+?)["\']', webpage,
                        'pkg cohort', default='PHASED:DEFAULT'),
                    '__rev': self._search_regex(
                        r'client_revision["\']\s*:\s*(\d+),', webpage,
                        'client revision', default='3944515'),
                    'fb_dtsg': self._search_regex(
-                       r'"DTSGInitialData"\s*,\s*\[\]\s*,\s*{\s*"token"\s*:\s*"([^"]+)"',
+                       r'"MRequestConfig"\s*,\s*\[\]\s*,\s*{\s*"dtsg"\s*:\s*{\s*"token"\s*:\s*"([^"]+)"',
                        webpage, 'dtsg token', default=''),
                }),
                headers={
                    'Content-Type': 'application/x-www-form-urlencoded',
                })
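Since it is unclear whether every Facebook page has switched to the new layout, one cautious option is to try both patterns in order rather than replacing the old one. Below is a minimal standalone sketch of that idea using only the `re` module. The helper name `find_dtsg_token` and the two sample page fragments are made up for illustration; only the two regexes come from the diff above.

```python
import re

# Both patterns from the diff above: the original one first, then the proposed one.
DTSG_PATTERNS = (
    r'"DTSGInitialData"\s*,\s*\[\]\s*,\s*{\s*"token"\s*:\s*"([^"]+)"',
    r'"MRequestConfig"\s*,\s*\[\]\s*,\s*{\s*"dtsg"\s*:\s*{\s*"token"\s*:\s*"([^"]+)"',
)


def find_dtsg_token(webpage, default=''):
    """Return the first dtsg token matched by any known pattern (hypothetical helper)."""
    for pattern in DTSG_PATTERNS:
        match = re.search(pattern, webpage)
        if match:
            return match.group(1)
    # Same fallback as the extractor snippet: an empty token.
    return default


# Invented page fragments, only shaped to exercise each pattern.
old_page = '["DTSGInitialData",[],{"token":"AAA111"},258]'
new_page = '["MRequestConfig",[],{"dtsg":{"token":"BBB222","valid_for":86400}},51]'

print(find_dtsg_token(old_page))
print(find_dtsg_token(new_page))
print(repr(find_dtsg_token('<html>no token here</html>')))
```

Trying the old pattern first keeps behavior unchanged for pages that still use the old layout, which addresses the worry about affecting other video types.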

ssaqua avatar Nov 08 '20 09:11 ssaqua

Appreciate your contribution and help with this!

Indeed, this fix works and does not seem to affect regular public videos like this one: https://www.facebook.com/watch/?v=538723623491744 . I tested several cases of this type.

However, to make this more usable and friendly, I assume some error handling is needed for this "private videos" case. I mean, even with this fix working, there's no way for the user to know he needs --cookies or credentials. Instead, he will get this log:

ERROR: Cannot parse data; please report this issue on https://github.com/blackjack4494/yt-dlc . Make sure you are using the latest version; type youtube-dlc -U to update. Be sure to call youtube-dlc with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "./testdlc/youtube_dlc/YoutubeDL.py", line 830, in extract_info
    ie_result = ie.extract(url)
  File "./testdlc/youtube_dlc/extractor/common.py", line 532, in extract
    ie_result = self._real_extract(url)
  File "./testdlc/youtube_dlc/extractor/facebook.py", line 484, in _real_extract
    video_id, fatal_if_no_video=True)
  File "./testdlc/youtube_dlc/extractor/facebook.py", line 380, in _extract_from_url
    raise ExtractorError('Cannot parse data')
ExtractorError: Cannot parse data; please report this issue on https://github.com/blackjack4494/yt-dlc . Make sure you are using the latest version; type youtube-dlc -U to update. Be sure to call youtube-dlc with the --verbose flag and include its complete output.
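To make the request concrete, here is a rough sketch of the kind of check I have in mind: before giving up with "Cannot parse data", look for signs of a login page and raise a clearer error. This is not the extractor's actual code. The `ExtractorError` class below is a simplified stand-in for the real one, and `LOGIN_MARKERS` and `check_video_data` are hypothetical names with guessed marker strings.

```python
class ExtractorError(Exception):
    """Simplified stand-in for youtube_dlc.utils.ExtractorError."""

    def __init__(self, msg, expected=False):
        super(ExtractorError, self).__init__(msg)
        # In the real class, expected=True marks a known condition,
        # so the "please report this issue" text is not appended.
        self.expected = expected


# Assumption: strings that might appear on a login wall. Not verified against Facebook.
LOGIN_MARKERS = ('id="login_form"', 'You must log in to continue')


def check_video_data(video_data, webpage):
    """Return video_data, or raise an error that tells the user what to do."""
    if video_data:
        return video_data
    if any(marker in webpage for marker in LOGIN_MARKERS):
        raise ExtractorError(
            'This video is only available to logged-in users; '
            'use --username and --password or --cookies',
            expected=True)
    # Nothing recognizable: keep the existing generic failure.
    raise ExtractorError('Cannot parse data')
```

With something like this, a private video fetched without credentials would produce a short, actionable message instead of a traceback asking the user to file a bug.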

someziggyman avatar Nov 08 '20 10:11 someziggyman

I also tried to download a Facebook video and it didn't work 😕 But the fix proposed by @ssaqua works great, thanks so much for looking into it! ❤️

TPXP avatar Nov 09 '20 12:11 TPXP

Hi @ssaqua, thanks for providing this. Does this fix still work? I downloaded the master branch, edited the one line, and then ran make as the instructions say. When trying a download without a cookie header, I also do not get the "please log in" error. Is there maybe a step I missed when editing the file myself and running make, or is the fix no longer working?

Edit: I am able to download when providing the username and password after the fix. I must be doing something wrong with the cookie, I assume.

jonasvn75 avatar Nov 11 '20 20:11 jonasvn75

Edit: I am able to download when providing the username and password after the fix. I must be doing something wrong with the cookie, I assume.

👍

Yeah, this should still work as long as the request is properly authenticated. I use the --add-header 'Cookie: {copy-cookie-from-browser-request}' option instead of the --cookies FILE option.
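If you would rather use a cookie file, one common stumbling block is the file format: as far as I know, it has to be in Netscape/Mozilla cookies.txt format, not a raw copy of the Cookie header. The sketch below converts a copied header string into that format. The function name, the default domain, the far-future expiry, and the sample cookie values are all my own assumptions for illustration.

```python
def cookie_header_to_netscape(cookie_header, domain='.facebook.com',
                              expires=2147483647):
    """Convert a 'name=value; name2=value2' header string to cookies.txt text.

    Hypothetical helper: every cookie is written for `domain`, path '/',
    marked secure, with the same far-future expiry timestamp.
    """
    lines = ['# Netscape HTTP Cookie File']
    for pair in cookie_header.split(';'):
        pair = pair.strip()
        if '=' not in pair:
            continue  # skip empty or malformed fragments
        name, value = pair.split('=', 1)
        # Fields: domain, include-subdomains, path, secure, expiry, name, value
        lines.append('\t'.join(
            [domain, 'TRUE', '/', 'TRUE', str(expires), name, value]))
    return '\n'.join(lines) + '\n'


# Placeholder values, not real credentials.
print(cookie_header_to_netscape('c_user=1234567890; xs=placeholder-value'))
```

Writing the returned text to a file and passing that file with --cookies should then be roughly equivalent to the --add-header approach, assuming the cookies copied from the browser are still valid.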

ssaqua avatar Nov 12 '20 02:11 ssaqua

I missed this on my initial search, so I closed my issue #240 and referenced this one. It appears this issue with the regex is still present.

digital-pet avatar Nov 19 '20 00:11 digital-pet

The regex fix didn't work for me. I tried with a cookies file and with headers; neither worked. Then I tried this workaround: https://github.com/ytdl-org/youtube-dl/issues/27062#issuecomment-729442898 "youtube-dl --force-generic-extractor [link] and worked just fine."

rgime avatar Nov 20 '20 03:11 rgime

Then I tried this workaround: ytdl-org/youtube-dl#27062 (comment) "youtube-dl --force-generic-extractor [link] and worked just fine."

@rgime on what endpoints does that work? What do your URLs look like? Is it something like this: https://www.facebook.com/<user_ID>/videos/<video_ID> Or like this: https://www.facebook.com/groups/<group_ID>/permalink/<post_ID>/

Is it really private? Does it ask for authentication at all? What version are you using?

vctls avatar Nov 20 '20 08:11 vctls