facebook-scraper
Understanding get_posts() parameters pages and posts_per_page?
I don't know if I'm misunderstanding facebook-scraper, my own code, or I'm seeing a bug. When I ran the code fragment:
for username in usernames:
    total_posts = 12
    if username == frequent_poster:
        total_posts = 24
    posts_per_page = 4
    pages = (total_posts + posts_per_page - 1) // posts_per_page
    print('')
    print('total_posts = %d, posts_per_page = %d, pages = %d' % (total_posts, posts_per_page, pages))
    posts = get_posts(username, cookies=cookie_file, pages=pages, extra_info=True,
                      options={'posts_per_page': posts_per_page, 'allow_extra_requests': False, 'HQ_images': False})
    print('Actual number of posts = %d' % (sum(1 for i in posts)))
I got the output:
total_posts = 12, posts_per_page = 4, pages = 3
Actual number of posts = 9
total_posts = 12, posts_per_page = 4, pages = 3
Actual number of posts = 9
total_posts = 12, posts_per_page = 4, pages = 3
Actual number of posts = 30
...
total_posts = 12, posts_per_page = 4, pages = 3
Actual number of posts = 30
total_posts = 24, posts_per_page = 4, pages = 6
Actual number of posts = 63
total_posts = 12, posts_per_page = 4, pages = 3
Actual number of posts = 30
total_posts = 12, posts_per_page = 4, pages = 3
Actual number of posts = 9
total_posts = 12, posts_per_page = 4, pages = 3
Actual number of posts = 30
...
total_posts = 12, posts_per_page = 4, pages = 3
Actual number of posts = 30
total_posts = 12, posts_per_page = 4, pages = 3
Actual number of posts = 9
total_posts = 12, posts_per_page = 4, pages = 3
Actual number of posts = 30
...
total_posts = 12, posts_per_page = 4, pages = 3
Actual number of posts = 30
total_posts = 12, posts_per_page = 4, pages = 3
Actual number of posts = 6
total_posts = 12, posts_per_page = 4, pages = 3
Actual number of posts = 31
total_posts = 12, posts_per_page = 4, pages = 3
Actual number of posts = 30
I'm trying to understand how post extraction and its parameters work, and I expected the actual number of posts to be less than or equal to total_posts in each case. My goal is to make as few requests as possible.
What am I misunderstanding?
As always, I appreciate your help.
For Pages, the first page gives 2 posts, and subsequent ones give 4 (or whatever you've set posts_per_page to). For groups, posts_per_page has no effect, and you get ~20-30 posts per page
Okay. All of the output I posted is from Pages.
How does
posts = get_posts(username, cookies=cookie_file, pages=3, extra_info=True,
                  options={'posts_per_page': 4, 'allow_extra_requests': False, 'HQ_images': False})
print('Actual number of posts = %d' % (sum(1 for i in posts)))
i.e.,
pages = 3; posts_per_page = 4
result in the output
Actual number of posts = 30?
Shouldn't the actual number of posts be less than or equal to 2 + (3 - 1) x 4 = 10?
(Or does the expression sum(1 for i in posts) not result in the actual number of posts?)
What pages are you having this problem with?
Apparently all user home pages. (Is that the correct Facebook lingo?) The output I posted is from the program snippet I used to try to figure out what was actually happening so I could understand pages and posts_per_page and optimize my program. The output lines hidden by the ellipses are the same as the preceding and following lines.
What profiles are you having this problem with?
Ok, I just tested with "zuck", and got the same behavior as you - posts_per_page is ignored, and you get 10 posts per page. posts_per_page only works for Pages.
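The per-page figures reported so far in this thread can be condensed into a rough estimator. This is only a sketch: the function name estimate_posts and the account_type labels are made up for illustration and are not part of facebook-scraper, and the numbers (2 posts on the first page of a Page, then posts_per_page per page; about 10 per page for profiles; roughly 20-30 per page for groups) are simply the ones quoted above.

```python
def estimate_posts(account_type, pages, posts_per_page=4):
    """Rough post count from the per-page figures quoted in this thread.

    estimate_posts and the account_type labels are hypothetical,
    not part of facebook-scraper.
    """
    if pages <= 0:
        return 0
    if account_type == "page":
        # First page yields 2 posts, each later page yields posts_per_page.
        return 2 + (pages - 1) * posts_per_page
    if account_type == "profile":
        # posts_per_page is ignored; about 10 posts per page were observed.
        return 10 * pages
    if account_type == "group":
        # posts_per_page has no effect; ~20-30 per page, take the upper end.
        return 30 * pages
    raise ValueError("unknown account_type: %r" % (account_type,))


print(estimate_posts("page", pages=3, posts_per_page=4))  # 10
print(estimate_posts("profile", pages=3))                 # 30
print(estimate_posts("profile", pages=6))                 # 60
```

The counts actually observed above (9, 30, 31, 63) suggest these figures are approximations rather than guarantees.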
Oh, I see. Thanks, I didn't know that a Facebook Page wasn't a user's page. (Sorry.) Do I understand correctly then that for users' pages (e.g., "zuck") the parameter pages works and that the number of posts returned is 10 * pages? And is the default for pages still 10 in this case?
Just to make sure, is it true that there is 1 request for each page asked for in the get_posts() call, and also 1 request for each post that's actually consumed from the generator returned by get_posts()?
the parameter pages works and that the number of posts returned is 10 * pages?
Yes
And is the default for pages still 10 in this case?
Yes
is it true that there is 1 request for each page asked for in the get_posts() call
Yes
also 1 request for each post that's actually consumed from the generator returned by get_posts()
Only when allow_extra_requests is True, and only when necessary (such as to fetch full post text or HQ images)
Thanks, neon-ninja. This helps a lot - I think I understand now.
also 1 request for each post that's actually consumed from the generator returned by get_posts()
Only when allow_extra_requests is True, and only when necessary (such as to fetch full post text or HQ images)
What happens when reactors is True? Is there also 1 extra request per post actually consumed from the generator returned by get_posts()? (And possibly more than 1 if there are a lot of reactors?)
What happens when reactors is True? Is there also 1 extra request per post actually consumed from the generator returned by get_posts()? (And possibly more than 1 if there are a lot of reactors?)
Yes. As you consume the generator, the requests are made. Reactors only come in pages of 50, so if there are 500 reactors, that's an extra 10 requests. For large reactor extraction tasks, you can set "reactors": "generator" and consume them at your desired rate.
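The arithmetic behind "500 reactors, 10 extra requests" is a ceiling division by the page size of 50. A minimal sketch, using the same integer idiom as the pages calculation at the top of the thread; reactor_requests is a hypothetical helper name, not a library function:

```python
REACTORS_PER_REQUEST = 50  # page size stated in the reply above


def reactor_requests(reactor_count):
    """Extra requests needed to list all reactors of one post (hypothetical helper)."""
    # Ceiling division: a partially filled last page still costs one request.
    return (reactor_count + REACTORS_PER_REQUEST - 1) // REACTORS_PER_REQUEST


print(reactor_requests(500))  # 10
print(reactor_requests(51))   # 2
print(reactor_requests(0))    # 0
```

With "reactors": "generator", as described in the reply, these requests would presumably be spread over however quickly the reactors are consumed rather than made all at once.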
Thanks, I think I understand how that works now.
I just did a couple of experiments using start_url and request_url_callback as discussed in other Issue threads, and I believe I'm seeing that these two parameters don't work for user profile pages like "zuck". Is my understanding correct here?
Working fine for me, this test code:
start="https://m.facebook.com/profile/timeline/stream/?cursor=AQHRBW_YjLW01jztkzIND8c0CFXEoeYGEyAcNFxN5yd_oh_KNi5zvniJFaJEiCWKLW3gxhvfKI1WCV7k_F4ay8pwr3ZjM7iM8KTzZmy8KqVFgHXLWuWKiNFc0h6ftXZrq4oq&start_time=-9223372036854775808&profile_id=4&replace_id=u_0_0_pK\\"
post = next(get_posts("zuck", start_url=start, options={"allow_extra_requests": False}))
print(post["time"])
outputs:
2021-06-05 21:39:00
Can you please post your code?
Hmm. I was using:
def handle_pagination_url(url):
    global start_url
    start_url = url
    print('handle_pagination_url(): start_url =', start_url)
pages = 1
start_url = None
posts = get_posts(username, cookies=cookie_file, pages=pages, extra_info=True,
                  options={'allow_extra_requests': False, 'HQ_images': False},
                  page_limit=None, start_url=start_url, request_url_callback=handle_pagination_url)
which gave the output
handle_pagination_url(): start_url = https://m.facebook.com/someusername/
which doesn't contain any pagination information. What am I doing wrong? (How does one bootstrap the pagination process?)
P.S. Ten posts were returned - it's just that the start_url doesn't seem useful.
Pagination URLs are only useful past the first page. Try setting pages higher than 1.
When I run
def handle_pagination_url(url):
    global start_url
    start_url = url
    print('handle_pagination_url(): start_url =', start_url)
pages = 2
start_url = None
posts = get_posts(username, cookies=cookie_file, pages=pages, extra_info=True,
                  options={'allow_extra_requests': False, 'HQ_images': False},
                  page_limit=None, start_url=start_url, request_url_callback=handle_pagination_url)
time.sleep(10)
posts = get_posts(username, cookies=cookie_file, pages=pages, extra_info=True,
                  options={'allow_extra_requests': False, 'HQ_images': False},
                  page_limit=None, start_url=start_url, request_url_callback=handle_pagination_url)
I get
handle_pagination_url(): start_url = https://m.facebook.com/username/
handle_pagination_url(): start_url = https://m.facebook.com/profile/timeline/stream/?cursor=AQHRtb9DestJgAow_WBgNL6qHEOjiM_KORFEoKyfMYvYCj_xxb0iOPcFIOOS-HpzgwwsmClbQgUj_lpYzD7PiT_dRWtC7nlTLES3r1RqMIeIfEs7SJvsOVkbTaKG34YJ0uO8&start_time=-9223372036854775808&profile_id=1299182548&replace_id=u_0_21_SO
So when start_url = None, the pagination URL returned is trivial, but when start_url = https://m.facebook.com/username/ the pagination URL returned contains pagination information. I was expecting start_url = None and start_url = https://m.facebook.com/username/ to be equivalent since I'm also passing a username to get_posts() and therefore the posts returned should be the same. (That's correct, isn't it?)
I was expecting the returned pagination URL to point to the next page of posts. Is that information available? I'm not trying to recover from error but only to minimize requests, so I'd like to go page by page so I can stop at the right place without having pages be set unnecessarily high. Does this make sense?
posts = get_posts(username, cookies=cookie_file, pages=pages, extra_info=True,
                  options={'allow_extra_requests': False, 'HQ_images': False},
                  page_limit=None, start_url=start_url, request_url_callback=handle_pagination_url)
Doesn't do anything on its own unless you consume this generator, such as by wrapping it in list(). After modifying your code to do this, I get the following output:
handle_pagination_url(): start_url = https://m.facebook.com/zuck/
handle_pagination_url(): start_url = https://m.facebook.com/profile/timeline/stream/?cursor=AQHRE8LFXmjhdgvpfHcn0UZi7DFQbIbruUnnOTqO-1vmf0706Qjvk_bP6jlpntJBpMBsfiVaBLCo8zFCbOi7tqmJl2Xgp7HpTov4RfFmtVrs14nfMg-Fe0hOUwFPsjLMRHRp&start_time=-9223372036854775808&profile_id=4&replace_id=u_0_1j_%2Fc
handle_pagination_url(): start_url = https://m.facebook.com/profile/timeline/stream/?cursor=AQHRE8LFXmjhdgvpfHcn0UZi7DFQbIbruUnnOTqO-1vmf0706Qjvk_bP6jlpntJBpMBsfiVaBLCo8zFCbOi7tqmJl2Xgp7HpTov4RfFmtVrs14nfMg-Fe0hOUwFPsjLMRHRp&start_time=-9223372036854775808&profile_id=4&replace_id=u_0_1j_%2Fc
handle_pagination_url(): start_url = https://m.facebook.com/profile/timeline/stream/?cursor=AQHRd-UdyYiKkSK0dEgqr5h9_YFBWX3gLNUguL6AjFDdn2t6bbd0AEMd-JRrX7vG-N38l7YvsUeatX3JdHJNsIK-BloFke5GbuBVl1ZBr-ca8apW16w04WWwylk6Nmisre2b&start_time=-9223372036854775808&profile_id=4&replace_id=u_0_0_vS\
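The point about unconsumed generators can be illustrated without touching Facebook at all. The sketch below uses a stand-in generator, fake_get_posts, which is purely hypothetical and only mimics the shape of a paginated scraper (one callback per page, 10 posts per page); the URLs are placeholders. It shows that the callback never fires until something iterates over the generator.

```python
def fake_get_posts(pages, request_url_callback=None):
    """Stand-in for a paginated scraper generator (not facebook-scraper itself)."""
    for page in range(pages):
        url = "https://example.invalid/page/%d" % page  # placeholder URL
        if request_url_callback:
            # Fires only when this page is actually "requested".
            request_url_callback(url)
        for index in range(10):
            yield {"page": page, "index": index}


seen_urls = []
posts = fake_get_posts(pages=2, request_url_callback=seen_urls.append)
calls_before = len(seen_urls)
print(calls_before)      # 0: creating the generator runs none of its body

posts_list = list(posts)  # consuming the generator triggers the "requests"
print(len(seen_urls))    # 2
print(len(posts_list))   # 20
```

This mirrors why a global set inside the callback stays at its initial value when the generator is created but never iterated.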
I was expecting start_url = None and start_url = https://m.facebook.com/username/ to be equivalent since I'm also passing a username to get_posts() and therefore the posts returned should be the same. (That's correct, isn't it?)
Yes, that's correct.
I was expecting the returned pagination URL to point to the next page of posts. Is that information available?
Yes - you can see the returned pagination URL in both your output and mine - it's the URLs like "https://m.facebook.com/profile/timeline/stream/?cursor=" etc
I'm not trying to recover from error but only to minimize requests, so I'd like to go page by page so I can stop at the right place without having pages be set unnecessarily high
Then I think you're needlessly complicating things. When you've made all the requests you need, you can just stop iterating through the get_posts generator, either with break or return or even sys.exit().
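A small sketch of that advice, again with a stand-in generator rather than the real library: fake_paged_posts is a hypothetical helper that records one simulated request per page of 10 posts. Stopping the loop early means the later pages are never requested, even though pages is set generously.

```python
import itertools


def fake_paged_posts(pages, per_page=10, on_request=None):
    """Stand-in generator: one simulated request per page (not the real library)."""
    for page in range(pages):
        if on_request:
            on_request(page)
        for index in range(per_page):
            yield page * per_page + index


wanted = 12

# Variant 1: an explicit break once enough posts have been collected.
requests_break = []
collected = []
for post in fake_paged_posts(pages=10, on_request=requests_break.append):
    collected.append(post)
    if len(collected) >= wanted:
        break  # pages 2..9 are never requested

print(len(collected), len(requests_break))  # 12 2

# Variant 2: itertools.islice stops pulling items after `wanted` posts.
requests_islice = []
first_posts = list(itertools.islice(
    fake_paged_posts(pages=10, on_request=requests_islice.append), wanted))

print(len(first_posts), len(requests_islice))  # 12 2
```

Either form keeps the request count tied to the number of posts actually consumed instead of to the pages argument.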
@neon-ninja Does this still work for you? It's not working for me anymore.
Yes, it does. What error do you get?
I'm not getting any error that I can tell. Using
def handle_pagination_url(url):
    global start_url
    start_url = url
    print('handle_pagination_url(): start_url =', start_url)
username = 'zuck'
pages = 4
start_url = None
posts = get_posts(username, cookies=cookie_file, pages=pages, extra_info=True,
                  options={'allow_extra_requests': False, 'HQ_images': False},
                  page_limit=None, start_url=start_url, request_url_callback=handle_pagination_url)
posts_list = list(posts)
time.sleep(10)
posts = get_posts(username, cookies=cookie_file, pages=pages, extra_info=True,
                  options={'allow_extra_requests': False, 'HQ_images': False},
                  page_limit=None, start_url=start_url, request_url_callback=handle_pagination_url)
posts_list = list(posts)
time.sleep(10)
posts = get_posts(username, cookies=cookie_file, pages=pages, extra_info=True,
                  options={'allow_extra_requests': False, 'HQ_images': False},
                  page_limit=None, start_url=start_url, request_url_callback=handle_pagination_url)
posts_list = list(posts)
I'm getting the output
handle_pagination_url(): start_url = https://m.facebook.com/zuck/
handle_pagination_url(): start_url = https://m.facebook.com/zuck/
handle_pagination_url(): start_url = https://m.facebook.com/zuck/
(I feel very confused.)
So you're unable to paginate on zuck's page? Are you sure your cookies are working ok? What does
print(len(list(get_posts("zuck", cookies=cookie_file, options={'allow_extra_requests': False}))))
output for you? I get 100. Can you try enable_logging() and post the debug logs?
I get 10.
Does that mean my cookies aren't working?
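Long debug logs like the one that follows can be easier to read after a quick summary pass. The sketch below is a hypothetical helper (summarize_debug_log is not part of facebook-scraper) that looks only for two message strings copied from this log, "Got N raw posts from page" and "Page parser did not find next page URL"; it assumes those messages keep this exact wording.

```python
import re

RAW_POSTS = re.compile(r"Got (\d+) raw posts from page")
NO_NEXT_PAGE = "Page parser did not find next page URL"


def summarize_debug_log(lines):
    """Count parsed pages, raw posts, and missing next-page URLs (hypothetical helper)."""
    summary = {"pages_parsed": 0, "raw_posts": 0, "next_page_missing": 0}
    for line in lines:
        match = RAW_POSTS.search(line)
        if match:
            summary["pages_parsed"] += 1
            summary["raw_posts"] += int(match.group(1))
        elif NO_NEXT_PAGE in line:
            summary["next_page_missing"] += 1
    return summary


# A few lines in the same shape as the log posted in this comment.
sample = [
    "Starting to iterate pages",
    "Requesting page from: https://m.facebook.com/zuck/",
    "Parsing page response",
    "Got 10 raw posts from page",
    "Extracting posts from page 0",
    "Looking for next page URL",
    "Page parser did not find next page URL",
]
summary = summarize_debug_log(sample)
print(summary)
```

One parsed page followed by a missing next-page URL would be consistent with getting only 10 posts, though the log alone does not say why the next-page link was absent.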
Starting to iterate pages
Exception while requesting URL: https://m.facebook.com/zuck/posts/
Exception: HTTPError('404 Client Error: Not Found for url: https://m.facebook.com/zuck/posts/?locale=en_US')
Traceback (most recent call last):
  File "/usr/local/anaconda3/lib/python3.7/site-packages/facebook_scraper/facebook_scraper.py", line 809, in get
    response.raise_for_status()
  File "/usr/local/anaconda3/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://m.facebook.com/zuck/posts/?locale=en_US
404 Client Error: Not Found for url: https://m.facebook.com/zuck/posts/?locale=en_US
Requesting page from: https://m.facebook.com/zuck/
Parsing page response
Got 10 raw posts from page
Extracting posts from page 0
[10114405231577241] Extract method extract_photo_link didn't return anything
[10114405231577241] Extract method extract_video didn't return anything
[10114405231577241] Extract method extract_video_thumbnail didn't return anything
[10114405231577241] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114405231577241] Extract method extract_factcheck didn't return anything
[10114405231577241] Extract method extract_share_information didn't return anything
[10114405231577241] Extract method extract_listing didn't return anything
[10114405231577241] Extract method extract_with didn't return anything
[10114394031816651] Extract method extract_photo_link didn't return anything
[10114394031816651] Extract method extract_video didn't return anything
[10114394031816651] Extract method extract_video_thumbnail didn't return anything
[10114394031816651] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114394031816651] Extract method extract_factcheck didn't return anything
[10114394031816651] Extract method extract_share_information didn't return anything
[10114394031816651] Extract method extract_listing didn't return anything
[10114394031816651] Extract method extract_with didn't return anything
[10114380064791681] Extract method extract_photo_link didn't return anything
[10114380064791681] Extract method extract_video didn't return anything
[10114380064791681] Extract method extract_video_thumbnail didn't return anything
[10114380064791681] Extract method extract_video_id didn't return anything
[10114380064791681] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114380064791681] Extract method extract_factcheck didn't return anything
[10114380064791681] Extract method extract_share_information didn't return anything
[10114380064791681] Extract method extract_listing didn't return anything
[10114380064791681] Extract method extract_with didn't return anything
[10114366512031521] Extract method extract_photo_link didn't return anything
[10114366512031521] Extract method extract_video didn't return anything
[10114366512031521] Extract method extract_video_thumbnail didn't return anything
[10114366512031521] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114366512031521] Extract method extract_factcheck didn't return anything
[10114366512031521] Extract method extract_share_information didn't return anything
[10114366512031521] Extract method extract_listing didn't return anything
[10114366512031521] Extract method extract_with didn't return anything
[10114340420204751] Extract method extract_photo_link didn't return anything
[10114340420204751] Extract method extract_video didn't return anything
[10114340420204751] Extract method extract_video_thumbnail didn't return anything
[10114340420204751] Extract method extract_video_id didn't return anything
[10114340420204751] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114340420204751] Extract method extract_factcheck didn't return anything
[10114340420204751] Extract method extract_share_information didn't return anything
[10114340420204751] Extract method extract_listing didn't return anything
[677947796579229] Extract method extract_photo_link didn't return anything
[677947796579229] Extract method extract_video didn't return anything
[677947796579229] Extract method extract_video_thumbnail didn't return anything
[677947796579229] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[677947796579229] Extract method extract_factcheck didn't return anything
[677947796579229] Extract method extract_share_information didn't return anything
[677947796579229] Extract method extract_listing didn't return anything
[677947796579229] Extract method extract_with didn't return anything
[10114335992213481] Extract method extract_photo_link didn't return anything
[10114335992213481] Extract method extract_video didn't return anything
[10114335992213481] Extract method extract_video_thumbnail didn't return anything
[10114335992213481] Extract method extract_video_id didn't return anything
[10114335992213481] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114335992213481] Extract method extract_factcheck didn't return anything
[10114335992213481] Extract method extract_share_information didn't return anything
[10114335992213481] Extract method extract_listing didn't return anything
[10114335992213481] Extract method extract_with didn't return anything
[10114335272495801] Extract method extract_photo_link didn't return anything
[10114335272495801] Extract method extract_video didn't return anything
[10114335272495801] Extract method extract_video_thumbnail didn't return anything
[10114335272495801] Extract method extract_video_id didn't return anything
[10114335272495801] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114335272495801] Extract method extract_factcheck didn't return anything
[10114335272495801] Extract method extract_share_information didn't return anything
[10114335272495801] Extract method extract_listing didn't return anything
[10114335272495801] Extract method extract_with didn't return anything
[10114319077261181] Extract method extract_photo_link didn't return anything
[10114319077261181] Extract method extract_video didn't return anything
[10114319077261181] Extract method extract_video_thumbnail didn't return anything
[10114319077261181] Extract method extract_video_id didn't return anything
[10114319077261181] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114319077261181] Extract method extract_factcheck didn't return anything
[10114319077261181] Extract method extract_share_information didn't return anything
[10114319077261181] Extract method extract_listing didn't return anything
[10114319077261181] Extract method extract_with didn't return anything
[10114316913387601] Extract method extract_photo_link didn't return anything
[10114316913387601] Extract method extract_video didn't return anything
[10114316913387601] Extract method extract_video_thumbnail didn't return anything
[10114316913387601] Extract method extract_video_id didn't return anything
[10114316913387601] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114316913387601] Extract method extract_factcheck didn't return anything
[10114316913387601] Extract method extract_share_information didn't return anything
[10114316913387601] Extract method extract_listing didn't return anything
[10114316913387601] Extract method extract_with didn't return anything
Looking for next page URL
Page parser did not find next page URL
10
Starting to iterate pages
Exception while requesting URL: https://m.facebook.com/zuck/posts/
Exception: HTTPError('404 Client Error: Not Found for url: https://m.facebook.com/zuck/posts/?locale=en_US')
Traceback (most recent call last):
  File "/usr/local/anaconda3/lib/python3.7/site-packages/facebook_scraper/facebook_scraper.py", line 809, in get
    response.raise_for_status()
  File "/usr/local/anaconda3/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://m.facebook.com/zuck/posts/?locale=en_US
404 Client Error: Not Found for url: https://m.facebook.com/zuck/posts/?locale=en_US
handle_pagination_url(): start_url = https://m.facebook.com/zuck/
Requesting page from: https://m.facebook.com/zuck/
Parsing page response
Got 10 raw posts from page
Extracting posts from page 0
[10114405231577241] Extract method extract_photo_link didn't return anything
[10114405231577241] Extract method extract_video didn't return anything
[10114405231577241] Extract method extract_video_thumbnail didn't return anything
[10114405231577241] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114405231577241] Extract method extract_factcheck didn't return anything
[10114405231577241] Extract method extract_share_information didn't return anything
[10114405231577241] Extract method extract_listing didn't return anything
[10114405231577241] Extract method extract_with didn't return anything
[10114405231577241] Exception while extracting reactions: KeyError('reactors')
[10114394031816651] Extract method extract_photo_link didn't return anything
[10114394031816651] Extract method extract_video didn't return anything
[10114394031816651] Extract method extract_video_thumbnail didn't return anything
[10114394031816651] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114394031816651] Extract method extract_factcheck didn't return anything
[10114394031816651] Extract method extract_share_information didn't return anything
[10114394031816651] Extract method extract_listing didn't return anything
[10114394031816651] Extract method extract_with didn't return anything
[10114394031816651] Exception while extracting reactions: KeyError('reactors')
[10114380064791681] Extract method extract_photo_link didn't return anything
[10114380064791681] Extract method extract_video didn't return anything
[10114380064791681] Extract method extract_video_thumbnail didn't return anything
[10114380064791681] Extract method extract_video_id didn't return anything
[10114380064791681] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114380064791681] Extract method extract_factcheck didn't return anything
[10114380064791681] Extract method extract_share_information didn't return anything
[10114380064791681] Extract method extract_listing didn't return anything
[10114380064791681] Extract method extract_with didn't return anything
[10114380064791681] Exception while extracting reactions: KeyError('reactors')
[10114366512031521] Extract method extract_photo_link didn't return anything
[10114366512031521] Extract method extract_video didn't return anything
[10114366512031521] Extract method extract_video_thumbnail didn't return anything
[10114366512031521] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114366512031521] Extract method extract_factcheck didn't return anything
[10114366512031521] Extract method extract_share_information didn't return anything
[10114366512031521] Extract method extract_listing didn't return anything
[10114366512031521] Extract method extract_with didn't return anything
[10114366512031521] Exception while extracting reactions: KeyError('reactors')
[10114340420204751] Extract method extract_photo_link didn't return anything
[10114340420204751] Extract method extract_video didn't return anything
[10114340420204751] Extract method extract_video_thumbnail didn't return anything
[10114340420204751] Extract method extract_video_id didn't return anything
[10114340420204751] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114340420204751] Extract method extract_factcheck didn't return anything
[10114340420204751] Extract method extract_share_information didn't return anything
[10114340420204751] Extract method extract_listing didn't return anything
[10114340420204751] Exception while extracting reactions: KeyError('reactors')
[677947796579229] Extract method extract_photo_link didn't return anything
[677947796579229] Extract method extract_video didn't return anything
[677947796579229] Extract method extract_video_thumbnail didn't return anything
[677947796579229] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[677947796579229] Extract method extract_factcheck didn't return anything
[677947796579229] Extract method extract_share_information didn't return anything
[677947796579229] Extract method extract_listing didn't return anything
[677947796579229] Extract method extract_with didn't return anything
[677947796579229] Exception while extracting reactions: KeyError('reactors')
[10114335992213481] Extract method extract_photo_link didn't return anything
[10114335992213481] Extract method extract_video didn't return anything
[10114335992213481] Extract method extract_video_thumbnail didn't return anything
[10114335992213481] Extract method extract_video_id didn't return anything
[10114335992213481] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114335992213481] Extract method extract_factcheck didn't return anything
[10114335992213481] Extract method extract_share_information didn't return anything
[10114335992213481] Extract method extract_listing didn't return anything
[10114335992213481] Extract method extract_with didn't return anything
[10114335992213481] Exception while extracting reactions: KeyError('reactors')
[10114335272495801] Extract method extract_photo_link didn't return anything
[10114335272495801] Extract method extract_video didn't return anything
[10114335272495801] Extract method extract_video_thumbnail didn't return anything
[10114335272495801] Extract method extract_video_id didn't return anything
[10114335272495801] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114335272495801] Extract method extract_factcheck didn't return anything
[10114335272495801] Extract method extract_share_information didn't return anything
[10114335272495801] Extract method extract_listing didn't return anything
[10114335272495801] Extract method extract_with didn't return anything
[10114335272495801] Exception while extracting reactions: KeyError('reactors')
[10114319077261181] Extract method extract_photo_link didn't return anything
[10114319077261181] Extract method extract_video didn't return anything
[10114319077261181] Extract method extract_video_thumbnail didn't return anything
[10114319077261181] Extract method extract_video_id didn't return anything
[10114319077261181] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114319077261181] Extract method extract_factcheck didn't return anything
[10114319077261181] Extract method extract_share_information didn't return anything
[10114319077261181] Extract method extract_listing didn't return anything
[10114319077261181] Extract method extract_with didn't return anything
[10114319077261181] Exception while extracting reactions: KeyError('reactors')
[10114316913387601] Extract method extract_photo_link didn't return anything
[10114316913387601] Extract method extract_video didn't return anything
[10114316913387601] Extract method extract_video_thumbnail didn't return anything
[10114316913387601] Extract method extract_video_id didn't return anything
[10114316913387601] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114316913387601] Extract method extract_factcheck didn't return anything
[10114316913387601] Extract method extract_share_information didn't return anything
[10114316913387601] Extract method extract_listing didn't return anything
[10114316913387601] Extract method extract_with didn't return anything
[10114316913387601] Exception while extracting reactions: KeyError('reactors')
Looking for next page URL
Page parser did not find next page URL
Starting to iterate pages
handle_pagination_url(): start_url = https://m.facebook.com/zuck/
Requesting page from: https://m.facebook.com/zuck/
Parsing page response
Got 10 raw posts from page
Extracting posts from page 0
[10114405231577241] Extract method extract_photo_link didn't return anything
[10114405231577241] Extract method extract_video didn't return anything
[10114405231577241] Extract method extract_video_thumbnail didn't return anything
[10114405231577241] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114405231577241] Extract method extract_factcheck didn't return anything
[10114405231577241] Extract method extract_share_information didn't return anything
[10114405231577241] Extract method extract_listing didn't return anything
[10114405231577241] Extract method extract_with didn't return anything
[10114405231577241] Exception while extracting reactions: KeyError('reactors')
[10114394031816651] Extract method extract_photo_link didn't return anything
[10114394031816651] Extract method extract_video didn't return anything
[10114394031816651] Extract method extract_video_thumbnail didn't return anything
[10114394031816651] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114394031816651] Extract method extract_factcheck didn't return anything
[10114394031816651] Extract method extract_share_information didn't return anything
[10114394031816651] Extract method extract_listing didn't return anything
[10114394031816651] Extract method extract_with didn't return anything
[10114394031816651] Exception while extracting reactions: KeyError('reactors')
[10114380064791681] Extract method extract_photo_link didn't return anything
[10114380064791681] Extract method extract_video didn't return anything
[10114380064791681] Extract method extract_video_thumbnail didn't return anything
[10114380064791681] Extract method extract_video_id didn't return anything
[10114380064791681] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114380064791681] Extract method extract_factcheck didn't return anything
[10114380064791681] Extract method extract_share_information didn't return anything
[10114380064791681] Extract method extract_listing didn't return anything
[10114380064791681] Extract method extract_with didn't return anything
[10114380064791681] Exception while extracting reactions: KeyError('reactors')
[10114366512031521] Extract method extract_photo_link didn't return anything
[10114366512031521] Extract method extract_video didn't return anything
[10114366512031521] Extract method extract_video_thumbnail didn't return anything
[10114366512031521] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114366512031521] Extract method extract_factcheck didn't return anything
[10114366512031521] Extract method extract_share_information didn't return anything
[10114366512031521] Extract method extract_listing didn't return anything
[10114366512031521] Extract method extract_with didn't return anything
[10114366512031521] Exception while extracting reactions: KeyError('reactors')
[10114340420204751] Extract method extract_photo_link didn't return anything
[10114340420204751] Extract method extract_video didn't return anything
[10114340420204751] Extract method extract_video_thumbnail didn't return anything
[10114340420204751] Extract method extract_video_id didn't return anything
[10114340420204751] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114340420204751] Extract method extract_factcheck didn't return anything
[10114340420204751] Extract method extract_share_information didn't return anything
[10114340420204751] Extract method extract_listing didn't return anything
[10114340420204751] Exception while extracting reactions: KeyError('reactors')
[677947796579229] Extract method extract_photo_link didn't return anything
[677947796579229] Extract method extract_video didn't return anything
[677947796579229] Extract method extract_video_thumbnail didn't return anything
[677947796579229] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[677947796579229] Extract method extract_factcheck didn't return anything
[677947796579229] Extract method extract_share_information didn't return anything
[677947796579229] Extract method extract_listing didn't return anything
[677947796579229] Extract method extract_with didn't return anything
[677947796579229] Exception while extracting reactions: KeyError('reactors')
[10114335992213481] Extract method extract_photo_link didn't return anything
[10114335992213481] Extract method extract_video didn't return anything
[10114335992213481] Extract method extract_video_thumbnail didn't return anything
[10114335992213481] Extract method extract_video_id didn't return anything
[10114335992213481] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114335992213481] Extract method extract_factcheck didn't return anything
[10114335992213481] Extract method extract_share_information didn't return anything
[10114335992213481] Extract method extract_listing didn't return anything
[10114335992213481] Extract method extract_with didn't return anything
[10114335992213481] Exception while extracting reactions: KeyError('reactors')
[10114335272495801] Extract method extract_photo_link didn't return anything
[10114335272495801] Extract method extract_video didn't return anything
[10114335272495801] Extract method extract_video_thumbnail didn't return anything
[10114335272495801] Extract method extract_video_id didn't return anything
[10114335272495801] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114335272495801] Extract method extract_factcheck didn't return anything
[10114335272495801] Extract method extract_share_information didn't return anything
[10114335272495801] Extract method extract_listing didn't return anything
[10114335272495801] Extract method extract_with didn't return anything
[10114335272495801] Exception while extracting reactions: KeyError('reactors')
[10114319077261181] Extract method extract_photo_link didn't return anything
[10114319077261181] Extract method extract_video didn't return anything
[10114319077261181] Extract method extract_video_thumbnail didn't return anything
[10114319077261181] Extract method extract_video_id didn't return anything
[10114319077261181] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114319077261181] Extract method extract_factcheck didn't return anything
[10114319077261181] Extract method extract_share_information didn't return anything
[10114319077261181] Extract method extract_listing didn't return anything
[10114319077261181] Extract method extract_with didn't return anything
[10114319077261181] Exception while extracting reactions: KeyError('reactors')
[10114316913387601] Extract method extract_photo_link didn't return anything
[10114316913387601] Extract method extract_video didn't return anything
[10114316913387601] Extract method extract_video_thumbnail didn't return anything
[10114316913387601] Extract method extract_video_id didn't return anything
[10114316913387601] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114316913387601] Extract method extract_factcheck didn't return anything
[10114316913387601] Extract method extract_share_information didn't return anything
[10114316913387601] Extract method extract_listing didn't return anything
[10114316913387601] Extract method extract_with didn't return anything
[10114316913387601] Exception while extracting reactions: KeyError('reactors')
Looking for next page URL
Page parser did not find next page URL
Starting to iterate pages
handle_pagination_url(): start_url = https://m.facebook.com/zuck/
Requesting page from: https://m.facebook.com/zuck/
Parsing page response
Got 10 raw posts from page
Extracting posts from page 0
[10114405231577241] Extract method extract_photo_link didn't return anything
[10114405231577241] Extract method extract_video didn't return anything
[10114405231577241] Extract method extract_video_thumbnail didn't return anything
[10114405231577241] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114405231577241] Extract method extract_factcheck didn't return anything
[10114405231577241] Extract method extract_share_information didn't return anything
[10114405231577241] Extract method extract_listing didn't return anything
[10114405231577241] Extract method extract_with didn't return anything
[10114405231577241] Exception while extracting reactions: KeyError('reactors')
[10114394031816651] Extract method extract_photo_link didn't return anything
[10114394031816651] Extract method extract_video didn't return anything
[10114394031816651] Extract method extract_video_thumbnail didn't return anything
[10114394031816651] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114394031816651] Extract method extract_factcheck didn't return anything
[10114394031816651] Extract method extract_share_information didn't return anything
[10114394031816651] Extract method extract_listing didn't return anything
[10114394031816651] Extract method extract_with didn't return anything
[10114394031816651] Exception while extracting reactions: KeyError('reactors')
[10114380064791681] Extract method extract_photo_link didn't return anything
[10114380064791681] Extract method extract_video didn't return anything
[10114380064791681] Extract method extract_video_thumbnail didn't return anything
[10114380064791681] Extract method extract_video_id didn't return anything
[10114380064791681] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114380064791681] Extract method extract_factcheck didn't return anything
[10114380064791681] Extract method extract_share_information didn't return anything
[10114380064791681] Extract method extract_listing didn't return anything
[10114380064791681] Extract method extract_with didn't return anything
[10114380064791681] Exception while extracting reactions: KeyError('reactors')
[10114366512031521] Extract method extract_photo_link didn't return anything
[10114366512031521] Extract method extract_video didn't return anything
[10114366512031521] Extract method extract_video_thumbnail didn't return anything
[10114366512031521] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114366512031521] Extract method extract_factcheck didn't return anything
[10114366512031521] Extract method extract_share_information didn't return anything
[10114366512031521] Extract method extract_listing didn't return anything
[10114366512031521] Extract method extract_with didn't return anything
[10114366512031521] Exception while extracting reactions: KeyError('reactors')
[10114340420204751] Extract method extract_photo_link didn't return anything
[10114340420204751] Extract method extract_video didn't return anything
[10114340420204751] Extract method extract_video_thumbnail didn't return anything
[10114340420204751] Extract method extract_video_id didn't return anything
[10114340420204751] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114340420204751] Extract method extract_factcheck didn't return anything
[10114340420204751] Extract method extract_share_information didn't return anything
[10114340420204751] Extract method extract_listing didn't return anything
[10114340420204751] Exception while extracting reactions: KeyError('reactors')
[677947796579229] Extract method extract_photo_link didn't return anything
[677947796579229] Extract method extract_video didn't return anything
[677947796579229] Extract method extract_video_thumbnail didn't return anything
[677947796579229] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[677947796579229] Extract method extract_factcheck didn't return anything
[677947796579229] Extract method extract_share_information didn't return anything
[677947796579229] Extract method extract_listing didn't return anything
[677947796579229] Extract method extract_with didn't return anything
[677947796579229] Exception while extracting reactions: KeyError('reactors')
[10114335992213481] Extract method extract_photo_link didn't return anything
[10114335992213481] Extract method extract_video didn't return anything
[10114335992213481] Extract method extract_video_thumbnail didn't return anything
[10114335992213481] Extract method extract_video_id didn't return anything
[10114335992213481] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114335992213481] Extract method extract_factcheck didn't return anything
[10114335992213481] Extract method extract_share_information didn't return anything
[10114335992213481] Extract method extract_listing didn't return anything
[10114335992213481] Extract method extract_with didn't return anything
[10114335992213481] Exception while extracting reactions: KeyError('reactors')
[10114335272495801] Extract method extract_photo_link didn't return anything
[10114335272495801] Extract method extract_video didn't return anything
[10114335272495801] Extract method extract_video_thumbnail didn't return anything
[10114335272495801] Extract method extract_video_id didn't return anything
[10114335272495801] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114335272495801] Extract method extract_factcheck didn't return anything
[10114335272495801] Extract method extract_share_information didn't return anything
[10114335272495801] Extract method extract_listing didn't return anything
[10114335272495801] Extract method extract_with didn't return anything
[10114335272495801] Exception while extracting reactions: KeyError('reactors')
[10114319077261181] Extract method extract_photo_link didn't return anything
[10114319077261181] Extract method extract_video didn't return anything
[10114319077261181] Extract method extract_video_thumbnail didn't return anything
[10114319077261181] Extract method extract_video_id didn't return anything
[10114319077261181] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114319077261181] Extract method extract_factcheck didn't return anything
[10114319077261181] Extract method extract_share_information didn't return anything
[10114319077261181] Extract method extract_listing didn't return anything
[10114319077261181] Extract method extract_with didn't return anything
[10114319077261181] Exception while extracting reactions: KeyError('reactors')
[10114316913387601] Extract method extract_photo_link didn't return anything
[10114316913387601] Extract method extract_video didn't return anything
[10114316913387601] Extract method extract_video_thumbnail didn't return anything
[10114316913387601] Extract method extract_video_id didn't return anything
[10114316913387601] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114316913387601] Extract method extract_factcheck didn't return anything
[10114316913387601] Extract method extract_share_information didn't return anything
[10114316913387601] Extract method extract_listing didn't return anything
[10114316913387601] Extract method extract_with didn't return anything
[10114316913387601] Exception while extracting reactions: KeyError('reactors')
Looking for next page URL
Page parser did not find next page URL
Starting to iterate pages
Exception while requesting URL: https://m.facebook.com/zuck/posts/
Exception: HTTPError('404 Client Error: Not Found for url: https://m.facebook.com/zuck/posts/?locale=en_US')
Traceback (most recent call last):
File "/usr/local/anaconda3/lib/python3.7/site-packages/facebook_scraper/facebook_scraper.py", line 809, in get
response.raise_for_status()
File "/usr/local/anaconda3/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://m.facebook.com/zuck/posts/?locale=en_US
404 Client Error: Not Found for url: https://m.facebook.com/zuck/posts/?locale=en_US
Requesting page from: https://m.facebook.com/zuck/
Parsing page response
Got 10 raw posts from page
Extracting posts from page 0
[10114405231577241] Extract method extract_photo_link didn't return anything
[10114405231577241] Extract method extract_video didn't return anything
[10114405231577241] Extract method extract_video_thumbnail didn't return anything
[10114405231577241] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114405231577241] Extract method extract_factcheck didn't return anything
[10114405231577241] Extract method extract_share_information didn't return anything
[10114405231577241] Extract method extract_listing didn't return anything
[10114405231577241] Extract method extract_with didn't return anything
[10114394031816651] Extract method extract_photo_link didn't return anything
[10114394031816651] Extract method extract_video didn't return anything
[10114394031816651] Extract method extract_video_thumbnail didn't return anything
[10114394031816651] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114394031816651] Extract method extract_factcheck didn't return anything
[10114394031816651] Extract method extract_share_information didn't return anything
[10114394031816651] Extract method extract_listing didn't return anything
[10114394031816651] Extract method extract_with didn't return anything
[10114380064791681] Extract method extract_photo_link didn't return anything
[10114380064791681] Extract method extract_video didn't return anything
[10114380064791681] Extract method extract_video_thumbnail didn't return anything
[10114380064791681] Extract method extract_video_id didn't return anything
[10114380064791681] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114380064791681] Extract method extract_factcheck didn't return anything
[10114380064791681] Extract method extract_share_information didn't return anything
[10114380064791681] Extract method extract_listing didn't return anything
[10114380064791681] Extract method extract_with didn't return anything
[10114366512031521] Extract method extract_photo_link didn't return anything
[10114366512031521] Extract method extract_video didn't return anything
[10114366512031521] Extract method extract_video_thumbnail didn't return anything
[10114366512031521] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114366512031521] Extract method extract_factcheck didn't return anything
[10114366512031521] Extract method extract_share_information didn't return anything
[10114366512031521] Extract method extract_listing didn't return anything
[10114366512031521] Extract method extract_with didn't return anything
[10114340420204751] Extract method extract_photo_link didn't return anything
[10114340420204751] Extract method extract_video didn't return anything
[10114340420204751] Extract method extract_video_thumbnail didn't return anything
[10114340420204751] Extract method extract_video_id didn't return anything
[10114340420204751] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114340420204751] Extract method extract_factcheck didn't return anything
[10114340420204751] Extract method extract_share_information didn't return anything
[10114340420204751] Extract method extract_listing didn't return anything
[677947796579229] Extract method extract_photo_link didn't return anything
[677947796579229] Extract method extract_video didn't return anything
[677947796579229] Extract method extract_video_thumbnail didn't return anything
[677947796579229] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[677947796579229] Extract method extract_factcheck didn't return anything
[677947796579229] Extract method extract_share_information didn't return anything
[677947796579229] Extract method extract_listing didn't return anything
[677947796579229] Extract method extract_with didn't return anything
[10114335992213481] Extract method extract_photo_link didn't return anything
[10114335992213481] Extract method extract_video didn't return anything
[10114335992213481] Extract method extract_video_thumbnail didn't return anything
[10114335992213481] Extract method extract_video_id didn't return anything
[10114335992213481] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114335992213481] Extract method extract_factcheck didn't return anything
[10114335992213481] Extract method extract_share_information didn't return anything
[10114335992213481] Extract method extract_listing didn't return anything
[10114335992213481] Extract method extract_with didn't return anything
[10114335272495801] Extract method extract_photo_link didn't return anything
[10114335272495801] Extract method extract_video didn't return anything
[10114335272495801] Extract method extract_video_thumbnail didn't return anything
[10114335272495801] Extract method extract_video_id didn't return anything
[10114335272495801] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114335272495801] Extract method extract_factcheck didn't return anything
[10114335272495801] Extract method extract_share_information didn't return anything
[10114335272495801] Extract method extract_listing didn't return anything
[10114335272495801] Extract method extract_with didn't return anything
[10114319077261181] Extract method extract_photo_link didn't return anything
[10114319077261181] Extract method extract_video didn't return anything
[10114319077261181] Extract method extract_video_thumbnail didn't return anything
[10114319077261181] Extract method extract_video_id didn't return anything
[10114319077261181] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114319077261181] Extract method extract_factcheck didn't return anything
[10114319077261181] Extract method extract_share_information didn't return anything
[10114319077261181] Extract method extract_listing didn't return anything
[10114319077261181] Extract method extract_with didn't return anything
[10114316913387601] Extract method extract_photo_link didn't return anything
[10114316913387601] Extract method extract_video didn't return anything
[10114316913387601] Extract method extract_video_thumbnail didn't return anything
[10114316913387601] Extract method extract_video_id didn't return anything
[10114316913387601] Exception while running extract_video_meta: AttributeError("'NoneType' object has no attribute 'find'")
[10114316913387601] Extract method extract_factcheck didn't return anything
[10114316913387601] Extract method extract_share_information didn't return anything
[10114316913387601] Extract method extract_listing didn't return anything
[10114316913387601] Extract method extract_with didn't return anything
Looking for next page URL
Page parser did not find next page URL
10
Hmm, after commenting out
# set_user_agent("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
I get 100 and everything seems to work. 😊
Thanks.
Oops, now I'm getting
/usr/local/anaconda3/lib/python3.7/site-packages/facebook_scraper/facebook_scraper.py:841: UserWarning: Facebook says 'Unsupported Browser'
warnings.warn(f"Facebook says 'Unsupported Browser'")
and no reactors ([]). Now I remember that that's why I added the set_user_agent() line originally. Is there a fix other than switching the user_agent dynamically? (Haven't tried that yet - will later when I have time.)
Is facebook-scraper being specifically blocked by Facebook from grabbing reactors?
I recently pushed a fix for that reactors issue (https://github.com/kevinzg/facebook-scraper/issues/692, https://github.com/kevinzg/facebook-scraper/commit/c41e14e1c8271ae82d2e981d64bf8cd21db08a85). Try updating to the latest master branch.
I don't think so. I think they just updated their code, so now we have to update ours to follow.
Thanks, it works great! I still got the error for post 5746385992055254 for some reason.
What error? reactor extraction for this post works fine for me
The error I meant was
/usr/local/anaconda3/lib/python3.7/site-packages/facebook_scraper/facebook_scraper.py:841: UserWarning: Facebook says 'Unsupported Browser'
warnings.warn(f"Facebook says 'Unsupported Browser'")
which I am still getting periodically, but after further investigation, that post doesn't seem to be the trigger as I had thought. I'm not sure what the trigger is.
During testing right now I am also getting this error:
Traceback (most recent call last):
File "/usr/local/anaconda3/lib/python3.7/site-packages/facebook_scraper/utils.py", line 271, in safe_consume
for item in generator:
File "/usr/local/anaconda3/lib/python3.7/site-packages/facebook_scraper/extractors.py", line 706, in extract_reactors
f"div>i.{spriteMapCssClass}", first=True
AttributeError: 'NoneType' object has no attribute 'attrs'
The reactors are still being populated correctly, though. I'm not sure exactly what's going on. What info would be most helpful to you for either of these errors?
That's not an error, just a warning. Facebook still lets you access things even when it warns you about using an Unsupported Browser.
Could you please post the code you're using?
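Since the 'Unsupported Browser' message is only a warning, one option is to hide it with Python's standard warnings module while leaving other warnings visible. The following is a minimal sketch, not something facebook-scraper provides: the helper name silence_unsupported_browser_warning is made up for illustration, and the message text is copied from the output quoted earlier in this thread, so it may need adjusting if the library changes its wording.

```python
import re
import warnings

# Message text copied from the warning quoted in this thread (assumed unchanged).
UNSUPPORTED_BROWSER_MSG = "Facebook says 'Unsupported Browser'"


def silence_unsupported_browser_warning():
    """Ignore only the 'Unsupported Browser' UserWarning.

    Hypothetical helper for illustration. The `message` argument of
    warnings.filterwarnings is a regular expression matched against the
    start of the warning text, so the literal string is escaped first.
    """
    warnings.filterwarnings(
        "ignore",
        message=re.escape(UNSUPPORTED_BROWSER_MSG),
        category=UserWarning,
    )
```

Calling silence_unsupported_browser_warning() once before get_posts() should be enough. It only hides the message; it does not change what Facebook returns.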