
feature suggestion re: number of requests made

Open curiousier-george opened this issue 2 years ago • 8 comments

Would it be possible/easy for facebook_scraper to internally keep track of the number of requests that it makes to Facebook in such a way that users could query the current count?

In my own application code, I try to keep track of the number of requests. The main problem is that I can't be sure I understand what's going on under the hood well enough to calculate everything correctly.

curiousier-george, Jul 28 '22 03:07

Sure, https://github.com/kevinzg/facebook-scraper/commit/ba26b7bf3a1a61dfd60246b64a37f5c5a2a492ae should solve this. Usage:

from facebook_scraper import _scraper
from facebook_scraper import *
set_cookies("cookies.txt")
posts = get_posts("dudukovich", pages=10, options={'allow_extra_requests': False})
for post in posts:
    print(post["post_id"], post["likes"], post["reaction_count"], post["comments"])
print(f"Made {_scraper.request_count} requests")

neon-ninja, Jul 28 '22 04:07
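
To see how many requests one particular call costs, rather than the running total, one option is to read the counter before and after the call and take the difference. The following is a minimal sketch, not part of the library: `count_requests` and `fake_scraper` are hypothetical names, and the only thing assumed about the real `_scraper` is the `request_count` attribute shown above. A stand-in object is used so the sketch runs without network access.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def count_requests(scraper):
    """Yield a dict whose "requests" entry is filled in when the block exits.

    `scraper` is any object exposing an integer `request_count` attribute
    (assumed here to behave like facebook_scraper's `_scraper`).
    """
    result = {"requests": 0}
    start = scraper.request_count
    try:
        yield result
    finally:
        # Difference between the counter at exit and at entry.
        result["requests"] = scraper.request_count - start


# Hypothetical stand-in for `_scraper`, so no real requests are made.
fake_scraper = SimpleNamespace(request_count=5)

with count_requests(fake_scraper) as stats:
    fake_scraper.request_count += 3  # pretend three requests were made

print(f"Made {stats['requests']} requests in this block")
```

With the real library, the loop over the posts would need to sit inside the `with` block, since `get_posts` yields posts lazily and requests are only made as the results are consumed.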

This is fantastic. Thank you!

Using it, I've discovered that I have not been calculating requests correctly. I thought that

from facebook_scraper import _scraper
from facebook_scraper import *
set_cookies("cookies.txt")
posts = get_posts("dudukovich", pages=1, options={'allow_extra_requests': False})
for post in posts:
    print(post["post_id"], post["likes"], post["reaction_count"], post["comments"])
print(f"Made {_scraper.request_count} requests")

(note that pages=1) would only make two requests - one for the 'login' and one for getting/accessing the posts from one page of a profile, but the actual output is Made 3 requests.

What are the three requests for?

curiousier-george, Jul 28 '22 12:07

The scraper tried to get /posts, failed, and fell back to /

neon-ninja, Jul 28 '22 14:07
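
The request arithmetic behind that answer can be illustrated with a small sketch. This is not the library's code: `fetch_with_fallback`, `fake_fetch`, and the URL paths are hypothetical, chosen only to show why a failed first attempt adds one request on top of the login request.

```python
from typing import Callable, List, Optional, Tuple


def fetch_with_fallback(
    fetch: Callable[[str], Optional[str]],
    account: str,
    try_posts_first: bool = True,
) -> Tuple[Optional[str], int]:
    """Return (page, requests_made) using a hypothetical `fetch` callable."""
    paths: List[str] = []
    if try_posts_first:
        paths.append(f"/{account}/posts")  # assumed shape of the first attempt
    paths.append(f"/{account}/")  # assumed shape of the fallback

    requests_made = 0
    for path in paths:
        requests_made += 1  # every attempt counts, even a failed one
        page = fetch(path)
        if page is not None:
            return page, requests_made
    return None, requests_made


def fake_fetch(path: str) -> Optional[str]:
    # Hypothetical server: only the bare profile path succeeds.
    return "<html>timeline</html>" if path.endswith("/") else None


_, with_posts = fetch_with_fallback(fake_fetch, "example")
_, without_posts = fetch_with_fallback(fake_fetch, "example", try_posts_first=False)
print(with_posts, without_posts)  # prints "2 1"
```

Adding the one login request gives the 3 observed above when the first attempt fails, and 2 when that attempt is skipped.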

Oh, in other words it first tries /posts/###, fails, and then tries /###? Is there a way I can avoid the first failure (and hence extra request)?

I now know that that code always takes at least two requests for me for any username.

curiousier-george, Jul 28 '22 14:07

Sure, https://github.com/kevinzg/facebook-scraper/commit/089688dbe6405683aa5f9c0c87cc0de04de47d55 will stop the scraper from trying /posts

neon-ninja, Jul 28 '22 21:07

Wow, that cuts down my requests by almost half. Thanks! (Do you happen to have any idea whether this kind of reduction in requests matters to Facebook with respect to soft bans?)

Many thanks for the _scraper.request_count addition. This makes it so much easier to understand exactly what triggers new requests.

My own request calculations are much closer now but still off sometimes, and I'm trying to track down under what conditions my calculations diverge from reality. Are there other failed-request fallbacks when fetching profiles, posts, or responses?

curiousier-george, Jul 29 '22 00:07

Probably not. No problem. You can use enable_logging() to see what requests are being made and when

neon-ninja, Jul 29 '22 00:07
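
If the log output is to be tallied rather than read by eye, a standard-library `logging.Handler` that counts matching records is one way to do it. The sketch below is an assumption-laden illustration: `RequestLogCounter` is a hypothetical helper, the logger name `demo_scraper` is a stand-in, and the marker text `"Requesting page"` is a guess at what a request log line might contain. Check the actual output of `enable_logging()` before relying on any particular wording.

```python
import logging


class RequestLogCounter(logging.Handler):
    """Count log records whose formatted message contains a marker string."""

    def __init__(self, marker: str) -> None:
        super().__init__(level=logging.DEBUG)
        self.marker = marker
        self.count = 0

    def emit(self, record: logging.LogRecord) -> None:
        if self.marker in record.getMessage():
            self.count += 1


# Demo logger standing in for the library's logger (the name is hypothetical).
demo_logger = logging.getLogger("demo_scraper")
demo_logger.setLevel(logging.DEBUG)
demo_logger.propagate = False  # keep the demo out of the root logger

counter = RequestLogCounter(marker="Requesting page")  # marker text is assumed
demo_logger.addHandler(counter)

# Simulated log lines; a real run would get these from the scraper itself.
demo_logger.debug("Requesting page from: %s", "https://example.invalid/a")
demo_logger.debug("Parsing response")
demo_logger.debug("Requesting page from: %s", "https://example.invalid/b")

print(counter.count)  # prints 2
```

Comparing such a tally against `_scraper.request_count` could help pin down which step accounts for a discrepancy.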

> You can use enable_logging() to see what requests are being made and when

Thanks, I should have realized that but didn't! 😊

curiousier-george, Jul 29 '22 00:07