
TemporarilyBanned exception not being caught

Open TowardMyth opened this issue 4 years ago • 10 comments

I am scraping some FB pages.

FB temporarily bans you if you scrape too fast, and facebook-scraper raises a TemporarilyBanned exception when that happens, per here.

However, for some reason I'm unable to catch the TemporarilyBanned exception. The code below keeps executing - and never reaches the except block - even after TemporarilyBanned is raised.

The code below is inspired by @neon-ninja's examples here.

How can I catch this exception, so that my scraper can wait ~30+ minutes before resuming? Thanks!

import json
import logging
import time

from facebook_scraper import get_posts, exceptions

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')

file_handler = logging.FileHandler('fb_debug.txt')
file_handler.setLevel(logging.DEBUG)
file_handler.setFormatter(formatter)
logger.addHandler(file_handler)

stream_handler = logging.StreamHandler()
stream_handler.setFormatter(formatter)
logger.addHandler(stream_handler)

# =======================
# Change variables here
user = 'Nintendo'
counter = 0
start_url = None  # None means start from the beginning

options_dict = {
    "posts_per_page": 200
}

# =======================
# Scrape Facebook for posts
temporary_banned_count = 0

while True:
    try:
        for post in get_posts(user, pages=None, cookies='cookies.json', extra_info=True,
                              youtube_dl=True, options=options_dict, start_url=start_url):
            counter += 1
            logger.info(f'Pulling post #{counter}...')

            try:
                logger.info(f'Post #{counter} date: {post["time"].strftime("%Y-%m-%d %H:%M")}')
            except AttributeError:
                logger.info(f'Post #{counter} does not have a date!')

            # Append each post as a JSON object to the output file
            with open('fb_post.txt', 'a') as f:
                f.write(json.dumps(post, indent=4, sort_keys=True, default=str))
                f.write('\n')

            # A successfully scraped post resets the ban backoff counter
            temporary_banned_count = 0

        logger.info("Done scraping all posts")
        break

    except exceptions.TemporarilyBanned:
        # Back off linearly: 10 more minutes per consecutive ban
        temporary_banned_count += 1
        sleep_secs = 600 * temporary_banned_count
        logger.info(f"Temporarily banned, sleeping for {sleep_secs / 60:.0f} m ({sleep_secs} secs)")
        time.sleep(sleep_secs)

TowardMyth avatar Jul 07 '21 20:07 TowardMyth

How do you know you're TemporarilyBanned if the code continues executing?

neon-ninja avatar Jul 07 '21 21:07 neon-ninja

@neon-ninja The logger for facebook_scraper returns something like this:

2021-07-07 02:23:31,878 - facebook_scraper.extractors - ERROR - You’re Temporarily Blocked.

As well, I added a print statement immediately before this line. This line gets printed to the console.

TowardMyth avatar Jul 07 '21 21:07 TowardMyth

So one of the extract functions handles the error, but you're still able to fetch additional posts despite that? An individual extract function is considered non-critical, so it handles exceptions gracefully. It's only if pagination threw an exception that it would be raised to your code.
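The design described above can be sketched roughly like this. This is a toy illustration, not the library's actual internals; `TemporarilyBanned`, `extract_post`, and `get_posts` here are stand-ins for the real facebook_scraper names:

```python
import logging

logger = logging.getLogger(__name__)


class TemporarilyBanned(Exception):
    """Stand-in for facebook_scraper.exceptions.TemporarilyBanned."""


def extract_post(raw):
    """Hypothetical non-critical extractor: a failure only costs optional fields."""
    try:
        return {"text": raw["text"], "hq_images": raw["images"]}
    except Exception as e:
        logger.error("Extraction failed: %s", e)
        return {"text": raw.get("text")}  # degraded post, but still returned


def get_posts(pages):
    """Pagination loop: exceptions raised here DO propagate to the caller."""
    for page in pages:
        if page is None:  # pretend the ban page was served during pagination
            raise TemporarilyBanned("You're Temporarily Blocked")
        for raw in page:
            yield extract_post(raw)
```

With this structure, a post that fails extraction still comes through (missing some fields), while a ban detected during pagination raises all the way up to your `try`/`except`.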

neon-ninja avatar Jul 07 '21 22:07 neon-ninja

@neon-ninja Is there any way to raise an exception even from individual extract functions? Or, put another way, is there a way to make my script pause for 10+ minutes and restart whenever it hits a TemporarilyBanned exception, whether it comes from an individual extract function or from pagination?

TowardMyth avatar Jul 07 '21 22:07 TowardMyth

Sure - try this https://github.com/kevinzg/facebook-scraper/commit/53c89a195e510874f7418171e8f423a6afa7b958

neon-ninja avatar Jul 07 '21 23:07 neon-ninja

@neon-ninja Thanks, your commit worked! A few more small questions:

  1. My use case: I'm trying to collect ALL the FB posts for a particular user for archival purposes, so it's not acceptable to skip any posts. That is why I want the TemporarilyBanned exception raised regardless of whether it comes from an individual extract function or from pagination. Are there any other exceptions/conditions/etc. in your library that could inadvertently skip some posts or prevent me from collecting all of them?
  2. Do you have any tips for avoiding a temporary ban by FB, e.g. adding time.sleep() calls between requests?

TowardMyth avatar Jul 07 '21 23:07 TowardMyth

  1. Note that individual extract functions only extract parts of posts - for example, extracting high-quality images in an image post. The post would still have been returned, just potentially missing the high-quality images. I'm not aware of any current bugs that would cause a post to be skipped.
  2. In general, the fewer requests you make, or the slower you make them, the less likely you are to be temporarily banned.
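That advice can be sketched as a small throttling wrapper around the post generator. The delay value below is an arbitrary example, not a known-safe threshold:

```python
import time


def throttled(posts, delay_secs=2.0):
    """Yield posts from any iterable, pausing between each one so that
    requests are spread out over time."""
    for post in posts:
        yield post
        time.sleep(delay_secs)


# Usage (assuming the get_posts() call from the code above):
# for post in throttled(get_posts(user, cookies='cookies.json'), delay_secs=5):
#     handle(post)
```

Since get_posts() fetches a page of posts at a time, the pause between yielded posts mainly delays the next page fetch once the current page is exhausted, which still slows the overall request rate.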

neon-ninja avatar Jul 08 '21 00:07 neon-ninja

Have you been able to extract all the posts from a page with O(1k), O(10k), O(100k) posts before?

TowardMyth avatar Jul 08 '21 02:07 TowardMyth

Yes, I've done several CSV exports on the order of 1-5k posts per account. In https://github.com/kevinzg/facebook-scraper/issues/285, I extracted 14201 posts in 910s.

neon-ninja avatar Jul 08 '21 03:07 neon-ninja

> How do you know you're TemporarilyBanned if the code continues executing?

Hi neon-ninja,

I'm trying to scrape user posts for around 10k users, but Facebook is temporarily blocking my account. Can you please suggest the ideal way to handle this?

woodayssolutions avatar Nov 08 '23 17:11 woodayssolutions