
Blocked after crawling

Open rugantio opened this issue 5 years ago • 15 comments

Don't use your personal Facebook profile to crawl.

Hello, we're starting to experience some blocking by Facebook. After a certain number of "next pages" have been visited, the profile is temporarily suspended for about an hour.

If Scrapy ends abruptly with this error, your account has been blocked:

  File "/fbcrawl/fbcrawl/spiders/fbcrawl.py", line 170, in parse_page
    if response.meta['flag'] == self.k and self.k >= self.year:
KeyError: 'flag'

This prevents you from visiting any page on mbasic.facebook.com during the blocking period. However, the block does not seem to be fully enforced on m.facebook.com and facebook.com: you can still access public pages, but not private profiles!
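On the crawler side, one way to fail gracefully instead of crashing with a KeyError is to test for the missing meta key before indexing it. A minimal sketch (check_block is a hypothetical helper, not part of fbcrawl):

```python
def check_block(meta: dict) -> bool:
    """Return True when the 'flag' key that fbcrawl attaches to every
    follow-up request is missing, i.e. the response most likely came
    from Facebook's temporary-block page rather than a real next page."""
    return 'flag' not in meta

# In parse_page, instead of indexing response.meta['flag'] directly:
#
#   if check_block(response.meta):
#       self.logger.warning('No "flag" in meta; account probably blocked')
#       return  # or raise scrapy.exceptions.CloseSpider('blocked')
#   if response.meta['flag'] == self.k and self.k >= self.year:
#       ...
```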

(screenshot attachment: Screenshot_20190425_163240)

If you are experiencing this issue, in settings.py set:

CONCURRENT_REQUESTS = 1
DOWNLOAD_DELAY = 1

This forces sequential crawling and will noticeably slow the crawler down, but it ensures a better final result. Increase DOWNLOAD_DELAY if you're still being blocked. More experiments are needed to assess the situation; please report your findings and suggestions here.
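Beyond those two settings, Scrapy's built-in AutoThrottle extension can adapt the delay to the server automatically. A conservative settings.py sketch (the specific values here are assumptions to tune, not tested thresholds):

```python
# settings.py -- conservative throttling sketch
CONCURRENT_REQUESTS = 1           # one request in flight: sequential crawling
DOWNLOAD_DELAY = 2                # base delay in seconds; raise it if blocks persist
RANDOMIZE_DOWNLOAD_DELAY = True   # Scrapy default; waits 0.5x-1.5x DOWNLOAD_DELAY
AUTOTHROTTLE_ENABLED = True       # adapt the delay to observed latency
AUTOTHROTTLE_START_DELAY = 2      # initial throttle delay
AUTOTHROTTLE_MAX_DELAY = 10       # upper bound on the adaptive delay
```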

rugantio commented Apr 25 '19 12:04

Hey, add a time.sleep(1) before each "see more"; that worked fine for me.

ademjemaa commented Apr 25 '19 12:04

@ademjemaa Thanks for your suggestion! Probably a better way of accomplishing the same thing is to use the DOWNLOAD_DELAY parameter in settings.py. According to the Scrapy docs the delay time is randomized:

Scrapy doesn’t wait a fixed amount of time between requests, but uses a random interval between 0.5 * DOWNLOAD_DELAY and 1.5 * DOWNLOAD_DELAY.
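That randomization is easy to reproduce in plain Python; this sketch mimics the interval Scrapy draws from (next_delay is an illustrative helper, not Scrapy's actual internal function):

```python
import random

def next_delay(download_delay: float) -> float:
    """Draw a wait time uniformly from [0.5 * delay, 1.5 * delay],
    the same interval Scrapy uses when RANDOMIZE_DOWNLOAD_DELAY is on."""
    return random.uniform(0.5 * download_delay, 1.5 * download_delay)

# With DOWNLOAD_DELAY = 1, each wait falls between 0.5 and 1.5 seconds,
# which makes the request pattern look less mechanical than a fixed sleep.
```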

rugantio commented Apr 25 '19 15:04

Hi guys, how do I extract group member data with Scrapy?

masudr4n4 commented May 10 '19 04:05

@masudr4n4 What kind of data are we talking about? Just the list of a group's members, or do you want as much info on every person as possible?

ademjemaa commented May 10 '19 04:05

Yeah, I need all the member IDs of a specific group. After getting the list of users, it seems easy to collect data from each user, as Facebook Extractor software does. Thanks.

masudr4n4 commented May 10 '19 04:05

How can I do that?

masudr4n4 commented May 10 '19 04:05

@rugantio You want a profile URL for each member? I can make a crawler like that real quick.

ademjemaa commented May 10 '19 04:05

Oh, that would be really helpful. Thanks.

masudr4n4 commented May 10 '19 04:05

Actually, I want to get the name and email address of members of a Facebook group. I want all the members' info.

masudr4n4 commented May 10 '19 04:05

I'll make a crawler that leads to the profile of each member, and you can do whatever you want with it.

ademjemaa commented May 10 '19 04:05

I will really appreciate it.

masudr4n4 commented May 10 '19 04:05

@masudr4n4 Done, check https://github.com/ademjemaa/fbcrawl

ademjemaa commented May 10 '19 05:05

Wow, really cool, jumping into the code 🥰 Can we connect any social media?

masudr4n4 commented May 10 '19 08:05

Hey,

For some reason it does not crawl past the first page when trying to crawl groups. Do you see this issue as well?

tamirpassi commented May 23 '19 21:05

Hey, please help me. I have the same problem; I tried your way, but it still doesn't work. The issue is below:

Traceback (most recent call last):
  File "c:\users\asus\anaconda3\lib\site-packages\scrapy\utils\defer.py", line 102, in iter_errback
    yield next(it)
  File "c:\users\asus\anaconda3\lib\site-packages\scrapy\core\spidermw.py", line 84, in evaluate_iterable
    for r in iterable:
  File "c:\users\asus\anaconda3\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 29, in process_spider_output
    for x in result:
  File "c:\users\asus\anaconda3\lib\site-packages\scrapy\core\spidermw.py", line 84, in evaluate_iterable
    for r in iterable:
  File "c:\users\asus\anaconda3\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 339, in
    return (_set_referer(r) for r in result or ())
  File "c:\users\asus\anaconda3\lib\site-packages\scrapy\core\spidermw.py", line 84, in evaluate_iterable
    for r in iterable:
  File "c:\users\asus\anaconda3\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 37, in
    return (r for r in result or () if _filter(r))
  File "c:\users\asus\anaconda3\lib\site-packages\scrapy\core\spidermw.py", line 84, in evaluate_iterable
    for r in iterable:
  File "c:\users\asus\anaconda3\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 58, in
    return (r for r in result or () if _filter(r))
  File "C:\Users\ASUS\Downloads\fbcrawl-master\fbcrawl-master\fbcrawl\spiders\comments.py", line 84, in parse_page
    if response.meta['flag'] == self.k and self.k >= self.year:
KeyError: 'flag'

cuongtop4598 commented Sep 17 '19 13:09