fbcrawl
Blocked after crawling
Don't use your personal facebook profile to crawl
Hello, we're starting to see some blocking by Facebook. After a certain number of "next pages" have been visited, the profile is temporarily suspended for about 1 hour.
If scrapy ends abruptly with this error, your account has been blocked:
File "/fbcrawl/fbcrawl/spiders/fbcrawl.py", line 170, in parse_page
if response.meta['flag'] == self.k and self.k >= self.year:
KeyError: 'flag'
During the blocking period you cannot visit any page on mbasic.facebook.com. However, the block does not seem to be fully enforced on m.facebook.com and facebook.com: there you can still access public pages, but not private profiles.
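The KeyError in the traceback above comes from indexing `response.meta['flag']` when the key is not there. Below is a minimal sketch of a safer lookup, assuming the missing key is a usable sign of the block (this thread suggests so, but the exact mechanism isn't confirmed). `looks_blocked` and `flag_matches` are hypothetical helpers, not part of fbcrawl.

```python
def looks_blocked(meta):
    """Return True when the 'flag' key the spider expects is missing.

    Assumption: a missing 'flag' means Facebook served something other
    than the expected page, as reported in this thread.
    """
    return "flag" not in meta


def flag_matches(meta, k, year):
    """Same comparison as the failing line, but with a safe lookup.

    meta.get() returns None instead of raising KeyError, so the spider
    can decide what to do rather than crash.
    """
    return meta.get("flag") == k and k >= year


if __name__ == "__main__":
    blocked_meta = {}              # what the spider seems to see when blocked
    normal_meta = {"flag": 2019}   # illustrative value only
    print(looks_blocked(blocked_meta), looks_blocked(normal_meta))
    print(flag_matches(normal_meta, 2019, 2018))
```

Inside `parse_page` one could call `looks_blocked(response.meta)` first and, if it returns True, log a warning and stop the crawl (for example by raising `scrapy.exceptions.CloseSpider`) instead of continuing to hit a blocked account.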
If you are experiencing this issue, in settings.py set:
CONCURRENT_REQUESTS = 1
DOWNLOAD_DELAY = 1
This forces sequential crawling and noticeably slows the crawler down, but should give a better final result. Increase DOWNLOAD_DELAY if you are still being blocked. More experiments are needed to assess the situation; please report your findings and suggestions here.
Hey, add a time.sleep(1) before each "see more"; that worked fine for me.
@ademjemaa thx for your suggestion! Probably a better way of accomplishing the same thing is to use the DOWNLOAD_DELAY setting in settings.py. According to the Scrapy docs the delay time is randomized:
Scrapy doesn’t wait a fixed amount of time between requests, but uses a random interval between 0.5 * DOWNLOAD_DELAY and 1.5 * DOWNLOAD_DELAY.
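To get a feel for what that means for a given setting, here is a small sketch that computes the range quoted above and draws one sample from it. It only illustrates the documented interval and is not Scrapy's own code; `delay_bounds` and `sample_delay` are made-up names. The randomization applies when `RANDOMIZE_DOWNLOAD_DELAY` is enabled, which is Scrapy's default.

```python
import random


def delay_bounds(download_delay):
    """Return the (min, max) wait in seconds for a DOWNLOAD_DELAY value."""
    return 0.5 * download_delay, 1.5 * download_delay


def sample_delay(download_delay, rng=random):
    """Draw one wait time uniformly from the documented interval."""
    low, high = delay_bounds(download_delay)
    return rng.uniform(low, high)


if __name__ == "__main__":
    for delay in (1, 3, 5):
        low, high = delay_bounds(delay)
        print(f"DOWNLOAD_DELAY = {delay}: waits between {low:.1f}s and {high:.1f}s")
```

With DOWNLOAD_DELAY = 1 the gap between requests can be as short as half a second, so if the account still gets blocked, raising the value widens both ends of the interval.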
Hi guys, how can I extract group member data with scrapy?
@maaudrana what kind of data are we talking about? Just the list of a group's members, or do you want as much info on every person as possible?
Yeah, I need all the member IDs of a specific group. After getting the list of users, it seems easy to collect data from each user, as the Facebook Extractor software does. Thanks.
How can I do that?
@rugantio you want a profile URL for the members? I can make a crawler like that real quick.
Oh, that would be really helpful, thanks.
Actually I want to get the name and email address of members from a Facebook group. I want all the members' info.
I'll make a crawler that leads to the profile of each member and you do whatever you want with it.
I'd really appreciate it.
@maaudrana done, check https://github.com/ademjemaa/fbcrawl
Wow, really cool, jumping into the code 🥰 Can we connect any social media?
Hey,
for some reason it does not crawl past the first page when trying to crawl groups. Do you see this issue as well?
Hey, please help me, I have the same problem. I tried your approach, but it still doesn't work. The error is below:
Traceback (most recent call last):
File "c:\users\asus\anaconda3\lib\site-packages\scrapy\utils\defer.py", line 102, in iter_errback
yield next(it)
File "c:\users\asus\anaconda3\lib\site-packages\scrapy\core\spidermw.py", line 84, in evaluate_iterable
for r in iterable:
File "c:\users\asus\anaconda3\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 29, in process_spider_output
for x in result:
File "c:\users\asus\anaconda3\lib\site-packages\scrapy\core\spidermw.py", line 84, in evaluate_iterable
for r in iterable:
File "c:\users\asus\anaconda3\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 339, in