facebook-scraper
Scraper returns incomplete information for get_page_info
Scraping works fine for some pages, but for others it retrieves less information, as in the following examples.
This code snippet (page: atebeyandsell):
from facebook_scraper import *
from pprint import pprint
set_cookies("fb_cookie.txt")
set_user_agent("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
pprint(get_page_info('atebeyandsell'))
Returns the following
{'about': 'Entrepreneur · Gaming Video Creator\n'
'Send message\n'
'ᴅɪsᴄᴏᴜɴᴛᴇᴅ ɢᴀᴍᴇ ᴄʀᴇᴅɪᴛs sᴇʟʟᴇʀ.\n'
'ᴛʀᴜsᴛᴇᴅ & ᴀʟᴡᴀʏs ʀᴇᴄᴏᴍᴍᴇɴᴅᴇᴅ.\n'
'ᴀʟʟ ᴛʀᴀɴsᴀᴄᴛɪᴏɴs ᴀʀᴇ sᴀғᴇ ᴀɴᴅ ʟᴇɢɪᴛ ✨\n'
'1 Video\n'
'[email protected]\n'
'http://instagram.com/atebeyofficial',
'likes': 6613,
'profile_photo': 'https://scontent.fmnl17-3.fna.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/164655260_107316611452835_8528683889612977791_n.jpg?_nc_cat=110&ccb=1-5&_nc_sid=ed5ff1&efg=eyJpIjoidCJ9&_nc_eui2=AeH1BAfJPhyOlrPCVM-i5RSMckwgHk9sgqFyTCAeT2yCoRsyBrnuYXXkf8OdF8DXgEHEC2SHH_Dx7Ks7cSHtfxxq&_nc_ohc=WOM3rj6xiC0AX8NlL45&_nc_ht=scontent.fmnl17-3.fna&oh=00_AT_TJvBiiaShuD4E74ffQ4HYWksfET86ScTYY9KPCvps7Q&oe=6232C265',
'reviews': <generator object FacebookScraper.get_page_reviews at 0x7fcb2007b350>}
Meanwhile, this code snippet (page: panglaofooddelivery):
from facebook_scraper import *
from pprint import pprint
set_cookies("fb_cookie.txt")
set_user_agent("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
pprint(get_page_info('panglaofooddelivery'))
Returns the following
{'reviews': <generator object FacebookScraper.get_page_reviews at 0x7fcb300bfc10>}
These two examples output the following for me:
Requesting page from: /atebeyandsell/about/
Requesting page from: /atebeyandsell/
Ate Bey and Sell. 6,614 likes · 400 talking about this. ᴅɪsᴄᴏᴜɴᴛᴇᴅ ɢᴀᴍᴇ ᴄʀᴇᴅɪᴛs sᴇʟʟᴇʀ.
ᴛʀᴜsᴛᴇᴅ & ᴀʟᴡᴀʏs ʀᴇᴄᴏᴍᴍᴇɴᴅᴇᴅ.
ᴀʟʟ ᴛʀᴀɴsᴀᴄᴛɪᴏɴs ᴀʀᴇ sᴀғᴇ ᴀɴᴅ ʟᴇɢɪᴛ ✨
{'about': 'About\n'
'http://instagram.com/atebeyofficial\n'
'Away\n'
'Send message\n'
'Entrepreneur · Gaming Video Creator\n'
'See all',
'address': None,
'cover_photo': 'https://scontent.fakl8-1.fna.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/164655260_107316611452835_8528683889612977791_n.jpg?_nc_cat=110&ccb=1-5&_nc_sid=ed5ff1&efg=eyJpIjoidCJ9&_nc_ohc=WOM3rj6xiC0AX_ZwTlF&_nc_ht=scontent.fakl8-1.fna&oh=00_AT9Z9wLOTeyFvxmbE_0JrDjekb_XmjYyX5IiKPLB5DG-UQ&oe=6232C265',
'followers': 7961,
'identifier': 107313684786461,
'image': None,
'likes': 6614,
'name': 'Ate Bey and Sell',
'profile_photo': 'https://scontent.fakl8-1.fna.fbcdn.net/v/t39.30808-6/260975528_262784969239331_4505857755717975013_n.png?stp=cp0_dst-png_p64x64&_nc_cat=105&ccb=1-5&_nc_sid=85a577&efg=eyJpIjoidCJ9&_nc_ohc=iOFpwUnlzGUAX-e0kKf&_nc_ht=scontent.fakl8-1.fna&oh=00_AT-GWPWJqWrfR10zH7YwLtGRGC_nWRfnPI9-AuqdyMMaMg&oe=62106AC8',
'rating': 'Entrepreneur',
'reviews': <generator object FacebookScraper.get_page_reviews at 0x7fb299dc7a50>,
'sameAs': 'instagram.com/atebeyofficial',
'type': 'Person',
'url': 'https://www.facebook.com/atebeyandsell/'}
Requesting page from: /panglaofooddelivery/about/
Content Not Found
Requesting page from: /panglaofooddelivery/
Panglao FOOD Delivery, Panglao, Bohol. 211 likes · 22 talking about this. CLICK "Get Started" and "Order Now" button to Start Ordering
{'about': 'About\n'
'Suggest edits\n'
'6340 Panglao, Philippines\n'
'Get Directions\n'
'See Menu\n'
'Rating · 5\n'
'(3 reviews)\n'
'[email protected]\n'
'See what Panglao FOOD Delivery is doing in Messenger\n'
'Get Started\n'
'Closed now\n'
'·\n'
'7:00 AM - 8:00 PM\n'
'Closed now\n'
'·\n'
'7:00 AM - 8:00 PM\n'
'Wednesday\n'
'Thursday\n'
'Friday\n'
'Saturday\n'
'Sunday\n'
'Monday\n'
'Tuesday\n'
'7:00 AM - 8:00 PM\n'
'6:30 AM - 8:00 PM\n'
'10:30 AM - 8:00 PM\n'
'6:00 AM - 8:00 PM\n'
'6:30 AM - 8:00 PM\n'
'7:00 AM - 8:00 PM\n'
'7:00 AM - 8:00 PM\n'
'CLICK "Get Started" and "Order Now" button to Start Ordering\n'
'Offers free Wi-Fi\n'
'Food delivery service\n'
'See more\n'
'See Less',
'address': None,
'followers': 219,
'foundingDate': '2020-10-29T06:26:29-0700',
'identifier': 107561757820720,
'image': None,
'likes': 211,
'name': 'Panglao FOOD Delivery',
'rating': '5.0 (3)',
'reviews': <generator object FacebookScraper.get_page_reviews at 0x7fb299dee430>,
'sameAs': '<<not-applicable>>',
'type': 'Organization',
'url': 'https://www.facebook.com/panglaofooddelivery/'}
There must be something wrong with your cookies. Perhaps you're facing temporary bans due to excessive scraping.
Oh. Alright. Is there any way to circumvent this problem? Would using multiple cookies work, or would adding sleep time between profile scrapes help?
Probably, give it a try
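Adding sleep time between profile requests could be sketched roughly as below. This is only a minimal sketch: `scrape_with_delay` and its parameters are hypothetical names, and the 5 to 15 second defaults are guesses rather than known safe values. The `fetch` argument is meant to be something like `get_page_info` from the snippets above.

```python
import random
import time
from typing import Callable, Dict, Iterable


def scrape_with_delay(
    pages: Iterable[str],
    fetch: Callable[[str], dict],
    min_delay: float = 5.0,   # assumed value, tune for your own account
    max_delay: float = 15.0,  # assumed value, tune for your own account
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, dict]:
    """Call fetch(page) for each page, pausing a random interval between calls."""
    results: Dict[str, dict] = {}
    for index, page in enumerate(pages):
        if index > 0:
            # Randomised pause so requests are not evenly spaced.
            sleep(random.uniform(min_delay, max_delay))
        results[page] = fetch(page)
    return results
```

With facebook-scraper this would be called as `scrape_with_delay(['atebeyandsell', 'panglaofooddelivery'], get_page_info)`. The `sleep` parameter is injectable only so the helper can be exercised without actually waiting.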
Is there a way to identify whether a cookie is already banned? That way I can also estimate roughly how many posts/profiles it takes to reach that limit. Also, would you know how long the temporary ban lasts?
Checking whether a key you need is missing from the result should be a good smoke test. The ban usually lasts around an hour or so.
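A missing-key smoke test might look something like this. It is a sketch under assumptions: `missing_keys`, `looks_restricted`, and `EXPECTED_KEYS` are hypothetical names, and the key list is simply taken from the complete output earlier in this thread, so adjust it to whatever keys you actually rely on.

```python
from typing import Iterable, List

# Keys that were present in the complete output shown in this thread.
# This list is an assumption; use the keys your own code depends on.
EXPECTED_KEYS = ("name", "likes", "followers", "url")


def missing_keys(info: dict, expected: Iterable[str] = EXPECTED_KEYS) -> List[str]:
    """Return the expected keys that are absent from info or set to None."""
    return [key for key in expected if info.get(key) is None]


def looks_restricted(info: dict, expected: Iterable[str] = EXPECTED_KEYS) -> bool:
    """Heuristic: treat the result as incomplete if any expected key is missing."""
    return bool(missing_keys(info, expected))
```

For the incomplete panglaofooddelivery result above, which only contained `reviews`, `looks_restricted` would return True. This is only a heuristic, since a page can also legitimately lack a field.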
I tried using multiple cookies, but whenever I use a different account's cookie, the cookie from the previous account becomes invalid. Is there a workaround for this?
Clicking the "Log Out" button on Facebook invalidates those cookies. So if you're switching accounts by signing out of one account and signing into another, you're invalidating those cookies. A good workaround is to use incognito mode and close the browser afterwards, which clears the cookies without invalidating them.
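If you do end up with several cookie files exported this way, one possible way to rotate between them is sketched below. `CookiePool` is a hypothetical helper, not part of facebook-scraper, and the one hour cooldown is just the rough ban duration mentioned in this thread.

```python
import time
from typing import Callable, Dict, List, Optional


class CookiePool:
    """Track several cookie files and skip ones that recently looked banned."""

    def __init__(
        self,
        paths: List[str],
        cooldown: float = 3600.0,  # assumed: roughly an hour, per this thread
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._paths = list(paths)
        self._cooldown = cooldown
        self._clock = clock
        self._banned_at: Dict[str, float] = {}

    def mark_banned(self, path: str) -> None:
        """Record that results obtained with this cookie file looked incomplete."""
        self._banned_at[path] = self._clock()

    def next_available(self) -> Optional[str]:
        """Return the first cookie file not in cooldown, or None if all are."""
        now = self._clock()
        for path in self._paths:
            banned_at = self._banned_at.get(path)
            if banned_at is None or now - banned_at >= self._cooldown:
                return path
        return None
```

The returned path would then be passed to `set_cookies(...)` as in the snippets above, and `mark_banned` called whenever a result comes back missing keys you need.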
The username for reviews in get_page_info() sometimes returns the page title instead of the real user's name. The relevant code:
links = elem.find("a")
"username": links[0].text,
Example output:
{'user_url': 'https://facebook.com/onedaycincinnati/?locale2=en_US', 'username': 'Greater Cincinnati Doors And Closets', 'profile_picture': 'https://scontent.fmcc1-
{'user_url': 'https://facebook.com/morgan.hoehn?locale2=en_US', 'username': 'Morgan Hoehn', 'profile_picture': 'https://scontent.fmcc1-1
{'user_url': 'https://facebook.com/onedaycincinnati/?locale2=en_US', 'username': 'Greater Cincinnati Doors And Closets', 'profile_picture': 'https://scontent.fmcc1-1.fna.fbcdn.net/v/t39.30808-
{'user_url': 'https://facebook.com/kim.a.swisher?locale2=en_US', 'username': 'Kim Alcini Swisher', 'profile_picture':
It looks like that issue occurs if you don't pass cookies. https://github.com/kevinzg/facebook-scraper/commit/1531ba91acca8ae6ddbfcffe8a16b70c2d191aab should fix it.