
Scraper returns incomplete information for get_page_info

Open suarezjessie opened this issue 3 years ago • 9 comments

Scraping works fine for some pages, but for others get_page_info returns far fewer fields, as in the examples below.

This code snippet below (page: atebeyandsell)

from facebook_scraper import *
from pprint import pprint

set_cookies("fb_cookie.txt")
set_user_agent("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
pprint(get_page_info('atebeyandsell'))

Returns the following

{'about': 'Entrepreneur · Gaming Video Creator\n'
          'Send message\n'
          'ᴅɪsᴄᴏᴜɴᴛᴇᴅ ɢᴀᴍᴇ ᴄʀᴇᴅɪᴛs sᴇʟʟᴇʀ.\n'
          'ᴛʀᴜsᴛᴇᴅ & ᴀʟᴡᴀʏs ʀᴇᴄᴏᴍᴍᴇɴᴅᴇᴅ.\n'
          'ᴀʟʟ ᴛʀᴀɴsᴀᴄᴛɪᴏɴs ᴀʀᴇ sᴀғᴇ ᴀɴᴅ ʟᴇɢɪᴛ ✨\n'
          '1 Video\n'
          '[email protected]\n'
          'http://instagram.com/atebeyofficial',
 'likes': 6613,
 'profile_photo': 'https://scontent.fmnl17-3.fna.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/164655260_107316611452835_8528683889612977791_n.jpg?_nc_cat=110&ccb=1-5&_nc_sid=ed5ff1&efg=eyJpIjoidCJ9&_nc_eui2=AeH1BAfJPhyOlrPCVM-i5RSMckwgHk9sgqFyTCAeT2yCoRsyBrnuYXXkf8OdF8DXgEHEC2SHH_Dx7Ks7cSHtfxxq&_nc_ohc=WOM3rj6xiC0AX8NlL45&_nc_ht=scontent.fmnl17-3.fna&oh=00_AT_TJvBiiaShuD4E74ffQ4HYWksfET86ScTYY9KPCvps7Q&oe=6232C265',
 'reviews': <generator object FacebookScraper.get_page_reviews at 0x7fcb2007b350>}

Meanwhile, this code snippet (page: panglaofooddelivery)

from facebook_scraper import *
from pprint import pprint

set_cookies("fb_cookie.txt")
set_user_agent("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
pprint(get_page_info('panglaofooddelivery'))

Returns the following

{'reviews': <generator object FacebookScraper.get_page_reviews at 0x7fcb300bfc10>}

suarezjessie avatar Feb 15 '22 15:02 suarezjessie

These two examples output the following for me:

Requesting page from: /atebeyandsell/about/
Requesting page from: /atebeyandsell/
Ate Bey and Sell. 6,614 likes · 400 talking about this. ᴅɪsᴄᴏᴜɴᴛᴇᴅ ɢᴀᴍᴇ ᴄʀᴇᴅɪᴛs sᴇʟʟᴇʀ.
ᴛʀᴜsᴛᴇᴅ & ᴀʟᴡᴀʏs ʀᴇᴄᴏᴍᴍᴇɴᴅᴇᴅ.
ᴀʟʟ ᴛʀᴀɴsᴀᴄᴛɪᴏɴs ᴀʀᴇ sᴀғᴇ ᴀɴᴅ ʟᴇɢɪᴛ ✨
{'about': 'About\n'
          'http://instagram.com/atebeyofficial\n'
          'Away\n'
          'Send message\n'
          'Entrepreneur · Gaming Video Creator\n'
          'See all',
 'address': None,
 'cover_photo': 'https://scontent.fakl8-1.fna.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/164655260_107316611452835_8528683889612977791_n.jpg?_nc_cat=110&ccb=1-5&_nc_sid=ed5ff1&efg=eyJpIjoidCJ9&_nc_ohc=WOM3rj6xiC0AX_ZwTlF&_nc_ht=scontent.fakl8-1.fna&oh=00_AT9Z9wLOTeyFvxmbE_0JrDjekb_XmjYyX5IiKPLB5DG-UQ&oe=6232C265',
 'followers': 7961,
 'identifier': 107313684786461,
 'image': None,
 'likes': 6614,
 'name': 'Ate Bey and Sell',
 'profile_photo': 'https://scontent.fakl8-1.fna.fbcdn.net/v/t39.30808-6/260975528_262784969239331_4505857755717975013_n.png?stp=cp0_dst-png_p64x64&_nc_cat=105&ccb=1-5&_nc_sid=85a577&efg=eyJpIjoidCJ9&_nc_ohc=iOFpwUnlzGUAX-e0kKf&_nc_ht=scontent.fakl8-1.fna&oh=00_AT-GWPWJqWrfR10zH7YwLtGRGC_nWRfnPI9-AuqdyMMaMg&oe=62106AC8',
 'rating': 'Entrepreneur',
 'reviews': <generator object FacebookScraper.get_page_reviews at 0x7fb299dc7a50>,
 'sameAs': 'instagram.com/atebeyofficial',
 'type': 'Person',
 'url': 'https://www.facebook.com/atebeyandsell/'}
Requesting page from: /panglaofooddelivery/about/
Content Not Found
Requesting page from: /panglaofooddelivery/
Panglao FOOD Delivery, Panglao, Bohol. 211 likes · 22 talking about this. CLICK "Get Started" and "Order Now" button to Start Ordering
{'about': 'About\n'
          'Suggest edits\n'
          '6340 Panglao, Philippines\n'
          'Get Directions\n'
          'See Menu\n'
          'Rating · 5\n'
          '(3 reviews)\n'
          '[email protected]\n'
          'See what Panglao FOOD Delivery is doing in Messenger\n'
          'Get Started\n'
          'Closed now\n'
          '·\n'
          '7:00 AM - 8:00 PM\n'
          'Closed now\n'
          '·\n'
          '7:00 AM - 8:00 PM\n'
          'Wednesday\n'
          'Thursday\n'
          'Friday\n'
          'Saturday\n'
          'Sunday\n'
          'Monday\n'
          'Tuesday\n'
          '7:00 AM - 8:00 PM\n'
          '6:30 AM - 8:00 PM\n'
          '10:30 AM - 8:00 PM\n'
          '6:00 AM - 8:00 PM\n'
          '6:30 AM - 8:00 PM\n'
          '7:00 AM - 8:00 PM\n'
          '7:00 AM - 8:00 PM\n'
          'CLICK "Get Started" and "Order Now" button to Start Ordering\n'
          'Offers free Wi-Fi\n'
          'Food delivery service\n'
          'See more\n'
          'See Less',
 'address': None,
 'followers': 219,
 'foundingDate': '2020-10-29T06:26:29-0700',
 'identifier': 107561757820720,
 'image': None,
 'likes': 211,
 'name': 'Panglao FOOD Delivery',
 'rating': '5.0 (3)',
 'reviews': <generator object FacebookScraper.get_page_reviews at 0x7fb299dee430>,
 'sameAs': '<<not-applicable>>',
 'type': 'Organization',
 'url': 'https://www.facebook.com/panglaofooddelivery/'}

There must be something wrong with your cookies. Perhaps you're facing temporary bans due to excessive scraping.

neon-ninja avatar Feb 15 '22 21:02 neon-ninja

Oh, alright. Is there any way to circumvent this problem? Would using multiple cookies work, or would adding sleep time between profile scrapes help?

suarezjessie avatar Feb 16 '22 01:02 suarezjessie

Probably, give it a try
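As a rough illustration of the "sleep time between profile scraping" idea, here is a minimal sketch. The helper name scrape_with_delay and the 5 to 15 second delay range are assumptions made for this example, not part of facebook-scraper, and nothing in this thread confirms that any particular delay avoids a temporary ban.

```python
import random
import time


def scrape_with_delay(pages, fetch, min_delay=5.0, max_delay=15.0, sleep=time.sleep):
    """Call fetch(page) for each page, pausing a random interval between calls.

    The delay bounds are arbitrary placeholders (an assumption), and `sleep`
    is injectable so the pacing can be checked without actually waiting.
    """
    results = {}
    for index, page in enumerate(pages):
        if index > 0:
            # Randomised pause so consecutive requests are not evenly spaced.
            sleep(random.uniform(min_delay, max_delay))
        results[page] = fetch(page)
    return results
```

With the library imported as in the snippets above, this could be called as scrape_with_delay(['atebeyandsell', 'panglaofooddelivery'], get_page_info).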

neon-ninja avatar Feb 16 '22 01:02 neon-ninja

Is there a way to tell whether a cookie is already banned? That would let me estimate roughly how many posts/profiles it takes to reach the limit. Also, do you know how long the temporary ban lasts?

suarezjessie avatar Feb 16 '22 02:02 suarezjessie

Checking whether a key you need is missing from the result should be a good smoke test. The ban usually lasts around an hour or so.
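The missing-key smoke test could look something like the sketch below. The choice of 'name', 'url' and 'followers' is an assumption based on the outputs earlier in this thread (they appear in the full results but not in the truncated ones), and the helper names are hypothetical.

```python
# Keys present in the full outputs above but absent from the truncated ones.
# This selection is an assumption; swap in whichever fields you depend on.
REQUIRED_KEYS = ("name", "url", "followers")


def missing_keys(info, required=REQUIRED_KEYS):
    """Return the required keys that are absent from, or None in, a page-info dict."""
    return [key for key in required if info.get(key) is None]


def looks_restricted(info, required=REQUIRED_KEYS):
    """Heuristic only: treat the result as restricted if any required key is missing."""
    return bool(missing_keys(info, required))
```

For example, looks_restricted(get_page_info('panglaofooddelivery')) would flag the one-key result reported at the top of this issue. A missing key can also just mean the page does not have that field, so this is a hint rather than proof of a ban.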

neon-ninja avatar Feb 16 '22 03:02 neon-ninja

I tried using multiple cookies, but whenever I use a different account's cookie, the cookie from the account I used previously becomes invalid. Is there a workaround for this?

suarezjessie avatar Feb 17 '22 13:02 suarezjessie

Clicking the "Log Out" button on Facebook invalidates those cookies. So if you're switching accounts by signing out of one account and signing into another, you're invalidating those cookies. A good workaround is to use incognito mode and close the browser afterwards, which clears the cookies from the browser without invalidating them.
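Once several cookie files are still valid, trying them in turn might look like this sketch. The function fetch_with_rotation and its fetch and is_complete parameters are hypothetical names for this example. fetch is expected to load the given cookie file and return the page info, for instance by calling set_cookies(path) followed by get_page_info(page) as in the snippets above.

```python
def fetch_with_rotation(page, cookie_files, fetch, is_complete):
    """Try each cookie file in order until fetch() returns a complete result.

    fetch(page, cookie_file) and is_complete(info) are supplied by the caller.
    Returns (info, cookie_file) on success, or (None, None) if every cookie
    file produced an incomplete result.
    """
    for cookie_file in cookie_files:
        info = fetch(page, cookie_file)
        if is_complete(info):
            return info, cookie_file
    return None, None
```

The completeness check is left to the caller, so any test for missing keys can be plugged in as is_complete.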

neon-ninja avatar Feb 17 '22 20:02 neon-ninja

The username for reviews in get_page_info() sometimes contains the page title instead of the real user's name. The relevant code is:

links = elem.find("a")
"username": links[0].text,

Example output (truncated):

{'user_url': 'https://facebook.com/onedaycincinnati/?locale2=en_US', 'username': 'Greater Cincinnati Doors And Closets', 'profile_picture': 'https://scontent.fmcc1-
{'user_url': 'https://facebook.com/morgan.hoehn?locale2=en_US', 'username': 'Morgan Hoehn', 'profile_picture': 'https://scontent.fmcc1-1
{'user_url': 'https://facebook.com/onedaycincinnati/?locale2=en_US', 'username': 'Greater Cincinnati Doors And Closets', 'profile_picture': 'https://scontent.fmcc1-1.fna.fbcdn.net/v/t39.30808-
{'user_url': 'https://facebook.com/kim.a.swisher?locale2=en_US', 'username': 'Kim Alcini Swisher', 'profile_picture':

aminrabinia avatar Mar 02 '22 17:03 aminrabinia

It looks like that issue occurs if you don't pass cookies. Commit https://github.com/kevinzg/facebook-scraper/commit/1531ba91acca8ae6ddbfcffe8a16b70c2d191aab should fix it.

neon-ninja avatar Mar 29 '22 23:03 neon-ninja