facebook-scraper
Extracting Page Reviews
Does the library support extracting reviews for pages, or even just the page's overall rating? It doesn't show up when using get_page_info.
It does:
pprint(get_page_info("SkyTowerAKL"))
outputs:
{'about': 'About\n'
'Corner Victoria and Federal Streets, Auckland, New Zealand 1010\n'
'Get Directions\n'
'Rating · 4.5\n'
'(2.6K reviews)\n'
'304,574 people checked in here\n'
'09-363 6000\n'
'[email protected]\n'
'http://www.skycityauckland.co.nz/Attractions/Skytower.html\n'
'Closed now\n'
'·\n'
'10:00 AM - 6:00 PM\n'
'Closed now\n'
'·\n'
'10:00 AM - 6:00 PM\n'
'Wednesday\n'
'Thursday\n'
'Friday\n'
'Saturday\n'
'Sunday\n'
'Monday\n'
'Tuesday\n'
'10:00 AM - 6:00 PM\n'
'10:00 AM - 6:00 PM\n'
'10:00 AM - 6:00 PM\n'
'10:00 AM - 6:00 PM\n'
'10:00 AM - 6:00 PM\n'
'10:00 AM - 6:00 PM\n'
'10:00 AM - 6:00 PM\n'
'Popular Hours\n'
# ... Popular Hours chart labels omitted: 'MON\n' to 'SUN\n' repeated 7 times, then '9:00\n' to '23:00\n' (every two hours) repeated 7 times ...
"One of New Zealand's most exhilarating and spectacular tourist "
'attractions\n'
"A truly captivating experience awaits visitors to Auckland's Sky "
'Tower. At 328 metres, it is the tallest man-made structure in New '
'Zealand and offers breathtaking views for up to 80 kilometres in '
'every direction.\n'
'\n'
'Travel up in the glass-fronted lifts to one of the three '
'spectacular viewing platforms, or for more thrills and excitement, '
'SkyWalk round the pergola at 192 metres up or SkyJump off the '
'Tower!\n'
'\n'
'Relax with a coffee and light refreshments at Sky Lounge or dine at '
"Orbit - Auckland's only 360-degree revolving restaurant.\n"
'\n'
"Sky Tower is one of New Zealand's most exhilarating and spectacular "
'tourist attractions, you will be amazed at what you can see and do '
'under one roof!\n'
'Price Range · $$\n'
'Landmark & Historical Place\n'
'·\n'
'Restaurant\n'
'See more\n'
'See Less',
'checkins': 304574,
'likes': 68922,
'people_talking_about_this': 612}
Note the 'Rating · 4.5\n' '(2.6K reviews)\n' in the about field.
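Since the rating only appears inside the about string here, it can be pulled out with a regular expression. A minimal sketch, assuming the text keeps the 'Rating · 4.5' / '(2.6K reviews)' layout shown above; parse_rating_from_about is a hypothetical helper, not part of facebook-scraper.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "Rating · 4.5" (the separator is a middle dot, U+00B7).
RATING_RE = re.compile(r"Rating\s*\u00b7\s*(\d+(?:\.\d+)?)")
# Matches e.g. "(2.6K reviews)" or "(3 reviews)".
REVIEWS_RE = re.compile(r"\((\d+(?:\.\d+)?)([KM]?) reviews?\)")

_SUFFIX = {"": 1, "K": 1_000, "M": 1_000_000}


def parse_rating_from_about(about: str) -> Tuple[Optional[float], Optional[int]]:
    """Hypothetical helper: return (rating, approx_review_count); missing parts are None."""
    rating_match = RATING_RE.search(about)
    reviews_match = REVIEWS_RE.search(about)
    rating = float(rating_match.group(1)) if rating_match else None
    count = None
    if reviews_match:
        number, suffix = reviews_match.groups()
        # "2.6K" is already rounded by Facebook, so this count is approximate.
        count = round(float(number) * _SUFFIX[suffix])
    return rating, count


about = "About\nRating \u00b7 4.5\n(2.6K reviews)\n304,574 people checked in here\n"
print(parse_rating_from_about(about))  # (4.5, 2600)
```

If Facebook changes the wording or the separator, both patterns return None rather than a wrong number, which is usually the safer failure mode for scraped text.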
Oh cool, thanks for this! However, I think it doesn't handle some cases, such as this one: a Facebook Page with 3 reviews that don't appear in the About section.
The resulting about text looks like this:
About\n
Suggest edits\n
1121 B Labores Street Pandacan, 1011 Manila, Philippines\n
Get Directions\n
84 people checked in here\n
0998 963 3587\n
Send message\n
Open now\n
·\n
9 AM - 9:30 PM\n
Open now\n
·\n
9 AM - 9:30 PM\n
Monday\n
Tuesday\n
Wednesday\n
Thursday\n
Friday\n
Saturday\n
Sunday\n
9 AM - 9:30 PM\n
9 AM - 9:30 PM\n
9 AM - 9:30 PM\n
9 AM - 9:30 PM\n
9 AM - 9:30 PM\n
9 AM - 9:30 PM\n
9 AM - 9:30 PM\n
Fresh, delicious, yummy, refreshing and affordable shake and juices only from Chamba Juice and Shake!!!\n
Price Range · $\n
Smoothie & Juice Bar\n
Products\n
smoothies, milktea, juies\n
See more\n
See Less
Would a separate feature be needed to extract the Reviews tab?
I see, it looks like chambajuice doesn't have an About page. This commit (https://github.com/kevinzg/facebook-scraper/commit/a516dfabff4b5937ef99ea25c84e463473a29e3d) should make get_page_info extract the rating under a new key called rating. No need to raise a separate issue for extracting reviews; we can reuse this one.
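The rating key is not guaranteed to hold a number: in the atebeyandsell output later in this thread it comes back as 'Entrepreneur'. A small defensive conversion, assuming ratings are on a 5-point scale and that anything non-numeric or out of range is a mis-scrape; to_rating is a hypothetical helper, not part of the library.

```python
from typing import Optional, Union


def to_rating(value: Union[str, int, float, None]) -> Optional[float]:
    """Hypothetical helper: return the rating as a float on a 0-5 scale, else None."""
    if value is None or isinstance(value, bool):
        return None
    try:
        rating = float(value)
    except (TypeError, ValueError):
        # Non-numeric text such as "Entrepreneur" lands here.
        return None
    # Assumes a 5-point scale; out-of-range values (and NaN, which fails
    # every comparison) are treated as a mis-scrape.
    if not 0 <= rating <= 5:
        return None
    return rating
```

For example, to_rating(info.get("rating")), where info is the dict returned by get_page_info, yields a float or None instead of an arbitrary string.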
This commit (https://github.com/kevinzg/facebook-scraper/commit/e362c522dd500c3c91ffb858c6044fca3d4b4d9a) should make it possible to extract reviews. Sample usage:
from pprint import pprint
from facebook_scraper import get_page_info
for review in get_page_info("chambajuice")["reviews"]:
    pprint(review)
outputs:
{'post_url': 'https://facebook.com/story.php?story_fbid=844190382691206&id=100013007553035&locale2=en_US&__tn__=%2As%2As',
'profile_picture': 'https://scontent.fakl8-1.fna.fbcdn.net/v/t1.6435-1/cp0/e15/q65/p40x40/176057649_1213006852476222_7349829092521007297_n.jpg?_nc_cat=100&ccb=1-5&_nc_sid=dbb9e7&_nc_ohc=mQwiEkVN55cAX-txl-s&_nc_ht=scontent.fakl8-1.fna&oh=00_AT_dqKC3Yhu2jYV9Pf4HJhJmn0yjOMoobEoajX5k4rpWfg&oe=6216E02B',
'recommends': True,
'text': 'good taste ang milktea. creamy',
'time': datetime.datetime(2019, 12, 31, 7, 31, 42),
'timestamp': 1577730702,
'user_url': 'https://facebook.com/app.bennok?locale2=en_US',
'username': 'Boy Montaos'}
{'post_url': 'https://facebook.com/story.php?story_fbid=4077043325658195&id=100000577028543&locale2=en_US&__tn__=%2As%2As',
'profile_picture': 'https://scontent.fakl8-1.fna.fbcdn.net/v/t39.30808-1/cp0/e15/q65/p40x40/252319237_5083768481652336_441345146184154296_n.jpg?_nc_cat=103&ccb=1-5&_nc_sid=dbb9e7&_nc_ohc=tr5R8QAt6-QAX-fEtyM&_nc_ht=scontent.fakl8-1.fna&oh=00_AT-P6GENcwW8sFQv1v1rnFmUbCeZYPwfUx0zXi9sK7bNiQ&oe=61F3BBC3',
'recommends': True,
'text': 'Super Affordable and yummy. napaka bilis pa nang service and '
'delivery. 😊👍',
'time': datetime.datetime(2020, 12, 9, 2, 27, 39),
'timestamp': 1607434059,
'user_url': 'https://facebook.com/hannahniah.lim?locale2=en_US',
'username': 'Hananiah Fermin Lim'}
{'post_url': 'https://facebook.com/story.php?story_fbid=3178356272178102&id=100000112820909&locale2=en_US&__tn__=%2As%2As',
'profile_picture': 'https://scontent.fakl8-1.fna.fbcdn.net/v/t39.30808-1/cp0/e15/q65/p40x40/218824795_6398471983499832_3335334123648518092_n.jpg?_nc_cat=104&ccb=1-5&_nc_sid=dbb9e7&_nc_ohc=sJ5901n2oWwAX8qVkQO&_nc_ht=scontent.fakl8-1.fna&oh=00_AT_VL2g9-9jqNkGWNfJOZJnxX9ejqgBsGMrFNdZ-wI_yCg&oe=61F38F86',
'recommends': True,
'text': 'ok naman. patamisin lang ng konti yung pearl 😊',
'time': datetime.datetime(2019, 6, 4, 0, 53, 25),
'timestamp': 1559566405,
'user_url': 'https://facebook.com/yzhanyzhi?locale2=en_US',
'username': 'Yazmine C J Bautista'}
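Each review is a plain dict, so the reviews generator can be consumed with ordinary Python. A sketch, assuming the keys shown above (recommends, timestamp); summarize_reviews is hypothetical. It reads timestamp rather than time because time appears to be a naive local time: 1577730702 is 2019-12-30 18:31:42 UTC, while the first review's time reads 2019-12-31 07:31:42, thirteen hours ahead.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, List


def summarize_reviews(reviews: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: count recommendations and find the review date range in UTC."""
    total = recommended = 0
    dates: List[datetime] = []
    for review in reviews:  # works on a generator, which can only be consumed once
        total += 1
        if review.get("recommends"):
            recommended += 1
        if review.get("timestamp") is not None:
            # Interpret the Unix timestamp explicitly as UTC.
            dates.append(datetime.fromtimestamp(review["timestamp"], tz=timezone.utc))
    return {
        "total": total,
        "recommended": recommended,
        "first": min(dates) if dates else None,
        "last": max(dates) if dates else None,
    }


sample = [
    {"recommends": True, "timestamp": 1577730702, "text": "good taste ang milktea. creamy"},
    {"recommends": True, "timestamp": 1607434059, "text": "Super Affordable and yummy."},
    {"recommends": True, "timestamp": 1559566405, "text": "ok naman."},
]
print(summarize_reviews(sample))
```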
This page profile has a reviews generator object, but iterating over it raises a Content Not Found error.
Below is the code I used to scrape the page with get_page_info:
from facebook_scraper import get_page_info, set_cookies, set_user_agent
from pprint import pprint
set_cookies("fb_cookie.txt")
set_user_agent("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
profile = get_page_info('atebeyandsell')
pprint(profile)
Below is the output:
{'about': 'About\n'
'http://instagram.com/atebeyofficial\n'
'Send message\n'
'Entrepreneur · Gaming Video Creator\n'
'See all',
'address': None,
'followers': 7964,
'identifier': 107313684786461,
'image': None,
'likes': 6616,
'name': 'Ate Bey and Sell',
'profile_photo': 'https://scontent.fmnl4-3.fna.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/164655260_107316611452835_8528683889612977791_n.jpg?_nc_cat=110&ccb=1-5&_nc_sid=ed5ff1&efg=eyJpIjoidCJ9&_nc_eui2=AeH1BAfJPhyOlrPCVM-i5RSMckwgHk9sgqFyTCAeT2yCoRsyBrnuYXXkf8OdF8DXgEHEC2SHH_Dx7Ks7cSHtfxxq&_nc_ohc=WOM3rj6xiC0AX9Ct3Y1&_nc_ht=scontent.fmnl4-3.fna&oh=00_AT9WN2G_WwPbs8fQFtp1ho1EK9kQdfE9EK6q-WaB5ixntQ&oe=6232C265',
'rating': 'Entrepreneur',
'reviews': <generator object FacebookScraper.get_page_reviews at 0x7f9cb1346430>,
'sameAs': 'instagram.com/atebeyofficial',
'type': 'Person',
'url': 'https://www.facebook.com/atebeyandsell/'}
The output shows a generator object under the reviews key. However, when I try to iterate over it with the code below:
for i in profile['reviews']:
    print(i)
It raises the following error:
NotFound Traceback (most recent call last)
/var/folders/25/k79djfcj737dwxvhtkr192zr8p8x5p/T/ipykernel_15908/4199099289.py in <module>
----> 1 for i in profile['reviews']:
2 print(i)
~/opt/anaconda3/envs/sample/lib/python3.8/site-packages/facebook_scraper/facebook_scraper.py in get_page_reviews(self, page, **kwargs)
521 while more_url:
522 logger.debug(f"Fetching {more_url}")
--> 523 response = self.get(more_url)
524 if response.text.startswith("for (;;);"):
525 prefix_length = len('for (;;);')
~/opt/anaconda3/envs/sample/lib/python3.8/site-packages/facebook_scraper/facebook_scraper.py in get(self, url, **kwargs)
805 if title:
806 if title.text.lower() in not_found_titles:
--> 807 raise exceptions.NotFound(title.text)
808 elif title.text.lower() == "error":
809 raise exceptions.UnexpectedResponse("Your request couldn't be processed")
NotFound: Content Not Found
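The traceback starts at the for loop rather than at get_page_info because the reviews value is a generator: its body, including the self.get(more_url) request inside get_page_reviews, does not run until the generator is iterated. A self-contained illustration with a stand-in generator; fake_page_reviews and the local NotFound class are hypothetical and only mimic the library's behaviour.

```python
class NotFound(Exception):
    """Stand-in for the NotFound exception raised in the traceback above."""


calls = []


def fake_page_reviews(page):
    # Nothing in this body runs until the first next() call.
    calls.append(page)
    raise NotFound("Content Not Found")
    yield  # unreachable, but makes this function a generator


reviews = fake_page_reviews("atebeyandsell")  # no request, no error yet
print(calls)  # []

try:
    for review in reviews:  # the body runs here and raises
        print(review)
except NotFound as exc:
    print("raised during iteration:", exc)
print(calls)  # ['atebeyandsell']
```

So a successful get_page_info call says nothing about whether the reviews can actually be fetched; that is only known once iteration starts.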
The reviews aren't accessible at https://m.facebook.com/pg/atebeyandsell/reviews/ either. This page must have reviews disabled.
The problem might be with m.facebook.com: the mobile version can't open some URLs.
The reviews aren't accessible at https://www.facebook.com/atebeyandsell/reviews either.
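Because some pages have reviews disabled, it can help to treat this error as "no reviews available" rather than a crash. A sketch, assuming the caller passes the exception type(s) to swallow; with the real library that would presumably be the NotFound class the traceback shows being raised from facebook_scraper's exceptions module. collect_reviews is a hypothetical helper.

```python
from itertools import islice
from typing import Any, Dict, Iterable, List, Tuple, Type


def collect_reviews(
    reviews: Iterable[Dict[str, Any]],
    skip_errors: Tuple[Type[BaseException], ...],
    limit: int = 100,
) -> Tuple[List[Dict[str, Any]], bool]:
    """Hypothetical helper: gather up to `limit` reviews.

    Returns (collected, ok). `ok` is False when one of `skip_errors`
    interrupted iteration, e.g. on a page with reviews disabled.
    Any other exception propagates unchanged.
    """
    collected: List[Dict[str, Any]] = []
    try:
        # islice stops after `limit` items without requesting an extra page.
        for review in islice(reviews, limit):
            collected.append(review)
    except skip_errors:
        return collected, False
    return collected, True


# Assumed usage with the real library (not executed here):
#   from facebook_scraper import exceptions, get_page_info
#   info = get_page_info("atebeyandsell")
#   reviews, ok = collect_reviews(info["reviews"], (exceptions.NotFound,))
```

Returning the partial list together with a flag keeps any reviews fetched before the failure, while still letting the caller tell "no reviews" apart from "reviews unavailable".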