facebook-scraper

Extracting Page Review

Open suarezjessie opened this issue 3 years ago • 8 comments

Does the library support extracting reviews for pages? Even just the overall review for the page? It doesn't show up when using get_page_info.

suarezjessie, Jan 11 '22 03:01

It does:

pprint(get_page_info("SkyTowerAKL"))

outputs:

{'about': 'About\n'
          'Corner Victoria and Federal Streets, Auckland, New Zealand 1010\n'
          'Get Directions\n'
          'Rating · 4.5\n'
          '(2.6K reviews)\n'
          '304,574 people checked in here\n'
          '09-363 6000\n'
          '[email protected]\n'
          'http://www.skycityauckland.co.nz/Attractions/Skytower.html\n'
          'Closed now\n'
          '·\n'
          '10:00 AM - 6:00 PM\n'
          'Closed now\n'
          '·\n'
          '10:00 AM - 6:00 PM\n'
          'Wednesday\n'
          'Thursday\n'
          'Friday\n'
          'Saturday\n'
          'Sunday\n'
          'Monday\n'
          'Tuesday\n'
          '10:00 AM - 6:00 PM\n'
          '10:00 AM - 6:00 PM\n'
          '10:00 AM - 6:00 PM\n'
          '10:00 AM - 6:00 PM\n'
          '10:00 AM - 6:00 PM\n'
          '10:00 AM - 6:00 PM\n'
          '10:00 AM - 6:00 PM\n'
          'Popular Hours\n'
          'MON\n'
          'TUE\n'
          'WED\n'
          'THU\n'
          'FRI\n'
          'SAT\n'
          'SUN\n'
          'MON\n'
          'TUE\n'
          'WED\n'
          'THU\n'
          'FRI\n'
          'SAT\n'
          'SUN\n'
          'MON\n'
          'TUE\n'
          'WED\n'
          'THU\n'
          'FRI\n'
          'SAT\n'
          'SUN\n'
          'MON\n'
          'TUE\n'
          'WED\n'
          'THU\n'
          'FRI\n'
          'SAT\n'
          'SUN\n'
          'MON\n'
          'TUE\n'
          'WED\n'
          'THU\n'
          'FRI\n'
          'SAT\n'
          'SUN\n'
          'MON\n'
          'TUE\n'
          'WED\n'
          'THU\n'
          'FRI\n'
          'SAT\n'
          'SUN\n'
          'MON\n'
          'TUE\n'
          'WED\n'
          'THU\n'
          'FRI\n'
          'SAT\n'
          'SUN\n'
          '9:00\n'
          '11:00\n'
          '13:00\n'
          '15:00\n'
          '17:00\n'
          '19:00\n'
          '21:00\n'
          '23:00\n'
          '9:00\n'
          '11:00\n'
          '13:00\n'
          '15:00\n'
          '17:00\n'
          '19:00\n'
          '21:00\n'
          '23:00\n'
          '9:00\n'
          '11:00\n'
          '13:00\n'
          '15:00\n'
          '17:00\n'
          '19:00\n'
          '21:00\n'
          '23:00\n'
          '9:00\n'
          '11:00\n'
          '13:00\n'
          '15:00\n'
          '17:00\n'
          '19:00\n'
          '21:00\n'
          '23:00\n'
          '9:00\n'
          '11:00\n'
          '13:00\n'
          '15:00\n'
          '17:00\n'
          '19:00\n'
          '21:00\n'
          '23:00\n'
          '9:00\n'
          '11:00\n'
          '13:00\n'
          '15:00\n'
          '17:00\n'
          '19:00\n'
          '21:00\n'
          '23:00\n'
          '9:00\n'
          '11:00\n'
          '13:00\n'
          '15:00\n'
          '17:00\n'
          '19:00\n'
          '21:00\n'
          '23:00\n'
          "One of New Zealand's most exhilarating and spectacular tourist "
          'attractions\n'
          "A truly captivating experience awaits visitors to Auckland's Sky "
          'Tower. At 328 metres, it is the tallest man-made structure in New '
          'Zealand and offers breathtaking views for up to 80 kilometres in '
          'every direction.\n'
          '\n'
          'Travel up in the glass-fronted lifts to one of the three '
          'spectacular viewing platforms, or for more thrills and excitement, '
          'SkyWalk round the pergola at 192 metres up or SkyJump off the '
          'Tower!\n'
          '\n'
          'Relax with a coffee and light refreshments at Sky Lounge or dine at '
          "Orbit - Auckland's only 360-degree revolving restaurant.\n"
          '\n'
          "Sky Tower is one of New Zealand's most exhilarating and spectacular "
          'tourist attractions, you will be amazed at what you can see and do '
          'under one roof!\n'
          'Price Range · $$\n'
          'Landmark & Historical Place\n'
          '·\n'
          'Restaurant\n'
          'See more\n'
          'See Less',
 'checkins': 304574,
 'likes': 68922,
 'people_talking_about_this': 612}

Note the 'Rating · 4.5\n' '(2.6K reviews)\n' in the about field.
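Since the rating only appears as free text inside the about string here, one way to pull it out is a small regular expression. This is a minimal sketch, not part of facebook-scraper: the helper name parse_rating and the patterns are assumptions based only on the 'Rating · 4.5' and '(2.6K reviews)' lines shown above, and other pages or locales may format these lines differently.

```python
import re
from typing import Optional, Tuple

# Hypothetical helper, not part of facebook-scraper. The patterns are guesses
# based on the sample output above ("Rating · 4.5" and "(2.6K reviews)").
_RATING_RE = re.compile(r"Rating\s*·\s*(\d+(?:\.\d+)?)")
_COUNT_RE = re.compile(r"\(([\d.,]+)\s*([KM]?)\s+reviews?\)", re.IGNORECASE)
_MULTIPLIERS = {"": 1, "K": 1_000, "M": 1_000_000}


def parse_rating(about: str) -> Tuple[Optional[float], Optional[int]]:
    """Return (rating, approximate review count); None where a value is absent."""
    rating_match = _RATING_RE.search(about)
    count_match = _COUNT_RE.search(about)
    rating = float(rating_match.group(1)) if rating_match else None
    count = None
    if count_match:
        number = float(count_match.group(1).replace(",", ""))
        # "2.6K" is already rounded by Facebook, so the count is approximate.
        count = round(number * _MULTIPLIERS[count_match.group(2).upper()])
    return rating, count


sample = (
    "About\nGet Directions\nRating · 4.5\n(2.6K reviews)\n"
    "304,574 people checked in here\n"
)
print(parse_rating(sample))
```

If the about text carries no rating line, the helper returns None for both values rather than raising.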

neon-ninja, Jan 11 '22 17:01

Oh cool, thanks for this! I don't think it handles every case, though. Here's a Facebook Page with 3 reviews that don't appear in the about field:

image

But the resulting about field looks like this:

About\n
Suggest edits\n
1121 B Labores Street Pandacan, 1011 Manila, Philippines\n
Get Directions\n
84 people checked in here\n
0998 963 3587\n
Send message\n
Open now\n
·\n
9 AM - 9:30 PM\n
Open now\n
·\n
9 AM - 9:30 PM\n
Monday\n
Tuesday\n
Wednesday\n
Thursday\n
Friday\n
Saturday\n
Sunday\n
9 AM - 9:30 PM\n
9 AM - 9:30 PM\n
9 AM - 9:30 PM\n
9 AM - 9:30 PM\n
9 AM - 9:30 PM\n
9 AM - 9:30 PM\n
9 AM - 9:30 PM\n
Fresh, delicious, yummy, refreshing and affordable shake and juices only from Chamba Juice and Shake!!!\n
Price Range · $\n
Smoothie & Juice Bar\n
Products\n
smoothies, milktea, juies\n
See more\n
See Less

Would a separate feature be needed for extracting the Reviews Tab?

suarezjessie, Jan 24 '22 07:01

I see. It looks like chambajuice doesn't have an about page. This commit (https://github.com/kevinzg/facebook-scraper/commit/a516dfabff4b5937ef99ea25c84e463473a29e3d) should make get_page_info extract the rating under a new key called rating. There's no need to raise a separate issue for extracting reviews; we can reuse this one.

neon-ninja, Jan 24 '22 20:01

This commit (https://github.com/kevinzg/facebook-scraper/commit/e362c522dd500c3c91ffb858c6044fca3d4b4d9a) should make it possible to extract reviews. Sample usage:

from pprint import pprint
from facebook_scraper import get_page_info

for review in get_page_info("chambajuice")["reviews"]:
    pprint(review)

outputs:

{'post_url': 'https://facebook.com/story.php?story_fbid=844190382691206&id=100013007553035&locale2=en_US&__tn__=%2As%2As',
 'profile_picture': 'https://scontent.fakl8-1.fna.fbcdn.net/v/t1.6435-1/cp0/e15/q65/p40x40/176057649_1213006852476222_7349829092521007297_n.jpg?_nc_cat=100&ccb=1-5&_nc_sid=dbb9e7&_nc_ohc=mQwiEkVN55cAX-txl-s&_nc_ht=scontent.fakl8-1.fna&oh=00_AT_dqKC3Yhu2jYV9Pf4HJhJmn0yjOMoobEoajX5k4rpWfg&oe=6216E02B',
 'recommends': True,
 'text': 'good taste ang milktea. creamy',
 'time': datetime.datetime(2019, 12, 31, 7, 31, 42),
 'timestamp': 1577730702,
 'user_url': 'https://facebook.com/app.bennok?locale2=en_US',
 'username': 'Boy Montaos'}
{'post_url': 'https://facebook.com/story.php?story_fbid=4077043325658195&id=100000577028543&locale2=en_US&__tn__=%2As%2As',
 'profile_picture': 'https://scontent.fakl8-1.fna.fbcdn.net/v/t39.30808-1/cp0/e15/q65/p40x40/252319237_5083768481652336_441345146184154296_n.jpg?_nc_cat=103&ccb=1-5&_nc_sid=dbb9e7&_nc_ohc=tr5R8QAt6-QAX-fEtyM&_nc_ht=scontent.fakl8-1.fna&oh=00_AT-P6GENcwW8sFQv1v1rnFmUbCeZYPwfUx0zXi9sK7bNiQ&oe=61F3BBC3',
 'recommends': True,
 'text': 'Super Affordable and yummy. napaka bilis pa nang service and '
         'delivery. 😊👍',
 'time': datetime.datetime(2020, 12, 9, 2, 27, 39),
 'timestamp': 1607434059,
 'user_url': 'https://facebook.com/hannahniah.lim?locale2=en_US',
 'username': 'Hananiah Fermin Lim'}
{'post_url': 'https://facebook.com/story.php?story_fbid=3178356272178102&id=100000112820909&locale2=en_US&__tn__=%2As%2As',
 'profile_picture': 'https://scontent.fakl8-1.fna.fbcdn.net/v/t39.30808-1/cp0/e15/q65/p40x40/218824795_6398471983499832_3335334123648518092_n.jpg?_nc_cat=104&ccb=1-5&_nc_sid=dbb9e7&_nc_ohc=sJ5901n2oWwAX8qVkQO&_nc_ht=scontent.fakl8-1.fna&oh=00_AT_VL2g9-9jqNkGWNfJOZJnxX9ejqgBsGMrFNdZ-wI_yCg&oe=61F38F86',
 'recommends': True,
 'text': 'ok naman. patamisin lang ng konti yung pearl 😊',
 'time': datetime.datetime(2019, 6, 4, 0, 53, 25),
 'timestamp': 1559566405,
 'user_url': 'https://facebook.com/yzhanyzhi?locale2=en_US',
 'username': 'Yazmine C J Bautista'}
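Because reviews is returned as a generator, it can be consumed in a single pass. As a rough sketch of working with records shaped like the output above, the hypothetical helper summarize_reviews below (not part of facebook-scraper) counts the reviews, counts how many recommend the page, and finds the most recent one. The sample dicts are trimmed copies of the three reviews shown above so that the snippet runs without network access.

```python
from datetime import datetime
from typing import Dict, Iterable


def summarize_reviews(reviews: Iterable[dict]) -> Dict[str, object]:
    """Walk the reviews once, so it also works on a generator."""
    total = 0
    recommended = 0
    latest = None
    for review in reviews:
        total += 1
        if review.get("recommends"):
            recommended += 1
        when = review.get("time")
        if when is not None and (latest is None or when > latest):
            latest = when
    return {"total": total, "recommended": recommended, "latest": latest}


# Trimmed copies of the records printed above (only the keys used here).
sample_reviews = [
    {"username": "Boy Montaos", "recommends": True,
     "time": datetime(2019, 12, 31, 7, 31, 42)},
    {"username": "Hananiah Fermin Lim", "recommends": True,
     "time": datetime(2020, 12, 9, 2, 27, 39)},
    {"username": "Yazmine C J Bautista", "recommends": True,
     "time": datetime(2019, 6, 4, 0, 53, 25)},
]

summary = summarize_reviews(iter(sample_reviews))
print(summary)
```

With the real library, the same helper could presumably be given get_page_info("chambajuice")["reviews"] directly, assuming the records keep the keys shown in the output.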

neon-ninja, Jan 25 '22 01:01

This page's profile has a reviews generator object, but iterating over it throws a Content Not Found error.

Below is the code used to scrape the page with get_page_info:

from facebook_scraper import get_page_info, set_cookies, set_user_agent
from pprint import pprint

set_cookies("fb_cookie.txt")
set_user_agent("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
profile = get_page_info('atebeyandsell')
pprint(profile)

Here is the output:

{'about': 'About\n'
          'http://instagram.com/atebeyofficial\n'
          'Send message\n'
          'Entrepreneur · Gaming Video Creator\n'
          'See all',
 'address': None,
 'followers': 7964,
 'identifier': 107313684786461,
 'image': None,
 'likes': 6616,
 'name': 'Ate Bey and Sell',
 'profile_photo': 'https://scontent.fmnl4-3.fna.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/164655260_107316611452835_8528683889612977791_n.jpg?_nc_cat=110&ccb=1-5&_nc_sid=ed5ff1&efg=eyJpIjoidCJ9&_nc_eui2=AeH1BAfJPhyOlrPCVM-i5RSMckwgHk9sgqFyTCAeT2yCoRsyBrnuYXXkf8OdF8DXgEHEC2SHH_Dx7Ks7cSHtfxxq&_nc_ohc=WOM3rj6xiC0AX9Ct3Y1&_nc_ht=scontent.fmnl4-3.fna&oh=00_AT9WN2G_WwPbs8fQFtp1ho1EK9kQdfE9EK6q-WaB5ixntQ&oe=6232C265',
 'rating': 'Entrepreneur',
 'reviews': <generator object FacebookScraper.get_page_reviews at 0x7f9cb1346430>,
 'sameAs': 'instagram.com/atebeyofficial',
 'type': 'Person',
 'url': 'https://www.facebook.com/atebeyandsell/'}

The output shows a generator object under the reviews key. However, when I try to iterate over it with the code below:

for i in profile['reviews']:
    print(i)

it throws the following error:

NotFound                                  Traceback (most recent call last)
/var/folders/25/k79djfcj737dwxvhtkr192zr8p8x5p/T/ipykernel_15908/4199099289.py in <module>
----> 1 for i in profile['reviews']:
      2     print(i)

~/opt/anaconda3/envs/sample/lib/python3.8/site-packages/facebook_scraper/facebook_scraper.py in get_page_reviews(self, page, **kwargs)
    521         while more_url:
    522             logger.debug(f"Fetching {more_url}")
--> 523             response = self.get(more_url)
    524             if response.text.startswith("for (;;);"):
    525                 prefix_length = len('for (;;);')

~/opt/anaconda3/envs/sample/lib/python3.8/site-packages/facebook_scraper/facebook_scraper.py in get(self, url, **kwargs)
    805             if title:
    806                 if title.text.lower() in not_found_titles:
--> 807                     raise exceptions.NotFound(title.text)
    808                 elif title.text.lower() == "error":
    809                     raise exceptions.UnexpectedResponse("Your request couldn't be processed")

NotFound: Content Not Found

suarezjessie, Feb 16 '22 05:02

The reviews aren't accessible at https://m.facebook.com/pg/atebeyandsell/reviews/ either. This page must have reviews disabled.
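The traceback above shows why the error only appears inside the for loop: the reviews value is a lazy generator, so no request is made until it is iterated. A caller who wants to treat "reviews disabled" as "no reviews" could wrap the iteration in a try/except. The sketch below is self-contained and uses stand-ins: the local NotFound class and fake_reviews generator are hypothetical substitutes so the example runs offline, and collect_reviews is an illustrative helper, not a library function. In real code the exception to catch would be the exceptions.NotFound class from facebook_scraper that appears in the traceback.

```python
from typing import Iterable, Iterator, List, Tuple, Type


class NotFound(Exception):
    """Stand-in for the library's NotFound exception so this runs offline."""


def fake_reviews() -> Iterator[dict]:
    """Mimics a lazy reviews generator whose first fetch fails."""
    raise NotFound("Content Not Found")
    yield {}  # never reached; its presence makes this function a generator


def collect_reviews(
    reviews: Iterable[dict],
    not_found: Tuple[Type[Exception], ...] = (NotFound,),
) -> List[dict]:
    """Collect reviews, returning what was gathered if the page has none."""
    collected: List[dict] = []
    try:
        for review in reviews:
            collected.append(review)
    except not_found:
        # Reviews are unavailable (for example, disabled on the page).
        pass
    return collected


gen = fake_reviews()          # no error yet: nothing has been fetched
print(collect_reviews(gen))   # the error is raised and handled during iteration
```

Any reviews yielded before the failure are kept, which matters if the error occurs on a later page of results rather than the first.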

neon-ninja, Feb 16 '22 06:02

The problem might be with m.facebook.com. The mobile version cannot open some URLs.

aminrabinia, Mar 02 '22 20:03

The reviews aren't accessible at https://www.facebook.com/atebeyandsell/reviews either.

neon-ninja, Mar 29 '22 23:03