facebook_page_scraper
CRITICAL - No posts were found!
I used the example you provided.
# import Facebook_scraper class from facebook_page_scraper
from facebook_page_scraper import Facebook_scraper

# instantiate the Facebook_scraper class
page_name = "##MYNAME##"
posts_count = 10
browser = "firefox"
proxy = ""  # if proxy requires authentication then user:password@IP:PORT
timeout = 600  # 600 seconds
headless = True
meta_ai = Facebook_scraper(page_name, posts_count, browser, proxy=proxy,
                           timeout=timeout, headless=headless)

json_data = meta_ai.scrap_to_json()
print(json_data)
The following messages appear and I get no posts:
2024-01-04 09:53:29,565 - facebook_page_scraper.driver_initialization - INFO - Using:
[WDM] - There is no [win64] geckodriver for browser in cache
[WDM] - Getting latest mozilla release info for v0.34.0
[WDM] - Trying to download new driver from https://github.com/mozilla/geckodriver/releases/download/v0.34.0/geckodriver-v0.34.0-win64.zip
[WDM] - Driver has been saved in cache [C:\Users\Talat Oncu\.wdm\drivers\geckodriver\win64\v0.34.0]
2024-01-04 09:54:31,409 - facebook_page_scraper.driver_utilities - CRITICAL - No posts were found!
Exit code: 1
Then I tried it with NintendoAmerica:
# import Facebook_scraper class from facebook_page_scraper
from facebook_page_scraper import Facebook_scraper

# instantiate the Facebook_scraper class
page_name = "NintendoAmerica"
posts_count = 10
browser = "firefox"
proxy = ""  # if proxy requires authentication then user:password@IP:PORT
timeout = 600  # 600 seconds
headless = True
meta_ai = Facebook_scraper(page_name, posts_count, browser, proxy=proxy,
                           timeout=timeout, headless=headless)

json_data = meta_ai.scrap_to_json()
print(json_data)
The program gives the message
2024-01-04 10:11:18,586 - facebook_page_scraper.driver_initialization - INFO - Using:
[WDM] - Driver [C:\Users\Talat Oncu\.wdm\drivers\geckodriver\win64\v0.34.0\geckodriver.exe] found in cache
and waits indefinitely.
I have the same issue. I checked what's happening by setting headless to False.
I can see that the browser doesn't log in, and the following result appears in the terminal:

2024-01-04 16:02:11,918 - facebook_page_scraper.driver_utilities - CRITICAL - No posts were found!

Can anyone help figure out what can be done here? Thank you!
Hi all, I have the same issue when running on Ubuntu, but not on Windows 11! Instead of the usual login widget with the X in the top right corner, we get a separate page which requires a login before redirecting to the desired page.
If there were a way to log in once, the webdriver would remember it and we would not get this issue; unfortunately, everything I tried for this doesn't work. I have managed to solve the issue by coding my own Facebook scraper using a Chrome driver that can use a specific user data profile, but I would prefer to use this library if we can get a patch, as it's less for me to maintain :D
Thanks
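For anyone who wants to try the same workaround, pointing Selenium's Chrome driver at an existing user data profile looks roughly like this. This is a minimal sketch outside of facebook_page_scraper itself, and both profile paths are hypothetical examples:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Reuse a Chrome profile that is already logged in to Facebook.
# Both paths below are hypothetical; substitute your own profile location.
options = Options()
options.add_argument(r"--user-data-dir=C:\Users\me\AppData\Local\Google\Chrome\User Data")
options.add_argument("--profile-directory=Default")

driver = webdriver.Chrome(options=options)
driver.get("https://www.facebook.com/NintendoAmerica")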
Following on from this, I tried using a UK proxy, which worked and produced the desired outcome.
Could you tell a noob like me how to set the proxy to the UK?
proxy = 'exampleproxy:exampleport'
Facebook_scraper(page_name, posts_count, browser, proxy=proxy, timeout=timeout, headless=headless)
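If the proxy requires authentication, the same parameter takes the user:password@IP:PORT form noted in the example comments. A hedged sketch with placeholder values:

# Hypothetical authenticated UK proxy; substitute your provider's endpoint.
proxy = "user:password@203.0.113.10:8080"
scraper = Facebook_scraper(page_name, posts_count, browser,
                           proxy=proxy, timeout=timeout, headless=headless)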
@ExpiredMeteor6 Yes, using a Chrome profile that is already logged in will unblock you. Unfortunately, I cannot make that feature part of this project, since the project claims to scrape only publicly available data.
@shaikhsajid1111 Is there any exception I can import in my code to handle this error?

[WDM] - Driver [C:\Users\manrkaur\.wdm\drivers\geckodriver\win64\v0.34.0\geckodriver.exe] found in cache
2024-02-02 16:10:25,737 - facebook_page_scraper.driver_utilities - CRITICAL - No posts were found!
import json
from facebook_page_scraper import Facebook_scraper

def scrape_facebook_data(page_names, posts_count=10, browser="firefox", proxy=None, timeout=600, headless=True):
    """
    Scrapes Facebook data for the given page names.

    Parameters:
    - page_names: List of Facebook page names
    - posts_count: Number of posts to scrape per page
    - browser: Browser to use (e.g., "firefox")
    - proxy: Proxy information (e.g., "IP:PORT" or None)
    - timeout: Timeout in seconds
    - headless: Whether to run the browser in headless mode

    Returns:
    - A dictionary containing the scraped data for each page
    """
    scraped_data = {}
    for page_name in page_names:
        # Instantiate the Facebook_scraper class
        meta_ai = Facebook_scraper(page_name, posts_count, browser, proxy=proxy,
                                   timeout=timeout, headless=headless)
        # Scrape the posts and parse the returned JSON string into a dictionary
        json_data_str = meta_ai.scrap_to_json()
        json_data = json.loads(json_data_str)
        # Collect post information for this page
        posts_array = []
        for post_id, post_data in json_data.items():
            posted_on = post_data.get('posted_on', "")
            content = post_data.get('content', "")
            reaction_count = post_data.get('reaction_count', "")
            comments = post_data.get('comments', "")
            # Only keep posts whose content is not empty
            if content:
                posts_array.append({
                    # "Post ID": post_id,
                    "Content": content,
                    "Posted on": posted_on,
                    "reaction_count": reaction_count,
                    "comments": comments,
                })
        # Store the array for the current page in the result dictionary
        scraped_data[page_name] = posts_array
    return scraped_data
@testproto There isn't any custom exception that it throws when no posts are found. You can write a wrapper function over this with try/except, if I'm understanding your requirement properly.
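A minimal sketch of such a wrapper, assuming the JSON-string return of scrap_to_json() shown in the snippets above:

import json
from facebook_page_scraper import Facebook_scraper

def try_scrape(page_name, posts_count=10, browser="firefox",
               proxy=None, timeout=600, headless=True):
    # Wrap the scraper so one failing page doesn't stop a batch.
    try:
        scraper = Facebook_scraper(page_name, posts_count, browser,
                                   proxy=proxy, timeout=timeout, headless=headless)
        return json.loads(scraper.scrap_to_json())
    except Exception as e:
        print(f"Scraping failed for '{page_name}': {e}")
        return None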
It throws an error when a page is private, so how can I handle that scenario? Could you please help me with that, @shaikhsajid1111?
from facebook_page_scraper import Facebook_scraper
from facebook_page_scraper.driver_utilities import Utilities  # Importing the Utilities class from your module
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import json
import re
import requests
import logging

# Set up logging configuration
logging.basicConfig(level=logging.INFO)  # Set the logging level to INFO or higher
def extract_facebook_page_name(url):
    """
    Extracts the Facebook page name from a given URL.

    Parameters:
    - url: URL of the website

    Returns:
    - Facebook page name if found, otherwise None
    """
    try:
        # Make a direct request and check the response status
        response = requests.get(url)
        if response.status_code == 200:
            page_source = response.text
        else:
            # Use Selenium to get the page source if the direct request fails
            chrome_options = Options()
            chrome_options.add_argument('--headless')
            driver = webdriver.Chrome(options=chrome_options)
            driver.get(url)
            page_source = driver.page_source
            driver.quit()
    except Exception as e:
        print(f"Error: {e}")
        return None

    # Use BeautifulSoup to parse the HTML and find a Facebook page link
    soup = BeautifulSoup(page_source, 'html.parser')
    facebook_link = soup.find('a', href=re.compile(r'facebook\.com', re.IGNORECASE))
    if facebook_link:
        # Extract the page name from the Facebook link
        match = re.search(r'facebook\.com/([^/?]+)', facebook_link['href'])
        if match:
            # Check if the page is private or missing
            if "page doesn't exist" in page_source or "The link you followed may be broken, or the page may have been removed" in page_source:
                print(f"The Facebook page at {url} is either private or does not exist.")
                return None
            else:
                return match.group(1)
    return None
def scrape_facebook_data(page_names, posts_count=10, browser="firefox", proxy=None, timeout=600, headless=True):
    """
    Scrapes Facebook data for the given page names.

    Parameters:
    - page_names: List of Facebook page names
    - posts_count: Number of posts to scrape per page
    - browser: Browser to use (e.g., "firefox")
    - proxy: Proxy information (e.g., "IP:PORT" or None)
    - timeout: Timeout in seconds
    - headless: Whether to run the browser in headless mode

    Returns:
    - A dictionary containing the scraped data for each page, or None if no posts are found
    """
    scraped_data = {}
    for page_name in page_names:
        # Instantiate the Facebook_scraper class
        meta_ai = Facebook_scraper(page_name, posts_count, browser, proxy=proxy,
                                   timeout=timeout, headless=headless)
        try:
            # Scrape the posts and parse the returned JSON string into a dictionary
            json_data_str = meta_ai.scrap_to_json()
            json_data = json.loads(json_data_str)
            # Collect post information for this page
            posts_array = []
            for post_id, post_data in json_data.items():
                posted_on = post_data.get('posted_on', "")
                content = post_data.get('content', "")
                reaction_count = post_data.get('reaction_count', "")
                comments = post_data.get('comments', "")
                # Only keep posts whose content is not empty
                if content:
                    posts_array.append({
                        # "Post ID": post_id,
                        "Content": content,
                        "Posted on": posted_on,
                        "reaction_count": reaction_count,
                        "comments": comments,
                    })
            # Store the array for the current page in the result dictionary
            scraped_data[page_name] = posts_array
        except Exception as e:
            # Log the error and continue to the next page
            print(f"Error scraping data for page '{page_name}': {e}")
            continue
    # Check if any data was scraped
    if not scraped_data:
        print("No posts were found for any of the provided pages.")
        return None
    return scraped_data
def getSocialMedia(urls, posts_count=10, browser="firefox", proxy=None, timeout=600, headless=True):
    """
    Scrapes Facebook data for the given URLs.

    Parameters:
    - urls: List of website URLs
    - posts_count: Number of posts to scrape per page
    - browser: Browser to use (e.g., "firefox")
    - proxy: Proxy information (e.g., "IP:PORT" or None)
    - timeout: Timeout in seconds
    - headless: Whether to run the browser in headless mode

    Returns:
    - A dictionary containing the scraped data for each page
    """
    page_names = []
    for url in urls:
        # Extract the Facebook page name from the URL
        page_name = extract_facebook_page_name(url)
        if page_name:
            page_names.append(page_name)
    # Scrape Facebook data using the extracted page names
    result = scrape_facebook_data(page_names, posts_count, browser, proxy, timeout, headless)
    return result
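For completeness, getSocialMedia can be driven directly with website URLs; a short usage sketch with the same placeholder URLs as the example below:

social_data = getSocialMedia(['https://testmatick.com/', 'https://www.a1qa.com/'], posts_count=5)
print(social_data)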
Example usage:
if __name__ == "__main__":
    # List of website URLs
    urls = ['https://testmatick.com/', 'https://www.a1qa.com/']

    # Common configuration for scraping
    posts_count = 10
    browser = "firefox"
    proxy = "IP:PORT"  # if proxy requires authentication then user:password@IP:PORT
    timeout = 600  # 600 seconds
    headless = True

    # Dictionary to store scraped data
    result = {}
    for url in urls:
        # Extract the Facebook page name from the URL
        page_name = extract_facebook_page_name(url)
        if page_name:
            try:
                # Scrape Facebook data for the current URL
                page_data = scrape_facebook_data([page_name], posts_count, browser, proxy, timeout, headless)
                if page_data:
                    # Add the scraped data to the result dictionary
                    result.update(page_data)
                else:
                    print(f"No posts found for URL: {url}")
                    continue  # Continue to the next URL if no posts are found
            except Exception as e:
                print(f"Error scraping data for URL '{url}': {e}")
                continue  # Continue to the next URL if an error occurs
        else:
            print(f"No Facebook page found for URL: {url}")
            continue  # Continue to the next URL if no Facebook page is found

    # Return None if no data was scraped
    if not result:
        print("No Facebook data found for the provided URLs.")
        result = None

    # Print the result
    print(json.dumps(result, indent=2))
**See, I am using a try/except block, but this code exits after checking testmatick and does not move on to the next URL; it exits with CRITICAL - No posts were found!**
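A likely explanation, judging from the "Exit code: 1" in the first log above: when no posts are found, the library appears to end the process via sys.exit(1) rather than raising an ordinary exception. SystemExit derives from BaseException, not Exception, so an except Exception block never sees it. A minimal sketch of the loop body that also traps SystemExit (assuming that exit behaviour):

try:
    page_data = scrape_facebook_data([page_name], posts_count, browser, proxy, timeout, headless)
except SystemExit:
    # Assumption: the library calls sys.exit(1) after logging
    # "CRITICAL - No posts were found!", which "except Exception" cannot catch.
    print(f"Scraper exited for '{page_name}' (likely no posts found); moving on.")
    page_data = None
except Exception as e:
    print(f"Error scraping '{page_name}': {e}")
    page_data = None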