facebook-post-scraper
Can't log in because of cookies
When running the script, I get:
Traceback (most recent call last):
  File "scraper.py", line 357, in <module>
    postBigDict = extract(page=args.page, numOfPost=args.len, infinite_scroll=infinite, scrape_comment=scrape_comment)
  File "scraper.py", line 258, in extract
    _login(browser, EMAIL, PASSWORD)
  File "scraper.py", line 201, in _login
    browser.find_element_by_id('loginbutton').click()
  File "~/anaconda3/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 360, in find_element_by_id
    return self.find_element(by=By.ID, value=id_)
  File "~/anaconda3/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 978, in find_element
    'value': value})['value']
  File "~/anaconda3/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "~/anaconda3/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"[id="loginbutton"]"}
  (Session info: chrome=87.0.4280.88)
The browser shows the cookie consent window. Is there any solution?
After changing the login logic to the following code:
def _login(browser, email, password):
    browser.get("http://facebook.com")
    browser.maximize_window()
    browser.find_element_by_name("email").send_keys(email)
    browser.find_element_by_name("pass").send_keys(password)
    browser.find_element_by_id("u_0_h").click()
    browser.find_element_by_name("login").click()
I get this new error:
Traceback (most recent call last):
  File "scraper.py", line 405, in <module>
    scrape_comment=scrape_comment,
  File "scraper.py", line 279, in extract
    browser.get(page)
  File "/tmp/tmp.W3CmledTvJ/env/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 333, in get
    self.execute(Command.GET, {'url': url})
  File "/tmp/tmp.W3CmledTvJ/env/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/tmp/tmp.W3CmledTvJ/env/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidArgumentException: Message: invalid argument
  (Session info: chrome=87.0.4280.66)
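This second traceback fails in `browser.get(page)` itself, before any login code runs. One common cause of ChromeDriver's "invalid argument" response is a URL that is not absolute, for example a page name or a `www.` address without the `http://` or `https://` scheme. Assuming that is what happened here, a minimal sketch of a hypothetical `ensure_full_url` helper (not part of scraper.py) could normalize the value before it reaches `browser.get`:

```python
from urllib.parse import urlparse


def ensure_full_url(page: str) -> str:
    """Return `page` as an absolute URL that browser.get() can load.

    Hypothetical helper; assumes --page is either a full URL, a Facebook
    address without a scheme, or a bare page name.
    """
    page = page.strip()
    if urlparse(page).scheme in ("http", "https"):
        return page  # already absolute: pass through unchanged
    host = page.split("/", 1)[0]
    if "facebook.com" in host:
        return "https://" + page  # e.g. "www.facebook.com/somepage"
    # Anything else is treated as a page name on www.facebook.com.
    return "https://www.facebook.com/" + page.lstrip("/")
```

In `extract`, the failing call would then become `browser.get(ensure_full_url(page))`.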
I managed to make it log in like this:
import time

def _login(browser, email, password):
    browser.get("http://facebook.com")
    browser.maximize_window()
    # Dismiss the cookie dialog first; auto-generated ids such as u_0_h can change
    browser.find_element_by_id("u_0_h").click()
    time.sleep(3)
    browser.find_element_by_name("email").send_keys(email)
    browser.find_element_by_name("pass").send_keys(password)
    browser.find_element_by_name("login").click()
    time.sleep(5)
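The fixed `time.sleep` calls above always wait the full duration, and can still be too short on a slow connection. Selenium's `WebDriverWait.until` instead polls a condition until it holds or a timeout expires. The sketch below shows that polling idea in plain Python; `wait_until` is a hypothetical helper, not a Selenium API:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout: float = 10.0, poll: float = 0.5) -> T:
    """Poll `condition` until it returns a truthy value, then return that value.

    Hypothetical helper illustrating the polling behind WebDriverWait.until;
    raises TimeoutError if nothing truthy arrives within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)
```

Paired with `find_elements_by_name`, which returns an empty list instead of raising when nothing matches, `wait_until(lambda: browser.find_elements_by_name("email"))[0]` would return the e-mail field as soon as it appears.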
The cookie issue was solved for me by using a VPN with the US as the location, since Facebook does not show this consent request there. Not the most beautiful solution, but it worked.
Here is what I did.
Change the x_path_text_cookies and x_path_text_login values to match your language (mine are for Polish).
import time

def _login(browser, email, password):
    browser.get("http://facebook.com")
    browser.maximize_window()
    browser.find_element_by_name("email").send_keys(email)
    browser.find_element_by_name("pass").send_keys(password)
    x_path_text_cookies = '//*[@title="Akceptuj wszystkie"]'  # Polish for "Accept all"
    x_path_text_login = '//*[@name="login"]'
    browser.find_element_by_xpath(x_path_text_cookies).click()
    browser.find_element_by_xpath(x_path_text_login).click()
    time.sleep(5)
It should work now.
For me it worked after replacing _login with the following.
Note that "Consenti solo i cookie essenziali" (Italian) should be changed to "Allow only essential cookies" for English versions.
import time

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def _login(browser, email, password):
    browser.get("http://facebook.com")
    browser.maximize_window()
    browser.implicitly_wait(5)
    # Wait up to 20 s for the cookie dialog button (Italian UI text), then click it
    WebDriverWait(browser, 20).until(
        EC.element_to_be_clickable((By.XPATH, "//button[contains(string(), 'Consenti solo i cookie essenziali')]"))
    ).click()
    time.sleep(5)
    browser.find_element(By.NAME, "email").send_keys(email)
    browser.find_element(By.NAME, "pass").send_keys(password)
    browser.find_element(By.NAME, "login").click()
    time.sleep(5)
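Both the Polish and the Italian snippets hard-code one UI language. As a sketch, assuming the consent control is a `<button>` whose text or `title` attribute matches one of the labels quoted in this thread, a single XPath can cover several languages. `cookie_button_xpath`, `xpath_literal` and `COOKIE_LABELS` are hypothetical names, and the English label follows the note above rather than a verified Facebook string:

```python
# Cookie-dialog labels quoted in this thread. The English one follows the
# note above and is an assumption, not a verified Facebook string.
COOKIE_LABELS = [
    "Akceptuj wszystkie",                 # Polish: "Accept all"
    "Consenti solo i cookie essenziali",  # Italian: "Allow only essential cookies"
    "Allow only essential cookies",       # English (assumed wording)
]


def xpath_literal(text: str) -> str:
    """Quote `text` as an XPath 1.0 string literal (which has no escape syntax)."""
    if '"' not in text:
        return f'"{text}"'
    if "'" not in text:
        return f"'{text}'"
    # Both quote kinds present: split on " and rejoin the pieces with concat().
    parts = text.split('"')
    return "concat(" + ", '\"', ".join(f'"{part}"' for part in parts) + ")"


def cookie_button_xpath(labels) -> str:
    """XPath for a <button> whose text contains, or whose title equals, any label."""
    conditions = " or ".join(
        f"contains(string(), {xpath_literal(label)}) or @title = {xpath_literal(label)}"
        for label in labels
    )
    return f"//button[{conditions}]"
```

With the Selenium 4 API this would be used as `browser.find_element(By.XPATH, cookie_button_xpath(COOKIE_LABELS)).click()`. The quoting helper matters because XPath 1.0 string literals have no escape character, so a label containing both quote kinds has to be assembled with `concat()`.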
Sadly, the elegant solution by @ferrazzipietro does not seem to work for me:
DevTools listening on ws://127.0.0.1:50144/devtools/browser/248f4965-473a-42ee-a5e6-51dddec9dd2c
[24904:25920:1005/002847.731:ERROR:device_event_log_impl.cc(214)] [00:28:47.731] USB: usb_device_handle_win.cc:1048 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
[24904:25920:1005/002847.733:ERROR:device_event_log_impl.cc(214)] [00:28:47.733] USB: usb_device_handle_win.cc:1048 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\mikha\Downloads\chromedriver_win32\scraper.py", line 258, in extract
    option.add_experimental_option("prefs", {
  File "C:\Users\mikha\Downloads\chromedriver_win32\scraper.py", line 200, in _login
    def _login(browser, email, password):
AttributeError: 'WebDriver' object has no attribute 'find_element_by_name'
>>>
@mikhail-poda it seems like you are still using find_element_by_name(), which is no longer available in recent versions of Selenium WebDriver. As far as I know, you should use find_element() and pass the locator strategy (e.g. By.NAME) as the first argument, as I did in the snippet I posted.
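For scripts with many legacy calls, the rename is mechanical: `find_element_by_<strategy>(value)` becomes `find_element(By.<STRATEGY>, value)`, and likewise for `find_elements_by_*`. A minimal regex-based sketch of that rewrite follows; `modernize` is a hypothetical helper, not a Selenium tool:

```python
import re

# Matches legacy Selenium 3 calls such as .find_element_by_name( or
# .find_elements_by_css_selector( and captures the element(s) part and strategy.
_LEGACY_CALL = re.compile(
    r"\.find_(element|elements)_by_"
    r"(id|name|xpath|link_text|partial_link_text|tag_name|class_name|css_selector)\("
)


def modernize(source: str) -> str:
    """Rewrite find_element(s)_by_<strategy>(value) as find_element(s)(By.<STRATEGY>, value).

    Purely textual: it does not parse Python, so matches inside strings or
    comments are rewritten too. The result needs
    `from selenium.webdriver.common.by import By`.
    """
    return _LEGACY_CALL.sub(
        lambda m: f".find_{m.group(1)}(By.{m.group(2).upper()}, ", source
    )
```

Running it over scraper.py and then adding the `By` import would bring the legacy `_login` variants in this thread in line with the current API.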
Thank you @ferrazzipietro, it was my mistake: I had to close the .py file in Notepad++ (saving it was not enough) so that the Python runtime picked up the new version of the file. Now, after a successful login and opening the group, the Chrome window disappears with the following output:
DevTools listening on ws://127.0.0.1:51236/devtools/browser/2f7e82af-6abc-4f01-8882-112db12f7ecc
[29572:8024:1005/205129.621:ERROR:device_event_log_impl.cc(214)] [20:51:29.621] USB: usb_device_handle_win.cc:1048 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
[29572:8024:1005/205129.622:ERROR:device_event_log_impl.cc(214)] [20:51:29.623] USB: usb_device_handle_win.cc:1048 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
[29572:28644:1005/205137.891:ERROR:registration_request.cc(266)] Registration response error message: PHONE_REGISTRATION_ERROR
[29572:28644:1005/205137.985:ERROR:mcs_client.cc(707)] Error code: 500 Error message: Authentication Failed.
[29572:28644:1005/205137.985:ERROR:mcs_client.cc(709)] Failed to log in to GCM, resetting connection.
Number Of Scrolls Needed 2603