Yahoo Finance Premium instituting recaptcha

Open me1029134 opened this issue 1 year ago • 16 comments

Describe the bug I believe there is some kind of recaptcha problem. It does not happen on every request, though; maybe on about half of them. Below is my error.

    DevTools listening on ws://127.0.0.1:63373/devtools/browser/661f6e71-8cf3-4067-bbfe-3966923a90ab
    [1230/140638.380:ERROR:gl_utils.cc(412)] [.WebGL-00001D8400E82200]GL Driver Message (OpenGL, Performance, GL_CLOSE_PATH_NV, High): GPU stall due to ReadPixels
    [1230/140640.413:ERROR:gl_utils.cc(412)] [.WebGL-00001D84002B3F00]GL Driver Message (OpenGL, Performance, GL_CLOSE_PATH_NV, High): GPU stall due to ReadPixels
    Unable to login and/or retrieve the appropriate cookies. This is most likely due to Yahoo Finance instituting recaptcha, which this package does not support.

To Reproduce Steps to reproduce the behavior:

  1. When pulling: query = yq.Ticker('ASGTF', username= "UserEmail", password="PW")

I get: {'ASGTF': 'User is not logged in'}

It seems to happen about half the time, whether I use different tickers or pull the same ticker multiple times.

Expected behavior I'm expecting to get p_all_financial_data. I verified that I can see the data when I am logged in.

Desktop (please complete the following information):

  • OS: Windows 10
  • Browser: Chrome
  • Version: 2.3.7

Additional context This comment from dpguthrie probably describes the problem and solution in a little more detail: https://github.com/dpguthrie/yahooquery/issues/251#issuecomment-1869823881 I thought I had it fixed, but I did not.

me1029134 avatar Dec 30 '23 22:12 me1029134

Is this what's happening in your code: when you log into the Yahoo account from Selenium, you get a recaptcha and cannot continue?

Tharindu-Abay avatar Dec 31 '23 03:12 Tharindu-Abay

Attach a screenshot.

thelaycon avatar Dec 31 '23 07:12 thelaycon

Can you add some screenshots showing both the error and the normal expected result?

samirgorai avatar Dec 31 '23 09:12 samirgorai

Sure, for example: If I run this example code:

import yahooquery as yq
password = 'PW'
userEmail = 'UserEmail'
symbol='AAPL'
while (True):
  query = yq.Ticker(symbol, username= userEmail, password=password)
  p_all_financial_data_quarter = query.p_all_financial_data(frequency='q')
  print(p_all_financial_data_quarter)

I get this on the first run: (image) Working correctly,

and this on the second run: (image) Not working. It seems to work about half the time, at random (there is no pattern to it).

Then of course in Chrome, when I'm logged in, I see the correct data too: (image)

I've tried it on Windows 10 and 11, and with Python 3.9 and 3.12.
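Since the failure is intermittent, one possible stopgap is to retry the pull when the result is one of the error dictionaries seen in this thread. The sketch below is only a workaround under that assumption, not a fix for the login itself; the helper names `is_login_failure` and `fetch_with_retry` are hypothetical and are not part of yahooquery.

```python
import time
from typing import Any, Callable

# Error strings reported in this thread; treated here as "login did not stick".
LOGIN_ERRORS = (
    "User is not logged in",
    "User is not subscribed to Premium or has invalid cookies",
)


def is_login_failure(result: Any) -> bool:
    """Return True if result looks like {'SYMBOL': '<login error string>'}."""
    if not isinstance(result, dict) or not result:
        return False
    return all(isinstance(v, str) and v in LOGIN_ERRORS for v in result.values())


def fetch_with_retry(fetch: Callable[[], Any], attempts: int = 3, delay: float = 0.0) -> Any:
    """Call fetch() until it returns something other than a login failure."""
    result = None
    for attempt in range(attempts):
        result = fetch()
        if not is_login_failure(result):
            return result
        if attempt < attempts - 1:
            time.sleep(delay)  # pause before the next attempt
    return result  # still a login failure after all attempts
```

With the example above, `fetch` could be something like `lambda: yq.Ticker(symbol, username=userEmail, password=password).p_all_financial_data(frequency='q')`, so that each attempt builds a new Ticker and logs in again.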

Thanks in advance for your help!

me1029134 avatar Dec 31 '23 14:12 me1029134

Some questions:

1) Were there some recent changes that caused this error, or do previous versions of the library also show it?

2) For your example:

    import yahooquery as yq
    password = 'PW'
    userEmail = 'UserEmail'
    symbol='AAPL'
    while (True):
      query = yq.Ticker(symbol, username= userEmail, password=password)  # trying to login with user credentials
      p_all_financial_data_quarter = query.p_all_financial_data(frequency='q')
      print(p_all_financial_data_quarter)

the login is done in base.py

Yfinnce Base

I think that whenever a user logs in, this part of the code must be executed. Can you confirm whether I am correct?

3) How can I debug this locally, and where can I get a username and password?

samirgorai avatar Dec 31 '23 15:12 samirgorai

Is it possible to get your email ID so that I can message you directly?

samirgorai avatar Dec 31 '23 16:12 samirgorai

Possible fix: can you look at #255?

samirgorai avatar Dec 31 '23 17:12 samirgorai

Hello @dpguthrie @me1029134, I tested the code with my changes in #255:

    import yahooquery as yq
    password = 'XXXXXX'
    userEmail = '[email protected]'
    symbol='AAPL'
    while (True):
      query = yq.Ticker(symbol, username= userEmail, password=password)
      p_all_financial_data_quarter = query.p_all_financial_data(frequency='q')
      print(p_all_financial_data_quarter)

and the result was:

    DevTools listening on ws://127.0.0.1:64734/devtools/browser/41a2456b-89ec-4df2-b6a5-d65774e7c308
    [0102/091740.817:ERROR:command_buffer_proxy_impl.cc(127)] ContextResult::kTransientFailure: Failed to send GpuControl.CreateCommandBuffer.
    [0102/091745.755:ERROR:gl_utils.cc(412)] [.WebGL-0000438400E7D400]GL Driver Message (OpenGL, Performance, GL_CLOSE_PATH_NV, High): GPU stall due to ReadPixels
    [0102/091748.973:ERROR:gl_utils.cc(412)] [.WebGL-00004384002C0000]GL Driver Message (OpenGL, Performance, GL_CLOSE_PATH_NV, High): GPU stall due to ReadPixels
    {'AAPL': 'User is not subscribed to Premium or has invalid cookies'}

Can you check once on your setup with your ID?

samirgorai avatar Jan 02 '24 03:01 samirgorai

Unfortunately after adding this line: self.driver.find_element(By.XPATH, "//input[@id='login-username']").send_keys(self.username) (and commenting out the other)

I'm still getting the same problem: image

me1029134 avatar Jan 02 '24 05:01 me1029134

@me1029134 How can I get the build after my changes, please?

samirgorai avatar Jan 02 '24 07:01 samirgorai

I am able to log in to login.yahoo.com using the following script:

    """File to test login."""
    import time

    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    while True:
        username = "[email protected]"
        password = "XXXXXX"
        # Raw string, so the backslashes are not read as escape sequences
        driver_path = r'C:\Users\samir\Web Scraping14-12-2023\geckodriver.exe'
        LOGIN_URL = "https://login.yahoo.com"
        browser = webdriver.Firefox()
        browser.get(LOGIN_URL)
        print(browser.title)
        # Enter the username and move on to the password page
        browser.find_element(By.XPATH, "//input[@id='login-username']").send_keys(username)
        browser.find_element(By.XPATH, "//input[@id='login-signin']").click()
        # Wait up to 10 seconds for the password field to appear
        password_element = WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.ID, "login-passwd")))
        password_element.send_keys(password)
        browser.find_element(By.XPATH, "//button[@id='login-signin']").click()
        time.sleep(5)

I think the problem is in the base.py code (image): the condition if instance.cookies: is evaluating to false.

I can also see that it was modified in last commit.

samirgorai avatar Jan 02 '24 07:01 samirgorai

@me1029134 @dpguthrie Can you please check once? I have made some changes and committed them.

Thank you

samirgorai avatar Jan 02 '24 08:01 samirgorai

I believe the approach we are after is to pull the cookies from a Chrome login session and load them into the Selenium session, something along the lines of these articles: https://stackoverflow.com/questions/15058462/how-to-save-and-load-cookies-using-python-selenium-webdriver https://medium.com/@ghulammustafapy/efficient-login-session-management-in-selenium-python-save-and-reuse-credentials-for-browser-7aa21b32df63
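A minimal sketch of that save-and-reload idea, assuming a Selenium WebDriver object whose `get_cookies()` and `add_cookie()` methods are used as-is. The helper names `save_cookies` and `load_cookies` and the JSON file path are hypothetical; nothing here is part of yahooquery.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def save_cookies(driver: Any, path: str) -> None:
    """Write the driver's current cookies (a list of dicts) to a JSON file."""
    cookies: List[Dict[str, Any]] = driver.get_cookies()
    Path(path).write_text(json.dumps(cookies))


def load_cookies(driver: Any, path: str) -> int:
    """Add cookies from a JSON file to the driver; return how many were added.

    Selenium only accepts cookies for the domain of the page that is
    currently loaded, so call driver.get(...) on the target site first.
    """
    cookies = json.loads(Path(path).read_text())
    for cookie in cookies:
        driver.add_cookie(cookie)
    return len(cookies)
```

The idea would be to call `save_cookies(driver, "cookies.json")` once after a manual login that got past the recaptcha, and `load_cookies(driver, "cookies.json")` in later runs. Whether Yahoo accepts the reloaded cookies, and for how long, is not something this thread establishes.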

me1029134 avatar Jan 03 '24 06:01 me1029134

I have a prototype fix that seems to work for me. I noticed that if I put a 20 second wait after the login and before any of the pulls, it does not seem to get hung up, for some reason. I added that wait, and I also save the session cookies after a good login. It would be better if you could just pass in the cookies / session; that seems like the correct way to do it. Here is the fix that worked for me at least:

    def login(self) -> None:
        if _has_selenium:
            session_instance='session_save_location/session_instance.pkl'
            if os.path.exists(session_instance):
                with open(session_instance, 'rb') as file:
                    self.session.cookies = pickle.load(file)
            else:
                instance = YahooFinanceHeadless(self.username, self.password)
                instance.login()
                time.sleep(20)
                if instance.cookies:
                    self.session.cookies = instance.cookies
                    with open(session_instance, 'wb') as file:
                        pickle.dump(self.session.cookies, file)
                    return
                else:
                    logger.warning(
                        "Unable to login and/or retrieve the appropriate cookies.  This is "
                        "most likely due to Yahoo Finance instituting recaptcha, which "
                        "this package does not support."
                    )
        else:
            logger.warning(
                "You do not have the required libraries to use this feature.  Install "
                "with the following: `pip install yahooquery[premium]`"
            )
        self.session = setup_session(self.session, self._setup_url)
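One limitation of the prototype above is that a saved session file is reused forever, even after the cookies expire. A possible refinement, sketched below on its own rather than inside yahooquery, is to reuse the file only while it is recent and otherwise log in again. The helper name `load_or_login`, the `do_login` callback, and the 12 hour default are all assumptions for illustration; the thread does not say how long Yahoo cookies stay valid.

```python
import os
import pickle
import time
from typing import Any, Callable, Optional


def load_or_login(
    cache_path: str,
    do_login: Callable[[], Optional[Any]],
    max_age_seconds: float = 12 * 3600,  # arbitrary default, not a known Yahoo limit
) -> Optional[Any]:
    """Return cached cookies if the file is recent, else log in and cache them."""
    if os.path.exists(cache_path):
        age = time.time() - os.path.getmtime(cache_path)
        if age < max_age_seconds:
            # Only unpickle files that this program wrote itself.
            with open(cache_path, "rb") as file:
                return pickle.load(file)
    cookies = do_login()  # expected to return cookies, or None/empty on failure
    if cookies:
        os.makedirs(os.path.dirname(cache_path) or ".", exist_ok=True)
        with open(cache_path, "wb") as file:
            pickle.dump(cookies, file)
    return cookies
```

Here `do_login` would wrap the Selenium login and return the resulting cookies. A failed login returns a falsy value and writes nothing, so the next call tries again.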

me1029134 avatar Jan 04 '24 05:01 me1029134

@dpguthrie Do you have any high-level design document or image that would help in understanding your library?

samirgorai avatar Jan 04 '24 12:01 samirgorai

@samirgorai Nope, sorry.

dpguthrie avatar Jan 04 '24 14:01 dpguthrie