beanprice

Yahoo source results in HTTP 429 error code

Open phantom-voltage opened this issue 9 months ago • 3 comments

As a first-time user of beanprice, I tested out the functionality and could not get it to work.

bean-price --no-cache -e 'USD:yahoo/AAPL'

This results in the following error:

Traceback (most recent call last):
  File "/usr/lib/python3.13/site-packages/requests/models.py", line 974, in json
    return complexjson.loads(self.text, **kwargs)
           ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/simplejson/__init__.py", line 533, in loads
    return cls(encoding=encoding, **kw).decode(s)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
  File "/usr/lib/python3.13/site-packages/simplejson/decoder.py", line 386, in decode
    obj, end = self.raw_decode(s)
               ~~~~~~~~~~~~~~~^^^
  File "/usr/lib/python3.13/site-packages/simplejson/decoder.py", line 416, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
           ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/bean-price", line 33, in <module>
    sys.exit(load_entry_point('beanprice==2.0.0', 'console_scripts', 'bean-price')())
             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/usr/lib/python3.13/site-packages/beanprice/price.py", line 967, in main
    price_entries = sorted(price_entries, key=lambda e: e.currency)
  File "/usr/lib/python3.13/concurrent/futures/_base.py", line 619, in result_iterator
    yield _result_or_cancel(fs.pop())
          ~~~~~~~~~~~~~~~~~^^^^^^^^^^
  File "/usr/lib/python3.13/concurrent/futures/_base.py", line 317, in _result_or_cancel
    return fut.result(timeout)
           ~~~~~~~~~~^^^^^^^^^
  File "/usr/lib/python3.13/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ~~~~~~~~~~~~~~~~~^^
  File "/usr/lib/python3.13/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/lib/python3.13/concurrent/futures/thread.py", line 59, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/lib/python3.13/site-packages/beanprice/price.py", line 596, in fetch_price
    srcprice = fetch_cached_price(source, psource.symbol, dprice.date)
  File "/usr/lib/python3.13/site-packages/beanprice/price.py", line 497, in fetch_cached_price
    source.get_latest_price(symbol)
    ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
  File "/usr/lib/python3.13/site-packages/beanprice/sources/yahoo.py", line 148, in get_latest_price
    result = parse_response(response)
  File "/usr/lib/python3.13/site-packages/beanprice/sources/yahoo.py", line 40, in parse_response
    json = response.json(parse_float=Decimal)
  File "/usr/lib/python3.13/site-packages/requests/models.py", line 978, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
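
The traceback ends in a JSONDecodeError because the response body is handed to the JSON parser without first looking at the HTTP status. Below is a minimal sketch of checking the status before decoding. `FakeResponse` and `parse_json_checked` are hypothetical names used only for illustration; they are not part of beanprice or requests.

```python
import json
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Hypothetical stand-in exposing only the attributes used below."""
    status_code: int
    reason: str
    text: str

    def json(self):
        return json.loads(self.text)


def parse_json_checked(response):
    """Raise a readable error for non-2xx responses instead of a JSON decode error."""
    if not 200 <= response.status_code < 300:
        raise RuntimeError(
            f"HTTP {response.status_code} {response.reason}: {response.text.strip()!r}"
        )
    return response.json()


# A non-JSON error body now produces a message that names the HTTP status.
try:
    parse_json_checked(FakeResponse(429, "Too Many Requests", "Too Many Requests\r\n"))
except RuntimeError as exc:
    message = str(exc)
    print(message)
```

With a check like this, the failure would point at the HTTP status rather than at the JSON decoder.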

Through some print debugging, I noticed that the response code received was 429 (Too Many Requests).

Response: <Response [429]>
Reason: Too Many Requests
URL: https://query1.finance.yahoo.com/v7/finance/quote?symbols=AAPL&fields=symbol%2CregularMarketPrice%2CregularMarketTime&exchange=NYSE&crumb=Too+Many+Requests%0D%0A&lang=en-US&corsDomain=finance.yahoo.com&.tsrc=finance
Headers: {'Date': 'Sun, 08 Jun 2025 20:46:37 GMT', 'Strict-Transport-Security': 'max-age=31536000', 'Server': 'ATS', 'Cache-Control': 'no-store', 'Content-Type': 'text/html', 'Content-Language': 'en', 'Referrer-Policy': 'no-referrer-when-downgrade', 'X-Content-Type-Options': 'nosniff', 'X-XSS-Protection': '1; mode=block', 'X-Frame-Options': 'SAMEORIGIN', 'Content-Length': '19', 'Age': '0', 'Connection': 'keep-alive'}
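
Decoding the query string of the URL above suggests that the 429 may already have been returned when the crumb was requested: the `crumb` parameter appears to hold the error text rather than a token. A quick check using only the standard library (the URL is copied from the debug output; the interpretation is an inference, not something confirmed by the beanprice code):

```python
from urllib.parse import parse_qs, urlsplit

url = (
    "https://query1.finance.yahoo.com/v7/finance/quote"
    "?symbols=AAPL&fields=symbol%2CregularMarketPrice%2CregularMarketTime"
    "&exchange=NYSE&crumb=Too+Many+Requests%0D%0A&lang=en-US"
    "&corsDomain=finance.yahoo.com&.tsrc=finance"
)

# parse_qs percent-decodes values and turns '+' into a space.
params = parse_qs(urlsplit(url).query)
crumb = params["crumb"][0]
print(repr(crumb))
```

The decoded value is 19 characters long including the trailing CRLF, which matches the `Content-Length` of the 429 response shown above.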

Note that I did try the changes suggested in this pull request: https://github.com/beancount/beanprice/pull/86. They did not change the behavior shown above. Also, during my testing I likely sent many requests to Yahoo while trying to get this to work; however, the Python traceback was the same each time.

First Edit: It looks like either Yahoo has identified that many users were using an undocumented API, or both fc.yahoo.com and the other proposed URL, guce.yahoo.com/consent, are out of date for establishing a session. Requesting the URL from the print debugging output returned this response body:

{"finance":{"result":null,"error":{"code":"Unauthorized","description":"User is unable to access this feature - https://bit.ly/yahoo-finance-api-feedback"}}}
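
A response like this is valid JSON, so it would get past the decoder and has to be detected separately. Here is a small sketch of pulling the error out of such a body. The envelope shape is assumed from this single observed response, and `extract_error` is a hypothetical helper, not an existing beanprice function.

```python
import json

body = (
    '{"finance":{"result":null,"error":{"code":"Unauthorized",'
    '"description":"User is unable to access this feature - '
    'https://bit.ly/yahoo-finance-api-feedback"}}}'
)


def extract_error(payload):
    """Return (code, description) if the payload carries an error, else None."""
    error = (payload.get("finance") or {}).get("error")
    if not error:
        return None
    return error.get("code"), error.get("description")


error = extract_error(json.loads(body))
print(error)
```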

Second Edit: yfinance, which uses the same API and session establishment, is also facing similar errors, likely due to a change in the Yahoo APIs. See https://github.com/ranaroussi/yfinance/issues/2520

phantom-voltage avatar Jun 08 '25 21:06 phantom-voltage

I don't think retrying can solve the problem. I checked yfinance and found that it bypasses the 429 in the latest version, 0.2.62. It seems to get past a TLS fingerprint check by using curl-cffi. For my own use case, I wrote a new Yahoo source that uses yfinance directly.

import datetime
import sys
from typing import Optional

import pandas as pd
import yfinance as yf
from beancount.core.number import D
# beancount v2 layout; with standalone beanprice this would be
# `from beanprice import source`.
from beancount.prices import source


class SourceError(Exception):
    """Exception raised for errors in the source."""
    pass


def get_latest_price(ticker: str) -> Optional[pd.DataFrame]:
    """Get the latest price data for a ticker from yfinance.
    
    Args:
        ticker: The stock symbol to fetch data for
        
    Returns:
        DataFrame with latest price data including timezone-aware datetime index and Close price, 
        or None if not found
    """
    try:
        stock = yf.Ticker(ticker)
        hist = stock.history(period="1d")
        if hist.empty:
            return None
        return hist
    except Exception as e:
        print(f"Error fetching price for {ticker}: {e}", file=sys.stderr)
        return None


class Source(source.Source):
    """beanprice source for Yahoo Finance using yfinance library."""
    
    def get_latest_price(self, ticker: str) -> Optional[source.SourcePrice]:
        """Get the latest price for a ticker.
        
        Args:
            ticker: The stock symbol to fetch data for
            
        Returns:
            A SourcePrice instance or None if not found
        """
        hist_df = get_latest_price(ticker)
        if hist_df is None or hist_df.empty:
            return None
            
        # Get the latest price and its corresponding timestamp
        latest_price = float(hist_df['Close'].iloc[-1])
        latest_timestamp = hist_df.index[-1]
        
        # Convert pandas timestamp to datetime with timezone
        if hasattr(latest_timestamp, 'to_pydatetime'):
            price_time = latest_timestamp.to_pydatetime()
        else:
            price_time = latest_timestamp
            
        # Ensure timezone info is present
        if price_time.tzinfo is None:
            price_time = price_time.replace(tzinfo=datetime.timezone.utc)
            
        return source.SourcePrice(
            price=D(str(latest_price)),
            time=price_time,
            quote_currency='USD'  # hard-coded; not derived from the ticker
        )

    def get_historical_price(self, ticker: str, time: datetime.date) -> Optional[source.SourcePrice]:
        """Get historical price for a ticker at a specific date.
        
        Args:
            ticker: The stock symbol to fetch data for
            time: The date to get the price for
            
        Returns:
            A SourcePrice instance or None if not found
        """
        try:
            stock = yf.Ticker(ticker)
            # Get data around the requested date
            # Ensure we have a date object for calculations
            if isinstance(time, datetime.datetime):
                target_date = time.date()
            else:
                target_date = time
            
            start_date = target_date - datetime.timedelta(days=7)
            end_date = target_date + datetime.timedelta(days=1)
            
            hist = stock.history(start=start_date, end=end_date)
            if hist.empty:
                return None
                
            # Pick the row closest to the target date; an exact match has distance 0
            target_datetime = min(
                hist.index, key=lambda dt: abs((dt.date() - target_date).days)
            )
            price = float(hist.loc[target_datetime]['Close'])
            actual_time = target_datetime
                
            # Convert pandas timestamp to datetime with timezone
            if hasattr(actual_time, 'to_pydatetime'):
                price_time = actual_time.to_pydatetime()
            else:
                price_time = actual_time
                
            # Ensure timezone info is present
            if price_time.tzinfo is None:
                price_time = price_time.replace(tzinfo=datetime.timezone.utc)
                
            return source.SourcePrice(
                price=D(str(price)),
                time=price_time,
                quote_currency='USD'  # hard-coded; not derived from the ticker
            )
        except Exception as e:
            print(f"Error fetching historical price for {ticker} on {time}: {e}", file=sys.stderr)
            return None

For usage:

PYTHONPATH=./scripts/sources bean-price --no-cache -d 2025-05-06 -e HKD:yahoo2/0700.HK

I named the script yahoo2.py; the file name determines the beanprice source name. PYTHONPATH should point to the directory containing this Python file.
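
The reason the file name matters can be illustrated with a rough sketch of name-based module lookup: try a bundled package first, then fall back to a top-level module found on the import path (which is what PYTHONPATH extends). `resolve_source_module` is a hypothetical helper written for this illustration and is not beanprice's actual implementation.

```python
import importlib


def resolve_source_module(name, package_prefix="beanprice.sources"):
    """Try the bundled package first, then fall back to a top-level module."""
    try:
        return importlib.import_module(f"{package_prefix}.{name}")
    except ImportError:
        # Not a bundled source: look for a module of that name on sys.path,
        # e.g. yahoo2.py in a directory listed in PYTHONPATH.
        return importlib.import_module(name)


# Demonstration with a standard-library module and a prefix that does not exist.
module = resolve_source_module("json", package_prefix="no_such_pkg_for_demo")
print(module.__name__)
```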

fengkx avatar Jun 11 '25 06:06 fengkx

I don't understand how TLS would be bypassed with curl-cffi or how that would relate to a 429 error.

Also, the current master version of yahoo.py uses curl-cffi:

from curl_cffi import requests

https://github.com/beancount/beanprice/blob/master/beanprice/sources/yahoo.py#L25

phantom-voltage avatar Jun 12 '25 06:06 phantom-voltage

I checked my beanprice version. It had not been updated to the latest master branch because my pyproject.toml locks beancount to 2.3.6.

After updating to the latest bean-price, the 429 error is gone.

I think Yahoo is using TLS fingerprinting to prevent bots from fetching their data. The 429 Too Many Requests response is just an excuse.
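
For reference, this is roughly what a request with a browser-like TLS fingerprint looks like with curl-cffi. It is a sketch only: `fetch_with_browser_fingerprint` is a hypothetical helper, and whether impersonation is really what avoids the 429 is my guess, not something I have confirmed.

```python
def fetch_with_browser_fingerprint(url, browser="chrome"):
    """Fetch a URL while presenting a browser-like TLS fingerprint."""
    # Imported lazily so this module still loads where curl_cffi is not installed.
    from curl_cffi import requests

    # `impersonate` asks curl_cffi to mimic the named browser's TLS handshake.
    return requests.get(url, impersonate=browser, timeout=10)
```

Calling it performs a real network request, so it is only defined here and not executed.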

fengkx avatar Jun 13 '25 07:06 fengkx