
[FBref] 403 Forbidden error

MikeTrusky opened this issue 4 months ago · 6 comments

Describe the bug

A call to sd.FBref results in "HTTPError: 403 Client Error: Forbidden".

Affected scrapers

This affects the following scrapers:

  • [X] FBref

Code example

import soccerdata as sd

fbref = sd.FBref(leagues='ENG-Premier League', seasons='24/25', no_cache=True)

print(fbref.read_schedule())

Error message

requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://fbref.com/en/comps/

MikeTrusky · Aug 23 '25

It looks like FBref now checks for the Sec-CH-UA request header. The following seems to work for me:

import tls_requests

url = "https://fbref.com/en/comps/"

headers = {
    # "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    # "accept-language": "en-GB,en;q=0.9,nl;q=0.8,nl-BE;q=0.7,en-US;q=0.6",
    # "cache-control": "max-age=0",
    # "if-modified-since": "Sat, 23 Aug 2025 17:49:23 GMT",
    "sec-ch-ua": '"Not)A;Brand";v="8", "Chromium";v="138", "Google Chrome";v="138"',
    # "user-agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Mobile Safari/537.36",
}

res = tls_requests.get(url, headers=headers)
assert res.status_code == 200

(what you send as the header's value does not seem to matter)
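For example, per the note above, even an arbitrary value should pass the check (a quick sketch based on the snippet above; the header value here is made up):

import tls_requests

# Any non-empty sec-ch-ua value appears to satisfy FBref's check;
# only the header's presence seems to be tested, not its content.
res = tls_requests.get(
    "https://fbref.com/en/comps/",
    headers={"sec-ch-ua": "anything"},
)
print(res.status_code)  # expect 200 when the check passes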

probberechts · Aug 23 '25

Great, I can confirm that without "sec-ch-ua" the assertion fails with an AssertionError, but with this header it passes. So it seems to work. But is it possible to add this header through soccerdata, or does fbref.py have to be changed?
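(A possible stopgap without touching fbref.py, sketched here: set the header once on the reader's underlying requests session. This relies on the internal _session attribute of BaseRequestsReader, shown in the code further down, so it is not an official API and may break in future releases.)

import soccerdata as sd

fbref = sd.FBref(leagues="ENG-Premier League", seasons="24/25", no_cache=True)

# _session is an internal attribute of BaseRequestsReader; headers set on
# the session are sent with every request the scraper makes.
fbref._session.headers.update(
    {"sec-ch-ua": '"Not)A;Brand";v="8", "Chromium";v="138", "Google Chrome";v="138"'}
)

print(fbref.read_schedule())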

MikeTrusky · Aug 23 '25

Hello, I found a fix for the error:

I added a headers parameter with sec-ch-ua in the BaseRequestsReader class.

I'll leave the code here:

------------------------------------------------------

class BaseRequestsReader(BaseReader):
    """Base class for readers that use the Python requests module."""

    def __init__(
        self,
        leagues: Optional[Union[str, list[str]]] = None,
        proxy: Optional[
            Union[str, dict[str, str], list[dict[str, str]], Callable[[], dict[str, str]]]
        ] = None,
        no_cache: bool = False,
        no_store: bool = False,
        data_dir: Path = DATA_DIR,
    ):
        """Initialize the reader."""
        super().__init__(
            no_cache=no_cache,
            no_store=no_store,
            leagues=leagues,
            proxy=proxy,
            data_dir=data_dir,
        )
        self._session = self._init_session()

    def _init_session(self) -> requests.Session:
        session = cloudscraper.create_scraper(
            browser={"browser": "chrome", "platform": "linux", "mobile": False}
        )
        session.proxies.update(self.proxy())
        return session

    def _download_and_save(
        self,
        url: str,
        filepath: Optional[Path] = None,
        var: Optional[Union[str, Iterable[str]]] = None,
    ) -> IO[bytes]:
        """Download file at url to filepath. Overwrites if filepath exists."""
        # FBref rejects requests without a sec-ch-ua client-hint header with a 403.
        headers = {
            "sec-ch-ua": '"Not A Brand";v="99", "Chromium";v="138", "Google Chrome";v="138"'
        }
        for i in range(5):
            try:
                response = self._session.get(url, stream=True, headers=headers)
                time.sleep(self.rate_limit + random.random() * self.max_delay)
                response.raise_for_status()
                if var is not None:
                    # Extract the JSON data assigned to the requested variables
                    # via JSON.parse(...) in the page's inline scripts.
                    if isinstance(var, str):
                        var = [var]
                    var_names = "|".join(var)
                    template_understat = rb"(%b)+[\s\t]*=[\s\t]*JSON\.parse\('(.*)'\)"
                    pattern_understat = template_understat % bytes(var_names, encoding="utf-8")
                    results = re.findall(pattern_understat, response.content)
                    data = {
                        key.decode("unicode_escape"): json.loads(value.decode("unicode_escape"))
                        for key, value in results
                    }
                    payload = json.dumps(data).encode("utf-8")
                else:
                    payload = response.content
                if not self.no_store and filepath is not None:
                    with filepath.open(mode="wb") as fh:
                        fh.write(payload)
                return io.BytesIO(payload)
            except Exception:
                logger.exception(
                    "Error while scraping %s. Retrying... (attempt %d of 5).",
                    url,
                    i + 1,
                )
                self._session = self._init_session()
                continue

        raise ConnectionError(f"Could not download {url}.")
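(Side note: setting the header per request here, or once on the session as in the workaround further up, behaves the same; requests merges session-level and per-request headers, with per-request values taking precedence.)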

Now FBref is working well!

gustavoalikan1910 · Aug 24 '25

(quoting probberechts' tls_requests example above)

I just tried this and got the error message below. Bit of a noob, so not sure if I might be doing something wrong - any suggestions much appreciated!

AssertionError                            Traceback (most recent call last)
/tmp/ipython-input-1561495402.py in <cell line: 0>()
     13
     14 res = tls_requests.get(url, headers=headers)
---> 15 assert res.status_code == 200

AssertionError:

ozzyman703 · Aug 25 '25

I still get the 403 error even with the header... May I ask what the current rate limit for accessing FBref is? Are there any other possible solutions?

mhd0528 · Oct 04 '25

I still face the same error. Has anyone found a solution?

keroloshany47 · Dec 08 '25