[FBref] 403 Forbidden error
Describe the bug
A call to sd.FBref results in "HTTPError: 403 Client Error: Forbidden".
Affected scrapers
This affects the following scrapers:
- [X] FBref
Code example
import soccerdata as sd
fbref = sd.FBref(leagues='ENG-Premier League', seasons='24/25', no_cache=True)
print(fbref.read_schedule())
Error message
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://fbref.com/en/comps/
It looks like FBref now checks for the Sec-CH-UA request header. The following seems to work for me:
import tls_requests
url = "https://fbref.com/en/comps/"
headers = {
# "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
# "accept-language": "en-GB,en;q=0.9,nl;q=0.8,nl-BE;q=0.7,en-US;q=0.6",
# "cache-control": "max-age=0",
# "if-modified-since": "Sat, 23 Aug 2025 17:49:23 GMT",
"sec-ch-ua": '"Not)A;Brand";v="8", "Chromium";v="138", "Google Chrome";v="138"',
# "user-agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Mobile Safari/537.36",
}
res = tls_requests.get(url, headers=headers)
assert res.status_code == 200
(what you send as the header's value does not seem to matter)
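For anyone without tls_requests installed, the same with/without comparison can be sketched with the standard library's urllib.request. This is a minimal sketch, not a confirmed fix: the helper names build_request and fetch_status are hypothetical, and Python's own TLS fingerprint differs from a browser's, so FBref may still answer 403 for reasons unrelated to the header. It returns the status code instead of using a bare assert, which makes a failure easier to read.

```python
import urllib.error
import urllib.request

FBREF_URL = "https://fbref.com/en/comps/"
# The value reportedly does not matter; this one is copied from the snippet in the thread.
SEC_CH_UA = '"Not)A;Brand";v="8", "Chromium";v="138", "Google Chrome";v="138"'


def build_request(url: str, with_sec_ch_ua: bool) -> urllib.request.Request:
    """Build a GET request, optionally carrying the Sec-CH-UA header."""
    headers = {"sec-ch-ua": SEC_CH_UA} if with_sec_ch_ua else {}
    return urllib.request.Request(url, headers=headers)


def fetch_status(url: str, with_sec_ch_ua: bool) -> int:
    """Return the HTTP status code rather than failing on an assert."""
    try:
        with urllib.request.urlopen(build_request(url, with_sec_ch_ua), timeout=30) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; keep the code for comparison
        return err.code
```

Usage: compare fetch_status(FBREF_URL, False) with fetch_status(FBREF_URL, True). Note that urllib stores header names capitalized (Sec-ch-ua); HTTP header names are case-insensitive, so this should not matter to the server.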
Great, I can confirm that without "sec-ch-ua" an AssertionError is raised, but with this header the assertion passes. So it seems to work. Is it possible to add this header through soccerdata itself, or does fbref.py have to be changed?
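One general way to attach the header once, instead of at every call site, is to make it a default header of the session when the session is created, so every request sent through it carries the header, including sessions re-created after a failed attempt. A minimal stdlib sketch using urllib's OpenerDirector.addheaders follows; make_opener is a hypothetical helper, not part of soccerdata. On a requests-style session (cloudscraper's scraper is a requests.Session subclass) the analogous call would presumably be session.headers.update(...) inside _init_session, but that is an assumption, not something confirmed in this thread.

```python
import urllib.request
from typing import Optional

SEC_CH_UA = '"Not A Brand";v="99", "Chromium";v="138", "Google Chrome";v="138"'


def make_opener(extra_headers: Optional[dict[str, str]] = None) -> urllib.request.OpenerDirector:
    """Create an opener whose default headers include Sec-CH-UA.

    urllib adds each ``addheaders`` entry to every request opened through
    this opener, unless the request already sets a header with that name.
    """
    opener = urllib.request.build_opener()
    defaults = dict(opener.addheaders)  # starts with urllib's own User-agent entry
    defaults["sec-ch-ua"] = SEC_CH_UA
    defaults.update(extra_headers or {})
    opener.addheaders = list(defaults.items())
    return opener
```

Usage: make_opener().open("https://fbref.com/en/comps/", timeout=30). Whether FBref accepts this client is not guaranteed; the point is only where the header is set.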
Hello, I found a fix for the error.
I created a headers dict containing sec-ch-ua and passed it to the request in the BaseRequestsReader class.
I'll leave the code here:
------------------------------------------------------
# Drop-in replacement inside soccerdata's module that defines BaseRequestsReader;
# relies on that module's existing imports and names (cloudscraper, requests, DATA_DIR, logger, ...).
class BaseRequestsReader(BaseReader):
    """Base class for readers that use the Python requests module."""

    def __init__(
        self,
        leagues: Optional[Union[str, list[str]]] = None,
        proxy: Optional[
            Union[str, dict[str, str], list[dict[str, str]], Callable[[], dict[str, str]]]
        ] = None,
        no_cache: bool = False,
        no_store: bool = False,
        data_dir: Path = DATA_DIR,
    ):
        """Initialize the reader."""
        super().__init__(
            no_cache=no_cache,
            no_store=no_store,
            leagues=leagues,
            proxy=proxy,
            data_dir=data_dir,
        )
        self._session = self._init_session()

    def _init_session(self) -> requests.Session:
        session = cloudscraper.create_scraper(
            browser={"browser": "chrome", "platform": "linux", "mobile": False}
        )
        session.proxies.update(self.proxy())
        return session

    def _download_and_save(
        self,
        url: str,
        filepath: Optional[Path] = None,
        var: Optional[Union[str, Iterable[str]]] = None,
    ) -> IO[bytes]:
        """Download file at url to filepath. Overwrites if filepath exists."""
        # FBref answers 403 Forbidden when the Sec-CH-UA header is missing
        headers = {
            "sec-ch-ua": '"Not A Brand";v="99", "Chromium";v="138", "Google Chrome";v="138"'
        }
        for i in range(5):
            try:
                response = self._session.get(url, stream=True, headers=headers)
                time.sleep(self.rate_limit + random.random() * self.max_delay)
                response.raise_for_status()
                if var is not None:
                    if isinstance(var, str):
                        var = [var]
                    var_names = "|".join(var)
                    template_understat = rb"(%b)+[\s\t]*=[\s\t]*JSON\.parse\('(.*)'\)"
                    pattern_understat = template_understat % bytes(var_names, encoding="utf-8")
                    results = re.findall(pattern_understat, response.content)
                    data = {
                        key.decode("unicode_escape"): json.loads(value.decode("unicode_escape"))
                        for key, value in results
                    }
                    payload = json.dumps(data).encode("utf-8")
                else:
                    payload = response.content
                if not self.no_store and filepath is not None:
                    with filepath.open(mode="wb") as fh:
                        fh.write(payload)
                return io.BytesIO(payload)
            except Exception:
                logger.exception(
                    "Error while scraping %s. Retrying... (attempt %d of 5).",
                    url,
                    i + 1,
                )
                # Start a fresh session before the next attempt
                self._session = self._init_session()
                continue
        raise ConnectionError(f"Could not download {url}.")
Now FBref is working fine!
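Stripped of the soccerdata plumbing, the patched _download_and_save is a retry loop: send the request with the extra header, wait, fail on an HTTP error status, start a fresh session, and give up after five attempts. The network-free sketch below mirrors that control flow; download_with_retries, the session_factory callable and the (status, body) return shape are hypothetical stand-ins for self._init_session and self._session.get, and the zero delays are placeholders, not soccerdata's defaults.

```python
import logging
import random
import time
from typing import Callable, Mapping

logger = logging.getLogger(__name__)

SEC_CH_UA = '"Not A Brand";v="99", "Chromium";v="138", "Google Chrome";v="138"'

# A fetch callable takes (url, headers) and returns (status_code, body).
Fetch = Callable[[str, Mapping[str, str]], tuple[int, bytes]]


def download_with_retries(
    session_factory: Callable[[], Fetch],
    url: str,
    attempts: int = 5,
    rate_limit: float = 0.0,
    max_delay: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Fetch url with a Sec-CH-UA header, retrying with a fresh session on failure."""
    headers = {"sec-ch-ua": SEC_CH_UA}
    fetch = session_factory()
    for attempt in range(1, attempts + 1):
        try:
            status, body = fetch(url, headers)
            # As in the patch: wait before checking the status, so failures are throttled too
            sleep(rate_limit + random.random() * max_delay)
            if status >= 400:
                raise RuntimeError(f"{status} Client Error for url: {url}")
            return body
        except Exception:
            logger.exception(
                "Error while scraping %s. Retrying... (attempt %d of %d).", url, attempt, attempts
            )
            fetch = session_factory()  # mirrors self._session = self._init_session()
    raise ConnectionError(f"Could not download {url}.")
```

Injecting the fetch function and the sleep makes the loop testable without touching FBref; the real method additionally parses Understat payloads and writes the result to disk.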
I just tried the tls_requests snippet above and got the error message below. I'm a bit of a noob, so I'm not sure whether I'm doing something wrong; any suggestions are much appreciated!
AssertionError                            Traceback (most recent call last)
/tmp/ipython-input-1561495402.py in <cell line: 0>()
     13
     14 res = tls_requests.get(url, headers=headers)
---> 15 assert res.status_code == 200

AssertionError:
I still get the 403 error even with the header. May I ask what the current rate limit for accessing FBref is? Are there any other possible solutions?
I still face the same error. Has anyone found a solution?