Issue with batting_stats_range() function - IndexError: list index out of range
Hello,
I'm new to Python and pybaseball, and I've encountered an issue when trying to use the batting_stats_range() function to retrieve batting stats.
Here is the code I'm using:
from pybaseball import batting_stats_range
import pandas as pd

# Define the date range for the batting stats
batting_data = batting_stats_range('2018-01-01', '2023-12-31')

# Save the data to a CSV file
batting_data.to_csv('batting_data_2018_2023.csv', index=False)
When I run this code, I receive the following error:
Traceback (most recent call last):
File "[directory]\batting_stats_pull.py", line 5, in
I've attempted to update pybaseball using pip install --upgrade pybaseball, but this didn't resolve the issue.
Upon looking into the source code, I found that the get_soup() function in league_batting_stats.py is using web scraping to retrieve data from a specific URL on the Baseball-Reference website. I'm wondering if the structure of the website has changed since the pybaseball library was last updated, or if there are measures in place on the website that prevent or limit web scraping?
Apologies in advance if this is a known or simple issue - again, I'm new to this. Any help or guidance you can provide would be greatly appreciated.
Thank you, MWT
So, I have some slightly good news, followed by some not so good news.
This is a known issue (see https://github.com/jldbc/pybaseball/issues/332). Unfortunately, the "fix" isn't all that great. For starters, I'm not sure how to pull that many dates at once. I wanted the stat line for each individual game day, so I went date by date. When I did that, I had to call a sleep() function between calls, which drastically increased the time it took to complete. It's not a great solution, but I eventually got it to work.
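Roughly, the date-by-date approach looks like the sketch below. This is not my exact script, just a minimal version of the idea. The helper names (daily_ranges, fetch_day_by_day), the 5-second delay, and the choice to record days that raise IndexError instead of stopping are all assumptions made for illustration; I don't know what delay the site actually tolerates. The fetch argument is meant to be batting_stats_range, passed in so the loop can be tried without hitting the website.

```python
import time
from datetime import date, timedelta


def daily_ranges(start, end):
    """Yield each day from start to end (inclusive) as a 'YYYY-MM-DD' string."""
    day = start
    while day <= end:
        yield day.isoformat()
        day += timedelta(days=1)


def fetch_day_by_day(start, end, fetch, delay_seconds=5.0, sleep=time.sleep):
    """Call fetch(day, day) once per day, pausing between requests.

    fetch is any callable taking (start_dt, end_dt) strings, for example
    pybaseball's batting_stats_range. delay_seconds is a guess, not a
    documented limit. Returns (results, failed): a dict of day -> result,
    and a list of days whose request raised IndexError.
    """
    results = {}
    failed = []
    for i, day in enumerate(daily_ranges(start, end)):
        if i > 0:
            # Pause before every request except the first one.
            sleep(delay_seconds)
        try:
            results[day] = fetch(day, day)
        except IndexError:
            # Assumption: treat this as "no usable table for that day"
            # and keep going; the day is recorded so it can be retried.
            failed.append(day)
    return results, failed
```

To use it with pybaseball, something like fetch_day_by_day(date(2023, 4, 1), date(2023, 4, 7), batting_stats_range) should work, and the per-day DataFrames in results can then be combined with pd.concat. Keep in mind that at one request every few seconds, a multi-year range will take a long time.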
As for pulling all the stats over that long a period in a single call, I'm not sure there's a solution along the sleep() route, sorry.