
Inconsistent batting_stats_range() response.

Open nicholasg97 opened this issue 3 years ago • 3 comments

Around mid-day EST I ran batting_stats_range('2022-05-20'), and it only returned 8 rows. Going into debugger mode, I was able to grab the raw URL pybaseball was sending to the requests module. It loaded fine multiple times in my browser, where I saw 200+ rows of data.

I waited a few hours and it seems to work fine for me now, scraping all of the rows correctly. Trying other dates now, I'm getting similar inconsistency.

I'm not an expert on the requests module, but I believe it's returning a response before the page is fully loaded. Has anybody experienced this before?

nicholasg97 avatar May 22 '22 21:05 nicholasg97
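
One way to tell a truncated response apart from a parsing problem is to count the table rows in the raw HTML before BeautifulSoup touches it. requests does not render pages or run JavaScript; it returns whatever body the server sent, so if the raw text already holds 200+ rows, the loss is most likely happening during parsing. A minimal sketch using only the standard library follows; `RowCounter` and `count_raw_rows` are hypothetical helper names, not part of pybaseball.

```python
from html.parser import HTMLParser


class RowCounter(HTMLParser):
    """Counts <tr> start tags without building a document tree."""

    def __init__(self) -> None:
        super().__init__()
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook.
        if tag == "tr":
            self.rows += 1


def count_raw_rows(html: str) -> int:
    """Return the number of <tr> tags found in the raw HTML text."""
    counter = RowCounter()
    counter.feed(html)
    counter.close()
    return counter.rows


# Tiny stand-in document: one header row plus two data rows.
sample = "<table><tr><th>Name</th></tr><tr><td>A</td></tr><tr><td>B</td></tr></table>"
print(count_raw_rows(sample))  # 3
```

Feeding it `requests.get(url).text` for the URL captured in the debugger would give the number of rows the server sent (header rows included), which can then be compared with the length of the returned dataframe.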

I'm getting this, too. I've been playing with different date ranges; some work but some don't. I ran:

from pybaseball import batting_stats_range

split = "2022-05-25"
data_before = batting_stats_range("2022-03-31", split)
data_after  = batting_stats_range(split, "2022-06-04")

Both "before" and "after" dataframes quit at Jose Altuve. Weird.

markspotsthex avatar Jun 05 '22 00:06 markspotsthex

@markspotsthex this sounds similar to the issue mentioned here https://github.com/jldbc/pybaseball/issues/218 https://github.com/jldbc/pybaseball/pull/223

do you have that update in your version?

bdilday avatar Jun 05 '22 18:06 bdilday

I was having similar issues: I would only get 20 rows from pybaseball.league_batting_stats.batting_stats_range(). I changed the parser type in batting_stats_range.get_soup() to "html.parser" and now get 544 rows, and accented characters are also rendered better.

def get_soup(start_dt: date, end_dt: date) -> BeautifulSoup:
    # get most recent standings if date not specified
    # if((start_dt is None) or (end_dt is None)):
    #     print('Error: a date range needs to be specified')
    #     return None
    url = "http://www.baseball-reference.com/leagues/daily.cgi?user_team=&bust_cache=&type=b&dates=fromandto&fromandto={}.{}&level=mlb&franch=&stat=&stat_value=0".format(start_dt, end_dt)
    s = requests.get(url).content
    return BeautifulSoup(s, "html.parser")

4G4M3MN0N avatar Jun 13 '22 21:06 4G4M3MN0N
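
To check whether the parser choice explains the missing rows on a given machine, the same HTML can be run through several BeautifulSoup parsers and the row counts compared. This is a rough sketch, not pybaseball code: `rows_by_parser` is a hypothetical helper, and the default parser list is an assumption about which backends might be installed. Parsers that are not available are reported as `None` rather than raising.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def rows_by_parser(html, parsers=("lxml", "html.parser", "html5lib")):
    """Map each parser name to the number of <tr> elements it finds.

    A value of None means that parser is not installed.
    """
    counts = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            # bs4 raises this when the requested backend is missing.
            counts[name] = None
            continue
        counts[name] = len(soup.find_all("tr"))
    return counts


# Tiny stand-in document: one header row plus two data rows.
sample = "<table><tr><th>Name</th></tr><tr><td>A</td></tr><tr><td>B</td></tr></table>"
print(rows_by_parser(sample, ["html.parser"]))
```

Passing the `.content` of the response for a date range that comes back short would show whether the counts differ between parsers for that page.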