
'max_results' parameter for fangraphs query not respected when 'month' parameter given

Open abrobert opened this issue 2 years ago • 2 comments

data_sc = batting_stats(2023, 2023, month='JUNE', max_results=5)

The above returns 30 results, but it should return 5.

data_sc = batting_stats(2023, 2023, max_results=5)

The above returns 5 results which is correct.

abrobert avatar Jun 09 '23 19:06 abrobert

The Fangraphs URL generated when a month is specified has a different format from the one generated when no month is specified.

  • When pulling 2023 non-pitcher stats for all batters, you would use: data_raw = batting_stats(start_season=2023,end_season=2023,position='NP',qual=0) which results in 655 records. The Fangraphs URL for this is: https://www.fangraphs.com/leaders-legacy.aspx?pos=np&stats=bat&lg=all&qual=0&type=8&season=2023&month=0&season1=2023&ind=0&team=0&rost=0&age=0&filter=&players=0&page=1_10000 which indeed results in 655 records
  • When trying to pull March/April 2023 non-pitcher stats for all batters, you would use: data_raw = batting_stats(start_season=2023,end_season=2023,position='NP',qual=0,month=4) which only yields 30 records but should result in 467 records. The Fangraphs URL for this is: https://www.fangraphs.com/leaders-legacy.aspx?pos=np&stats=bat&lg=all&qual=0&type=8&season=2023&month=4&season1=2023&ind=0&team=0&rost=0&age=0&filter=&players=0&startdate=&enddate=&page=1_10000

The inclusion of "&startdate=&enddate=" in the latter URL appears to be causing the paging problem.

It's probably an easy fix, too. When you add the "&startdate=&enddate=" to the original URL, it still returns 655 records as expected. This URL is: https://www.fangraphs.com/leaders-legacy.aspx?pos=np&stats=bat&lg=all&qual=0&type=8&season=2023&month=0&season1=2023&ind=0&team=0&rost=0&age=0&filter=&players=0&startdate=&enddate=&page=1_10000
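To make the comparison between these URLs easier to check, here is a small sketch that diffs their query strings using only the standard library. The helper names (query_params, diff_params) are made up for illustration and are not part of pybaseball; the URLs are the ones quoted above.

```python
from urllib.parse import parse_qsl, urlsplit

BASE = "https://www.fangraphs.com/leaders-legacy.aspx"

# The three URLs quoted in this comment.
full_season_url = (
    BASE + "?pos=np&stats=bat&lg=all&qual=0&type=8&season=2023&month=0"
    "&season1=2023&ind=0&team=0&rost=0&age=0&filter=&players=0&page=1_10000"
)
april_url = (
    BASE + "?pos=np&stats=bat&lg=all&qual=0&type=8&season=2023&month=4"
    "&season1=2023&ind=0&team=0&rost=0&age=0&filter=&players=0"
    "&startdate=&enddate=&page=1_10000"
)
full_season_with_dates_url = (
    BASE + "?pos=np&stats=bat&lg=all&qual=0&type=8&season=2023&month=0"
    "&season1=2023&ind=0&team=0&rost=0&age=0&filter=&players=0"
    "&startdate=&enddate=&page=1_10000"
)


def query_params(url):
    """Return the query string as a dict, keeping empty values such as 'filter='."""
    return dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))


def diff_params(url_a, url_b):
    """Map each differing parameter to (value in url_a, value in url_b); None means absent."""
    a, b = query_params(url_a), query_params(url_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


# Full season vs. month=4: differs in month plus the two empty date parameters.
season_vs_april = diff_params(full_season_url, april_url)
# Full season with the date parameters added vs. month=4: only month differs.
dated_vs_april = diff_params(full_season_with_dates_url, april_url)

for name, (before, after) in season_vs_april.items():
    print(f"{name}: {before!r} -> {after!r}")
```

This only compares the URL text; it does not contact Fangraphs, so it says nothing about how many records each URL returns.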

I'm a Python novice, otherwise I would just update the code to include "&startdate=&enddate=" before the max_results in the URL.

markspotsthex avatar Feb 25 '24 22:02 markspotsthex

I've been digging into this. When you supply the month, the following line in the get_tabular_data_from_url function of HTMLTableProcessor doesn't seem to process "query_params" as passed: response = requests.get(self.root_url + url, params=query_params) I checked query_params up to this point, and it matches the "url_options" in the FangraphsDataTable code in fangraphs.py. So why does requests.get go rogue here?
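One way to see exactly what requests would send, without hitting the network, is to build the request and inspect the prepared URL. This is only a debugging sketch: the root URL and path are split from the URLs quoted earlier in this thread, and the parameter dict is a shortened, hypothetical stand-in for the real query_params, not what pybaseball actually passes.

```python
import requests

# Assumed split of the Fangraphs URL quoted in this thread.
ROOT_URL = "https://www.fangraphs.com"
PATH = "/leaders-legacy.aspx"

# Hypothetical, abbreviated parameters for illustration only.
query_params = {
    "pos": "np",
    "month": 4,
    "startdate": "",   # empty strings are kept as "startdate="
    "enddate": "",
    "players": None,   # requests drops parameters whose value is None
    "page": "1_10000",
}

# Build and prepare the request without sending it.
prepared = requests.Request("GET", ROOT_URL + PATH, params=query_params).prepare()
final_url = prepared.url
print(final_url)
```

On a live call, response.url gives the URL of the final response, which can be compared against the URL you expected to be requested.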

markspotsthex avatar Feb 28 '24 03:02 markspotsthex