'max_results' parameter for Fangraphs query not respected when 'month' parameter is given
data_sc = batting_stats(2023, 2023, month='JUNE', max_results=5)
The above returns 30 results while it should return 5
data_sc = batting_stats(2023, 2023, max_results=5)
The above returns 5 results which is correct.
The Fangraphs URL when specifying month is not in the same format as when not specifying month.
- When pulling 2023 non-pitcher stats for all batters, you would use:
data_raw = batting_stats(start_season=2023, end_season=2023, position='NP', qual=0)
which results in 655 records. The Fangraphs URL for this is:
https://www.fangraphs.com/leaders-legacy.aspx?pos=np&stats=bat&lg=all&qual=0&type=8&season=2023&month=0&season1=2023&ind=0&team=0&rost=0&age=0&filter=&players=0&page=1_10000
which indeed results in 655 records.
- When trying to pull March/April 2023 non-pitcher stats for all batters, you would use:
data_raw = batting_stats(start_season=2023, end_season=2023, position='NP', qual=0, month=4)
which only yields 30 records but should result in 467 records. The Fangraphs URL for this is:
https://www.fangraphs.com/leaders-legacy.aspx?pos=np&stats=bat&lg=all&qual=0&type=8&season=2023&month=4&season1=2023&ind=0&team=0&rost=0&age=0&filter=&players=0&startdate=&enddate=&page=1_10000
The inclusion of "&startdate=&enddate=" in the latter URL is causing the paging problem.
It's probably an easy fix, too. When you add the "&startdate=&enddate=" to the original URL, it still returns 655 records as expected. This URL is: https://www.fangraphs.com/leaders-legacy.aspx?pos=np&stats=bat&lg=all&qual=0&type=8&season=2023&month=0&season1=2023&ind=0&team=0&rost=0&age=0&filter=&players=0&startdate=&enddate=&page=1_10000
I'm a Python novice, otherwise I would just update the code to include "&startdate=&enddate=" before the max_results in the URL.
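To make the difference between the two URLs quoted above explicit, here is a minimal sketch that uses only the standard library to diff their query strings. The helper names (query_params, diff_params) are made up for illustration and are not part of pybaseball.

```python
from urllib.parse import parse_qsl, urlsplit

# The two Fangraphs URLs quoted in this thread.
WITHOUT_MONTH = (
    "https://www.fangraphs.com/leaders-legacy.aspx?pos=np&stats=bat&lg=all&qual=0&type=8"
    "&season=2023&month=0&season1=2023&ind=0&team=0&rost=0&age=0&filter=&players=0"
    "&page=1_10000"
)
WITH_MONTH = (
    "https://www.fangraphs.com/leaders-legacy.aspx?pos=np&stats=bat&lg=all&qual=0&type=8"
    "&season=2023&month=4&season1=2023&ind=0&team=0&rost=0&age=0&filter=&players=0"
    "&startdate=&enddate=&page=1_10000"
)


def query_params(url):
    """Return the query string as a dict, keeping empty values such as 'filter='."""
    return dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))


def diff_params(url_a, url_b):
    """Map each differing key to (value in url_a, value in url_b); None means absent."""
    a, b = query_params(url_a), query_params(url_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


diff = diff_params(WITHOUT_MONTH, WITH_MONTH)
for key, (without_month, with_month) in diff.items():
    print(f"{key}: {without_month!r} -> {with_month!r}")
```

The only differences are the month value and the two empty date parameters, so those are the candidates worth testing one at a time.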
I've been digging into this. When you supply the month, the following line in the get_tabular_data_from_url function of HTMLTableProcessor doesn't seem to honor "query_params" as passed:
response = requests.get(self.root_url + url, params=query_params)
I checked query_params up to this point, and it matches the "url_options" in the FangraphsDataTable code in fangraphs.py. So why does requests.get go rogue here?
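One way to check whether requests itself is altering the parameters is to build the request without sending it and look at the final URL. This is only a sketch: ROOT_URL, ENDPOINT and the query_params dict below are assumptions copied from the URL quoted earlier in this thread, not values read out of pybaseball.

```python
import requests

# Assumed values, taken from the URL quoted in this thread.
ROOT_URL = "https://www.fangraphs.com"
ENDPOINT = "/leaders-legacy.aspx"

# Hypothetical stand-in for the query_params dict passed to requests.get.
query_params = {
    "pos": "np", "stats": "bat", "lg": "all", "qual": 0, "type": 8,
    "season": 2023, "month": 4, "season1": 2023, "ind": 0, "team": 0,
    "rost": 0, "age": 0, "filter": "", "players": 0,
    "startdate": "", "enddate": "",
    "page": "1_5",
}

# prepare() encodes the params into the URL but makes no network call.
prepared = requests.Request("GET", ROOT_URL + ENDPOINT, params=query_params).prepare()
print(prepared.url)
```

If the prepared URL looks right, the next thing to compare would be response.url and response.history on the real response, since a server-side redirect could drop or rewrite parameters after requests has sent them.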