batting_stats() and pitching_stats() error out for certain players/years
running pitching_stats with Kent Tekulve's playerid results in this error:
>>> pitching_stats(start_season=1979, end_season=1980, players=1012905)
Traceback (most recent call last):
File "/home/yzhang/.envs/deadball_env/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 934, in _finalize_columns_and_data
columns = _validate_or_indexify_columns(contents, columns)
File "/home/yzhang/.envs/deadball_env/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 981, in _validate_or_indexify_columns
raise AssertionError(
AssertionError: 334 columns passed, passed data had 1 columns
The same type of error also seems to happen when calling batting_stats for 1970 Roberto Clemente. 1969 and 1971 both show up normally:
>>> batting_stats(1969, 1971, players=1002340)
IDfg Season Name Team Age G AB ... Events CStr% CSW% xBA xSLG xwOBA L-WAR
0 1002340 1969 Roberto Clemente PIT 34 138 507 ... 0 NaN NaN NaN NaN NaN 7.0
1 1002340 1971 Roberto Clemente PIT 36 132 522 ... 0 NaN NaN NaN NaN NaN 6.5
[2 rows x 320 columns]
whereas the 1970 Clemente search returns a similar error (full error trace attached):
>>> batting_stats(1970, players=1002340)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/yzhang/Downloads/pybaseball/pybaseball/cache/cache.py", line 58, in _cached
result = func(*args, **kwargs)
File "/home/yzhang/Downloads/pybaseball/pybaseball/datasources/fangraphs.py", line 176, in fetch
return super().fetch(*args, **kwargs)
File "/home/yzhang/Downloads/pybaseball/pybaseball/datasources/fangraphs.py", line 154, in fetch
self.html_accessor.get_tabular_data_from_options(
File "/home/yzhang/Downloads/pybaseball/pybaseball/datasources/html_table_processor.py", line 90, in get_tabular_data_from_options
return self.get_tabular_data_from_url(
File "/home/yzhang/Downloads/pybaseball/pybaseball/datasources/html_table_processor.py", line 78, in get_tabular_data_from_url
return self.get_tabular_data_from_html(
File "/home/yzhang/Downloads/pybaseball/pybaseball/datasources/html_table_processor.py", line 59, in get_tabular_data_from_html
return self.get_tabular_data_from_element(
File "/home/yzhang/Downloads/pybaseball/pybaseball/datasources/html_table_processor.py", line 50, in get_tabular_data_from_element
fg_data = pd.DataFrame(data_rows, columns=headings)
File "/home/yzhang/.envs/deadball_env/lib/python3.9/site-packages/pandas/core/frame.py", line 782, in __init__
arrays, columns, index = nested_data_to_arrays(
File "/home/yzhang/.envs/deadball_env/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 498, in nested_data_to_arrays
arrays, columns = to_arrays(data, columns, dtype=dtype)
File "/home/yzhang/.envs/deadball_env/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 840, in to_arrays
content, columns = _finalize_columns_and_data(arr, columns, dtype)
File "/home/yzhang/.envs/deadball_env/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 937, in _finalize_columns_and_data
raise ValueError(err) from err
ValueError: 320 columns passed, passed data had 1 columns
Looks like an element might not be parsed correctly, so get_tabular_data_from_element in HTMLTableProcessor ends up passing just [[None]] as the rows when creating the DataFrame. Not sure if fangraphs uses different DOM elements for some players/years that differ from the hardcoded rows/cells xpath?
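To make the suspected [[None]] case concrete, here is a minimal sketch of a width check that could run before the DataFrame is built. `find_malformed_rows` is a hypothetical helper, not part of pybaseball, and the three-item `headings` list is only a stand-in for the real 320 headings.

```python
from typing import List, Optional, Sequence


def find_malformed_rows(
    data_rows: Sequence[Sequence[Optional[str]]],
    headings: Sequence[str],
) -> List[int]:
    """Return the indexes of rows whose cell count differs from the heading count."""
    expected = len(headings)
    return [i for i, row in enumerate(data_rows) if len(row) != expected]


# Stand-in for the real headings (the traceback reports 320 of them).
headings = ["IDfg", "Season", "Name"]

# A well-formed row next to the [[None]] shape suspected in this issue.
good_rows = [["1002340", "1969", "Roberto Clemente"]]
empty_rows = [[None]]

print(find_malformed_rows(good_rows, headings))   # no mismatches
print(find_malformed_rows(empty_rows, headings))  # row 0 has 1 cell, not 3
```

A check like this would only turn the pandas error into a clearer message; it does not explain why the table comes back empty.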
Just add the qual=0 argument to your call.
The default is qual=y, which means the minimum PA is 3.1 PA per team game, i.e. about 502 over a 162-game season.
For pitching, qual=y means the minimum IP is 1 IP per team game, i.e. 162 over a 162-game season.
Therefore, if you want one player's data no matter how many PA or IP he has, just set qual=0:
- pitching_stats(start_season=1979, end_season=1980, players=1012905)
- batting_stats(1969, 1971, players=1002340)
+ pitching_stats(start_season=1979, end_season=1980, players=1012905, qual=0)
+ batting_stats(1969, 1971, players=1002340, qual=0)
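For reference, the qualification thresholds mentioned above can be worked out with a few lines of arithmetic. This is only a sketch: the 162-game schedule and the rounding to the nearest whole number are assumptions, and `qualifying_minimum` is a hypothetical helper, not a pybaseball function.

```python
# Rates quoted in this thread; the schedule length is an assumption.
TEAM_GAMES = 162
PA_PER_TEAM_GAME = 3.1
IP_PER_TEAM_GAME = 1.0


def qualifying_minimum(rate_per_game: float, team_games: int = TEAM_GAMES) -> int:
    """Multiply the per-game rate by the schedule length and round to the nearest integer."""
    return round(rate_per_game * team_games)


min_pa = qualifying_minimum(PA_PER_TEAM_GAME)  # batting threshold
min_ip = qualifying_minimum(IP_PER_TEAM_GAME)  # pitching threshold
print(min_pa, min_ip)
```

A player-season below these minimums is filtered out under the default setting, which would leave the table empty for that query.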
ref: https://github.com/jldbc/pybaseball/blob/master/docs/batting_stats.md