
Created a new function to retrieve box scores from baseball reference…

Open demilio76 opened this issue 3 years ago • 7 comments

…. Quick example:

datetime_object = datetime.strptime('May 05 2021', '%b %d %Y')
visitor_batting_df, home_batting_df, visitor_pitching_df, home_pitching_df = box_score('OAK', datetime_object, 0)
print(f"{visitor_pitching_df.loc[0, 'Pitching']} vs {home_pitching_df.loc[0, 'Pitching']}")

demilio76 avatar Oct 18 '21 14:10 demilio76

I had to parse through HTML comments to get the data I wanted because of how bbref sets up its box score pages. Alternatively, I have a version that uses Selenium and a ChromeDriver, which works a little more cleanly (the tables are no longer inside comments once the page has loaded), but for now I am submitting this version to avoid a new dependency.
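For readers unfamiliar with the technique, here is a minimal, standard-library-only sketch of pulling tables out of HTML comments. It is not the code from this PR: the sample markup, the `all_pitching` and `pitching` ids, and the helper names are all hypothetical, and a real scraper would more likely hand the comment text to an HTML or XPath parser.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collect the raw text of every HTML comment in a page."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data)


class TableRowParser(HTMLParser):
    """Collect cell text from <tr>/<th>/<td> tags as lists of strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def tables_hidden_in_comments(page_html):
    """Return the rows of every table that sits inside an HTML comment."""
    collector = CommentCollector()
    collector.feed(page_html)
    collector.close()
    tables = []
    for comment in collector.comments:
        if "<table" not in comment:
            continue  # ordinary comment, nothing to re-parse
        parser = TableRowParser()
        parser.feed(comment)  # second pass: treat the comment body as HTML
        parser.close()
        if parser.rows:
            tables.append(parser.rows)
    return tables


# Hypothetical page fragment, made up for illustration only.
SAMPLE_PAGE = """
<html><body>
<div id="all_pitching">
<!--
<table id="pitching">
  <tr><th>Pitching</th><th>IP</th></tr>
  <tr><td>Example Starter</td><td>6.0</td></tr>
</table>
-->
</div>
<!-- an ordinary comment with no table -->
</body></html>
"""

tables = tables_hidden_in_comments(SAMPLE_PAGE)
print(tables[0])
```

The key point is the two-pass structure: the first parse only sees the table as opaque comment text, so the comment body has to be fed to a parser a second time before its rows become visible.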

demilio76 avatar Oct 18 '21 14:10 demilio76

@TheCleric @bdilday if either of you can take a look?

schorrm avatar Oct 18 '21 22:10 schorrm

> I had to parse through comments to get the data I wanted due to how bbref sets up their boxscore pages. Alternatively, I have a version that uses Selenium and a ChromeDriver which works a little cleaner (tables aren't in comments post-page load) but for now am submitting this version to avoid a new dependency

I think I'd rather have the cleaner Selenium version. It doesn't seem like a crazy dependency for a library whose job is largely to scrape the web.

@schorrm @TheCleric any thoughts?

bdilday avatar Oct 19 '21 23:10 bdilday

> > I had to parse through comments to get the data I wanted due to how bbref sets up their boxscore pages. Alternatively, I have a version that uses Selenium and a ChromeDriver which works a little cleaner (tables aren't in comments post-page load) but for now am submitting this version to avoid a new dependency
>
> I think I'd rather have the cleaner Selenium version. It doesn't seem like a crazy dependency for a library whose job is largely to scrape the web.
>
> @schorrm @TheCleric any thoughts?

@bdilday I'm not a fan of Selenium for this, since the page doesn't need anything like JavaScript. As it is, we've done similar things with just the XPath parser, which can be used to parse into HTML comments.

EDIT: I found another PR where I provided some example code for something similar: https://github.com/jldbc/pybaseball/pull/137#discussion_r496769328

TheCleric avatar Oct 20 '21 00:10 TheCleric

I was playing around with the Selenium version and have changed my mind: I now agree with not using it. The main reason for my change of heart is that I didn't fully realize how much slower Selenium was until I ran a batch of calls. E.g., to get all 162 box scores for the Dodgers' games this past season, the non-Selenium version took 62 seconds, but the Selenium version took around 15 minutes.
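To put those numbers on a per-game basis, here is a quick back-of-the-envelope calculation. It assumes exactly 15 minutes for the Selenium run, although the figure above is only approximate.

```python
GAMES = 162
plain_seconds = 62
selenium_seconds = 15 * 60  # "around 15 minutes", so this is approximate

per_game_plain = plain_seconds / GAMES
per_game_selenium = selenium_seconds / GAMES
slowdown = selenium_seconds / plain_seconds

print(
    f"{per_game_plain:.2f} s/game vs {per_game_selenium:.2f} s/game, "
    f"about {slowdown:.1f}x slower"
)
```

That works out to roughly 0.4 seconds per box score without Selenium versus about 5.6 seconds with it, or around 14.5 times slower.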

demilio76 avatar Oct 23 '21 12:10 demilio76

For 15 min vs 62 seconds, that's a pretty clear winner here, even if Selenium would be cleaner.

schorrm avatar Oct 24 '21 12:10 schorrm

This has been open for over a year. Are we merging this into the project or not? @schorrm @tjburch

BrayanMnz avatar Dec 05 '22 13:12 BrayanMnz