pybaseball Created a new function to retrieve box scores from baseball reference…

…. Quick example:

from datetime import datetime
datetime_object = datetime.strptime('May 05 2021', '%b %d %Y')
visitor_batting_df, home_batting_df, visitor_pitching_df, home_pitching_df = box_score('OAK', datetime_object, 0)
print(f"{visitor_pitching_df.loc[0, 'Pitching']} vs {home_pitching_df.loc[0, 'Pitching']}")

Oct 18 '21 14:10 demilio76

I had to parse through comments to get the data I wanted due to how bbref sets up their boxscore pages. Alternatively, I have a version that uses Selenium and a ChromeDriver, which works a little cleaner (tables aren't in comments post-page load), but for now I'm submitting this version to avoid a new dependency.

Oct 18 '21 14:10 demilio76
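
For readers unfamiliar with the "tables in comments" problem described above, here is a minimal sketch of the general technique using only the Python standard library. It is not the code from this PR: the class names, the `tables_hidden_in_comments` helper, and the sample page are all hypothetical, and the sample markup is made up rather than copied from bbref.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collect the raw text of every HTML comment in a page."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data)


class TableRowParser(HTMLParser):
    """Collect the cell text of each <tr> as a list of strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell and self._row:
            self._row[-1] += data.strip()


def tables_hidden_in_comments(html):
    """Return one list of rows per commented-out <table> (hypothetical helper)."""
    collector = CommentCollector()
    collector.feed(html)
    collector.close()
    tables = []
    for comment in collector.comments:
        if "<table" not in comment:
            continue  # skip ordinary comments
        parser = TableRowParser()
        parser.feed(comment)  # re-parse the comment body as HTML
        parser.close()
        tables.append(parser.rows)
    return tables


# Made-up page: the table only exists inside an HTML comment.
SAMPLE_PAGE = """
<html><body>
<div id="all_pitching">
<!--
<table id="pitching">
  <tr><th>Pitching</th><th>IP</th></tr>
  <tr><td>Example Pitcher</td><td>6.0</td></tr>
</table>
-->
</div>
<!-- an unrelated comment -->
</body></html>
"""

tables = tables_hidden_in_comments(SAMPLE_PAGE)
print(tables)
```

The idea is a two-pass parse: first pull out the comment bodies, then parse any body containing a table as if it were regular HTML. No browser is needed because no JavaScript has to run.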

@TheCleric @bdilday if either of you can take a look?

Oct 18 '21 22:10 schorrm

I think I'd rather have the cleaner Selenium version. It doesn't seem like a crazy dependency for a library whose job is largely to scrape the web.

@schorrm @TheCleric any thoughts?

Oct 19 '21 23:10 bdilday

@bdilday I'm not a fan of Selenium for this, since it doesn't need anything like JavaScript. As it is, we've done similar things with just the XPath parser, which can be used to parse into HTML comments.

EDIT: I found another PR where I provided some example code for something similar: https://github.com/jldbc/pybaseball/pull/137#discussion_r496769328

Oct 20 '21 00:10 TheCleric

I was playing around with the Selenium version and have changed my mind: I now agree with not using it. The main reason for my change of heart is that I didn't fully realize how much slower Selenium was until I ran a batch of calls. For example, to get all 162 box scores for the Dodgers' games this past season, the non-Selenium version took 62 seconds, but the Selenium version took around 15 minutes.

Oct 23 '21 12:10 demilio76
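
To put the numbers above in per-game terms, here is a small sketch. The totals come from the comment above, with "around 15 minutes" taken as exactly 900 seconds, which is an assumption. The `time_batch` helper is hypothetical and only illustrates how such a batch could be timed; it is not the benchmark that was actually run.

```python
import time

GAMES = 162
PLAIN_TOTAL_S = 62          # non-Selenium version, as reported
SELENIUM_TOTAL_S = 15 * 60  # "around 15 minutes", assumed to be exactly 900 s

per_game_plain = PLAIN_TOTAL_S / GAMES
per_game_selenium = SELENIUM_TOTAL_S / GAMES
slowdown = SELENIUM_TOTAL_S / PLAIN_TOTAL_S

print(f"plain:    {per_game_plain:.2f} s per game")
print(f"selenium: {per_game_selenium:.2f} s per game")
print(f"slowdown: {slowdown:.1f}x")


def time_batch(fetch, items):
    """Call fetch(item) for each item; return (elapsed_seconds, results).

    Hypothetical helper: `fetch` stands in for whatever function
    retrieves a single box score.
    """
    start = time.perf_counter()
    results = [fetch(item) for item in items]
    return time.perf_counter() - start, results
```

Under that assumption the gap is roughly 0.4 seconds versus 5.6 seconds per game, which adds up quickly over a full season of calls.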

At 62 seconds vs. 15 minutes, there's a pretty clear winner here, even if Selenium would be cleaner.

Oct 24 '21 12:10 schorrm

This has been open for over a year. Are we merging this into the project or not? @schorrm @tjburch

Dec 05 '22 13:12 BrayanMnz
