basketball_reference_web_scraper

Game Context (Playoffs, Regular Season) and Advanced Player Stats From Individual Games

Open benjaminmesser opened this issue 11 months ago • 3 comments

My ultimate goal is to build a database of all significant game data from all previous seasons using this tool. To start, I was planning on scraping from all games using the team_box_scores and player_box_scores functions to obtain team and player data for every game.
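A rough sketch of the day-by-day loop this implies is below. `collect_box_scores`, `iter_days`, and `pause_seconds` are hypothetical names, not part of the library, and `fetch` stands in for whichever scraping function is used; it is assumed to accept `day`, `month`, and `year` keyword arguments and return a list of rows.

```python
import time
from datetime import date, timedelta
from typing import Callable, Dict, Iterator, List


def iter_days(start: date, end: date) -> Iterator[date]:
    """Yield every calendar day from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def collect_box_scores(
    start: date,
    end: date,
    fetch: Callable[..., List[dict]],
    pause_seconds: float = 0.0,
) -> Dict[date, List[dict]]:
    """Call fetch once per day and keep only the days that returned rows.

    fetch is an assumption: any callable taking day/month/year keywords.
    pause_seconds is a hypothetical knob for spacing out requests.
    """
    results: Dict[date, List[dict]] = {}
    for d in iter_days(start, end):
        rows = fetch(day=d.day, month=d.month, year=d.year)
        if rows:
            results[d] = rows
        if pause_seconds > 0:
            time.sleep(pause_seconds)
    return results
```

If team_box_scores and player_box_scores take `day`, `month`, and `year` keyword arguments, either one could be passed as `fetch`; a non-zero `pause_seconds` is probably wise when scraping many seasons.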

One issue is that advanced stats are missing from both of these. Adding them to team_box_scores doesn't seem like it would be too challenging. For player_box_scores, though, that data is missing from the "Daily Stats Leaders" pages it uses, so it would probably have to be reworked to also read the game box scores, as team_box_scores does, to get advanced player data. That would be a little annoying to implement.

Another issue is getting game context. There currently seems to be no easy way to tell whether a game is a regular season or playoff game. Ideally, I would want even more context than this, such as which game number it is in the regular season or in a playoff series, and which playoff series it belongs to. At the very least, it seems necessary to know whether a game is in the regular season or the playoffs.
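For the "which # game" part, one option is to derive the number after scraping instead of reading it from the site. The sketch below counts each team's games in date order; the `date`, `home`, and `away` keys are a hypothetical schema, and the input is assumed to hold a single season's regular season games only.

```python
from collections import defaultdict
from datetime import date


def number_games(games):
    """Annotate each game with both teams' running game counts.

    games: iterable of dicts with "date", "home", and "away" keys
    (a hypothetical schema, not the library's output format).
    """
    counts = defaultdict(int)
    numbered = []
    # sorted() is stable, so two games on the same day keep their input order.
    for game in sorted(games, key=lambda g: g["date"]):
        counts[game["home"]] += 1
        counts[game["away"]] += 1
        numbered.append({
            **game,
            "home_game_number": counts[game["home"]],
            "away_game_number": counts[game["away"]],
        })
    return numbered
```

The same counting would work per playoff series if the games were first grouped by series, but that still requires knowing which series each game belongs to.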

Playoff games do have headers like this on them:

[image: header shown on a playoff game's box score page]

It seems like a pretty janky solution, but is scraping these titles the only way to tell whether a game is a playoff game?
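If scraping the titles is the way to go, the parsing itself could be fairly small. The sketch below assumes the header text looks like "2016 NBA Finals Game 7: Cavaliers at Warriors Box Score, June 19, 2016"; that format is my assumption and would need checking against real pages, and `parse_playoff_title` is a hypothetical helper, not an existing function.

```python
import re
from typing import Optional

# Assumed header shape: "<year> <league> <series name> Game <n>..."
# The league alternatives listed here are also an assumption.
PLAYOFF_TITLE = re.compile(
    r"^(?P<year>\d{4}) (?P<league>NBA|ABA|BAA) (?P<series>.+?) Game (?P<game>\d+)"
)


def parse_playoff_title(title: str) -> Optional[dict]:
    """Return playoff context parsed from a header, or None if it does not match.

    A None result only means the assumed pattern was not found; it is
    not proof that the game was a regular season game.
    """
    match = PLAYOFF_TITLE.match(title.strip())
    if match is None:
        return None
    return {
        "year": int(match.group("year")),
        "league": match.group("league"),
        "series": match.group("series"),
        "game_number": int(match.group("game")),
    }
```

Treating "no match" as "regular season" would silently mislabel games if the header format differs in older seasons, so it is probably safer to log unmatched titles and spot-check them.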

Or is a better solution to use a master list of games, dates, and contexts from somewhere, and then compare every game in my database against that list to figure out which games are playoff and which are regular season? This would probably require the deprecated/defunct teams to be figured out and fixed as well, because a lot of older games still have blank team names. Another issue I could see is teams that played multiple times on the same day, but as far as I can tell this has only happened once: https://www.statmuse.com/nba/ask/has-a-nba-team-ever-played-2-games-in-a-day so that edge case should be manageable. The stats leaders page for that day also seems to work just fine, with players showing up twice if they played in two games that day: https://www.basketball-reference.com/friv/dailyleaders.cgi?month=3&day=8&year=1954
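The master-list comparison could be sketched as a lookup keyed on date and both teams. Everything here is hypothetical: the `date`, `home`, `away`, and `context` keys, the helper names, and the existence of a master list in this shape. Keys that appear more than once on the same date are left unlabeled so they can be resolved by hand.

```python
from datetime import date


def build_context_index(schedule):
    """Map (date, home, away) to every context listed for that key."""
    index = {}
    for entry in schedule:
        key = (entry["date"], entry["home"], entry["away"])
        index.setdefault(key, []).append(entry["context"])
    return index


def label_games(games, index):
    """Attach a context to each game, or None if it is missing or ambiguous."""
    labeled = []
    for game in games:
        key = (game["date"], game["home"], game["away"])
        contexts = index.get(key, [])
        # Exactly one candidate is safe to use; zero or several need manual review.
        context = contexts[0] if len(contexts) == 1 else None
        labeled.append({**game, "context": context})
    return labeled
```

Blank team names would break this key, which is why the defunct-team problem would have to be solved first.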

There are also other quirks like All-Star games, which are currently missing entirely. There is this page, though, so it would probably be fairly simple to add a function to look them up: https://www.basketball-reference.com/allstar/

Either way, do you have any suggestions for any of this? I originally thought I would just build my own web scraper, but after seeing that this project exists and is actively maintained, I'd rather just use this if possible.

benjaminmesser, Nov 16 '24 23:11