espnscrapeR
ESPN Data inconsistent
Note: This issue isn't a code problem! It is just information for users and a heads-up for the developer.
ESPN states on its Total QBR website:
To qualify, a player must play a minimum of 20 action plays
That was always my explanation whenever a player was missing from the data. But now it gets very confusing. The example below uses the 2018 playoffs; I didn't check other years.
2018 Wild Card weekend had the following games (winner noted after each):
- IND @ HOU (IND won)
- SEA @ DAL (DAL won)
- LAC @ BAL (LAC won)
- PHI @ CHI (PHI won)
Running
library(espnscrapeR)
library(dplyr)
qbr_week <- get_nfl_qbr("2018", season_type = "Playoffs", week = 1) %>%
  select(short_name, team_short_name, qbr_total, qb_plays)
leads to 3 entries
But running
qbr_all <- get_nfl_qbr("2018", season_type = "Playoffs", week = NA) %>%
  select(short_name, team_short_name, qbr_total, qb_plays)
leads to this
The full-playoffs data contains not only more QBs from the Wild Card weekend (Watson, Wilson, Trubisky), it also gives a different Total QBR for Lamar Jackson. It is unclear which dataset to trust. On top of that, the two datasets can only be combined for QBs who lost, because the overall dataset aggregates the games of QBs who played more than one game.
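One way to pin down exactly where the two results disagree is to join them and flag the differences. This is only a minimal sketch: `compare_qbr()` is a hypothetical helper, not part of espnscrapeR, and it assumes both data frames carry the four columns selected above and that a player is identified by `short_name` plus `team_short_name`. The 20-play cutoff is taken from ESPN's qualification note quoted earlier.

```r
library(dplyr)

# Hypothetical helper (not part of espnscrapeR): join the weekly and the
# full-playoffs results and keep only the rows where they disagree.
compare_qbr <- function(qbr_week, qbr_all) {
  full_join(
    qbr_week, qbr_all,
    by = c("short_name", "team_short_name"),
    suffix = c("_week", "_all")
  ) %>%
    mutate(
      # Player shows up in the full-playoffs data but not in the weekly data
      only_in_all = is.na(qbr_total_week) & !is.na(qbr_total_all),
      # Player shows up in both, but with a different Total QBR
      qbr_differs = !is.na(qbr_total_week) & !is.na(qbr_total_all) &
        !near(qbr_total_week, qbr_total_all),
      # Does the full-playoffs row meet ESPN's stated 20-play minimum?
      meets_min_plays = qb_plays_all >= 20
    ) %>%
    filter(only_in_all | qbr_differs)
}

# Usage with the two data frames from this issue (needs network access):
# compare_qbr(qbr_week, qbr_all)
```

If the players flagged as `only_in_all` all have `meets_min_plays` set to `TRUE`, the 20-play rule cannot be the reason they are missing from the weekly data.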
I appreciate you finding some of these edge-cases!
I wonder if I should go back to just straight rvest scraping of the site. I'll dig into the API to see if there's a reason for duplicates.
Ah, I think I know why this is occurring. There is a "best" games option, which is what you get when week is missing. That view shows the weekly best games.
See: https://www.espn.com/nfl/qbr/_/view/weekly/season/2018/seasontype/3/week/
Yeah, the thing is that when you choose, for example, Wild Card instead of Best, there are fewer entries for the Wild Card weekend. That’s what makes no sense to me...