nflfastR
All `drive_*` variables broken for 2013 postseason (except `2013_19_NO_SEA`)
The `drive_*` variables are joined to the plays in this line:
https://github.com/nflverse/nflfastR/blob/92f4f92c64ebc3b80511b240077dfeedd091df06/R/helper_scrape_nfl.R#L182
It appears that the variable `plays$driveSequenceNumber` is `NA` for all 2013 postseason games (except `2013_19_NO_SEA`), which results in a broken join of the drive data.
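To show why an `NA` key produces empty drive columns rather than an error, here is a minimal sketch with toy data. The `drives` and `plays` tibbles are hypothetical stand-ins and do not reproduce the actual structures in `helper_scrape_nfl.R`; only the idea of joining on `driveSequenceNumber` is taken from the linked line.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical drive summaries, keyed by drive sequence number
drives <- tibble(
  driveSequenceNumber = c(1, 2),
  drive_play_count = c(8, 5)
)

# Hypothetical plays: game "good" has valid keys, game "bad" has NA keys
plays <- tibble(
  game_id = c("good", "good", "bad", "bad"),
  play_id = c(1, 2, 1, 2),
  driveSequenceNumber = c(1, 2, NA, NA)
)

# left_join keeps every play; plays whose key finds no match get NA drive columns
joined <- left_join(plays, drives, by = "driveSequenceNumber")
joined
```

The plays of the "bad" game keep their rows but every drive column is `NA`, which is the same pattern seen for the affected postseason games.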
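```r
library(dplyr, warn.conflicts = FALSE)

# Load play-by-play data for all available seasons
pbp_db <- nflreadr::load_pbp(TRUE)

# Games with many plays where no drive data was joined
bad_games <- pbp_db |>
  filter(is.na(drive_play_count)) |>
  count(game_id, drive_play_count) |>
  filter(n > 50)
```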
This is the output of `bad_games`:

```
   game_id         drive_play_count     n
   <chr>                      <dbl> <int>
 1 2013_18_KC_IND                NA   200
 2 2013_18_NO_PHI                NA   179
 3 2013_18_SD_CIN                NA   179
 4 2013_18_SF_GB                 NA   173
 5 2013_19_IND_NE                NA   186
 6 2013_19_SD_DEN                NA   168
 7 2013_19_SF_CAR                NA   163
 8 2013_20_NE_DEN                NA   173
 9 2013_20_SF_SEA                NA   160
10 2013_21_SEA_DEN               NA   165
```
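The `n > 50` filter above is a fixed count. A sketch of an alternative check, assuming the same `game_id` and `drive_play_count` columns, is to compute the share of plays per game with a missing `drive_play_count`, which does not depend on how many plays a game has. The helper name `flag_missing_drives()` and the 0.9 cutoff are hypothetical choices, and the toy tibble only stands in for `pbp_db`.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper: flag games where most plays have no drive data joined
flag_missing_drives <- function(pbp, cutoff = 0.9) {
  pbp |>
    group_by(game_id) |>
    summarise(
      plays = n(),
      share_na = mean(is.na(drive_play_count)),
      .groups = "drop"
    ) |>
    filter(share_na >= cutoff)
}

# Toy data: one healthy game, one game without any drive data
toy <- tibble(
  game_id = c(rep("ok", 4), rep("broken", 4)),
  drive_play_count = c(3, 3, 3, NA, NA, NA, NA, NA)
)

flag_missing_drives(toy)
```

On the real data this would be called as `flag_missing_drives(pbp_db)`.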
```r
library(nflfastR)
# Rebuild play-by-play for the affected games only
pbp <- build_nflfastR_pbp(bad_games$game_id)
pbp |> select(starts_with("drive_")) |> glimpse()
```
```
Rows: 1,746
Columns: 17
$ drive_real_start_time    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ drive_play_count         <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ drive_time_of_possession <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ drive_first_downs        <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ drive_inside20           <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ drive_ended_with_score   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ drive_quarter_start      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ drive_quarter_end        <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ drive_yards_penalized    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ drive_start_transition   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ drive_end_transition     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ drive_game_clock_start   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ drive_game_clock_end     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ drive_start_yard_line    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ drive_end_yard_line      <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ drive_play_id_started    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ drive_play_id_ended      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
```
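`glimpse()` only prints the first values of each column. A quick sketch to confirm that every `drive_*` column is `NA` in every row: `pbp_toy` is a hypothetical stand-in so the snippet runs on its own; on the real data, replace it with `pbp`.

```r
library(dplyr, warn.conflicts = FALSE)

# Toy stand-in for the rebuilt play-by-play of the affected games
pbp_toy <- tibble(
  game_id = c("game_a", "game_a"),
  drive_play_count = c(NA_real_, NA_real_),
  drive_end_transition = c(NA_character_, NA_character_)
)

# One TRUE/FALSE per drive_* column: is it missing in every row?
all_na <- pbp_toy |>
  summarise(across(starts_with("drive_"), ~ all(is.na(.x))))

all(unlist(all_na))
```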
These games are the complete list of affected games, so we should probably implement a manual fix at some point. One possible fix is to use `fixed_drive` to create a list of `play_id`s and their corresponding drive numbers.
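A minimal sketch of that idea, assuming a lookup table with `game_id`, `play_id` and `fixed_drive` columns (for example taken from an existing play-by-play release) and raw plays whose `driveSequenceNumber` is a numeric `NA`. The helper name `fill_drive_numbers()` and the toy tables are hypothetical. It is not verified here that `fixed_drive` numbering lines up with the NFL's `driveSequenceNumber`; `2013_19_NO_SEA`, which still has valid drive numbers, could serve as a check.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper: fill missing driveSequenceNumber values from a
# play_id -> fixed_drive lookup before the drive data is joined
fill_drive_numbers <- function(plays, lookup) {
  plays |>
    left_join(
      select(lookup, game_id, play_id, fixed_drive),
      by = c("game_id", "play_id")
    ) |>
    # Keep existing drive numbers, only fill the missing ones
    mutate(driveSequenceNumber = coalesce(driveSequenceNumber, fixed_drive)) |>
    select(-fixed_drive)
}

# Toy raw plays for one affected game (driveSequenceNumber missing)
plays <- tibble(
  game_id = "game_a",
  play_id = c(36, 59, 80),
  driveSequenceNumber = NA_real_
)

# Toy lookup of play_id -> fixed_drive for the same game
lookup <- tibble(
  game_id = "game_a",
  play_id = c(36, 59, 80),
  fixed_drive = c(1, 1, 2)
)

fill_drive_numbers(plays, lookup)
```

If the real `driveSequenceNumber` column is stored as character or integer, `fixed_drive` would need to be converted to the same type before `coalesce()`.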