All `drive_*` variables broken for 2013 postseason (except `2013_19_NO_SEA`)

Open mrcaseb opened this issue 3 years ago • 0 comments

The drive_* variables are joined to the plays in this line:

https://github.com/nflverse/nflfastR/blob/92f4f92c64ebc3b80511b240077dfeedd091df06/R/helper_scrape_nfl.R#L182

It appears that the variable plays$driveSequenceNumber is NA for all 2013 postseason games (except 2013_19_NO_SEA), which results in a broken join of the drive data.
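To illustrate the failure mode, here is a minimal sketch on toy data. The tables and values are made up, and the join key is simplified to a single shared column; the actual key names used in helper_scrape_nfl.R may differ. It only shows that a left join on a key that is NA on the plays side leaves the joined drive columns NA.

```r
library(dplyr, warn.conflicts = FALSE)

# Toy stand-ins (hypothetical values) for the scraped plays and drives tables
plays <- tibble(
  play_id = c(1, 2, 3, 4),
  driveSequenceNumber = c(1, 1, NA, NA) # NA mimics the affected games
)
drives <- tibble(
  driveSequenceNumber = c(1, 2),
  drive_play_count = c(2, 2)
)

# Plays with an NA key find no matching drive row, so the drive column stays NA
joined <- left_join(plays, drives, by = "driveSequenceNumber")
joined
```

In this toy case the first two plays pick up their drive data and the last two do not, which matches what the affected games look like in the built play-by-play.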

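library(dplyr, warn.conflicts = FALSE)

# Load play-by-play data for all available seasons
pbp_db <- nflreadr::load_pbp(TRUE)

# Games with more than 50 plays that are missing drive_play_count
bad_games <- pbp_db |>
  filter(is.na(drive_play_count)) |>
  count(game_id, drive_play_count) |>
  filter(n > 50)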

This is the output of bad_games:

   game_id         drive_play_count     n
   <chr>                      <dbl> <int>
 1 2013_18_KC_IND                NA   200
 2 2013_18_NO_PHI                NA   179
 3 2013_18_SD_CIN                NA   179
 4 2013_18_SF_GB                 NA   173
 5 2013_19_IND_NE                NA   186
 6 2013_19_SD_DEN                NA   168
 7 2013_19_SF_CAR                NA   163
 8 2013_20_NE_DEN                NA   173
 9 2013_20_SF_SEA                NA   160
10 2013_21_SEA_DEN               NA   165
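
Rebuilding the play-by-play for these games confirms that the drive_* columns come back as NA:

pbp <- nflfastR::build_nflfastR_pbp(bad_games$game_id)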

pbp |> select(starts_with("drive_")) |> glimpse()
Rows: 1,746
Columns: 17
$ drive_real_start_time    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ drive_play_count         <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ drive_time_of_possession <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ drive_first_downs        <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ drive_inside20           <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ drive_ended_with_score   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ drive_quarter_start      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ drive_quarter_end        <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ drive_yards_penalized    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ drive_start_transition   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ drive_end_transition     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ drive_game_clock_start   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ drive_game_clock_end     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ drive_start_yard_line    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ drive_end_yard_line      <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ drive_play_id_started    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ drive_play_id_ended      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~

These ten games are the complete list of affected games, so we should probably implement a manual fix at some point. One possible fix is to use fixed_drive to create a list of play_ids and their corresponding drive numbers.
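A rough sketch of what that lookup could look like, using toy data rather than the real games. The play_id and fixed_drive values are invented, drive_lookup and drive_number are hypothetical names, and it assumes that fixed_drive numbers line up with the drive order in the drive data, which would need to be checked before relying on it. How this would be wired into helper_scrape_nfl.R is left open.

```r
library(dplyr, warn.conflicts = FALSE)

# Toy play-by-play for one affected game (hypothetical values)
pbp_toy <- tibble(
  game_id = "2013_18_KC_IND",
  play_id = c(36, 58, 79, 101, 125),
  fixed_drive = c(1, 1, 1, 2, 2)
)

# Lookup of play_ids and their drive numbers, derived from fixed_drive
drive_lookup <- pbp_toy |>
  filter(!is.na(fixed_drive)) |>
  select(game_id, play_id, drive_number = fixed_drive) |>
  distinct()

# Toy raw plays where driveSequenceNumber is missing, as in the affected games
plays <- tibble(
  game_id = "2013_18_KC_IND",
  play_id = c(36, 58, 79, 101, 125),
  driveSequenceNumber = NA_real_
)

# Fill the missing key from the lookup; existing non-NA values are kept
patched <- plays |>
  left_join(drive_lookup, by = c("game_id", "play_id")) |>
  mutate(driveSequenceNumber = coalesce(driveSequenceNumber, drive_number)) |>
  select(-drive_number)

patched
```

Using coalesce() means games that already have a valid driveSequenceNumber would be left untouched, so a patch like this could be applied before the drive join without special-casing the ten games.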

mrcaseb, Dec 17 '21 10:12