nflfastR
[BUG] nflfastR::calculate_player_stats returns duplicate rows for defense and kicking stats
Is there an existing issue for this?
- [X] I have searched the existing issues
If this is a data issue, have you tried clearing your nflverse cache?
I have cleared my nflverse cache and the issue persists.
What version of the package do you have?
nflreadr
1.4.1
Describe the bug
There are duplicated player_id-season-week combinations in the player stats data returned by load_player_stats(). I cannot think of a reason why the same player would have multiple rows for a given season and week. If (as I suspect) this is not possible, then this is a data issue to fix. If I'm wrong and the same player can legitimately have multiple rows for a given season and week, it would be helpful to know the circumstances in which this can arise. This matters for merging with other datasets, to make sure I attach information to the correct player_id-season-week combination.
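To show why this matters for merging, here is a minimal sketch using made-up toy tibbles (my_data and stats are hypothetical stand-ins, not nflreadr objects). It assumes dplyr >= 1.1.0, whose join functions accept a relationship argument; with relationship = "many-to-one", left_join() raises an error instead of silently multiplying rows when a key matches more than one stats row.

```r
library(dplyr)

# Toy stand-ins (hypothetical values): `my_data` is any table keyed by
# player_id-season-week; `stats` mimics player stats with one repeated key.
my_data <- tibble(player_id = c("A", "B"), season = 2000L, week = 11L)
stats <- tibble(
  player_id = c("A", "A", "B"),
  season    = 2000L,
  week      = 11L,
  team      = c("T1", "T2", "T3")
)

# Without a guard, player "A" silently appears twice in the result (3 rows).
unguarded <- left_join(my_data, stats, by = c("player_id", "season", "week"))
print(nrow(unguarded))

# relationship = "many-to-one" asserts that each row of `my_data` matches at
# most one row of `stats`, so the repeated key for "A" raises an error.
joined <- tryCatch(
  left_join(my_data, stats,
            by = c("player_id", "season", "week"),
            relationship = "many-to-one"),
  error = function(e) e
)
print(inherits(joined, "error"))
```

Failing loudly at the join seemed preferable here to deduplicating blindly, since it is not yet clear which of the repeated rows (if any) is correct.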
Reprex
library("nflreadr")
library("dplyr")
#> Warning: package 'dplyr' was built under R version 4.3.2
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
# Load Data
offenseStats_weekly <- load_player_stats(
seasons = TRUE,
stat_type = "offense")
defenseStats_weekly <- load_player_stats(
seasons = TRUE,
stat_type = "defense")
kickingStats_weekly <- load_player_stats(
seasons = TRUE,
stat_type = "kicking")
# Rearrange variables
offenseStats_weekly <- offenseStats_weekly %>%
select(player_id, season, week, everything())
defenseStats_weekly <- defenseStats_weekly %>%
select(player_id, season, week, everything())
kickingStats_weekly <- kickingStats_weekly %>%
select(player_id, season, week, everything())
# Offense: No duplicate id-season-week combinations
offenseStats_weekly %>%
group_by(player_id, season, week) %>%
filter(n() > 1)
#> # A tibble: 0 × 53
#> # Groups: player_id, season, week [0]
#> # ℹ 53 variables: player_id <chr>, season <int>, week <int>, player_name <chr>,
#> # player_display_name <chr>, position <chr>, position_group <chr>,
#> # headshot_url <chr>, recent_team <chr>, season_type <chr>,
#> # opponent_team <chr>, completions <int>, attempts <int>,
#> # passing_yards <dbl>, passing_tds <int>, interceptions <dbl>, sacks <dbl>,
#> # sack_yards <dbl>, sack_fumbles <int>, sack_fumbles_lost <int>,
#> # passing_air_yards <dbl>, passing_yards_after_catch <dbl>, …
# Defense
defenseStats_weekly %>%
group_by(player_id, season, week) %>%
filter(n() > 1)
#> # A tibble: 496 × 32
#> # Groups: player_id, season, week [183]
#> player_id season week season_type player_name player_display_name position
#> <chr> <int> <int> <chr> <chr> <chr> <chr>
#> 1 0 1999 1 REG <NA> <NA> <NA>
#> 2 0 1999 1 REG <NA> <NA> <NA>
#> 3 0 1999 1 REG <NA> <NA> <NA>
#> 4 0 1999 1 REG <NA> <NA> <NA>
#> 5 0 1999 1 REG <NA> <NA> <NA>
#> 6 0 1999 1 REG <NA> <NA> <NA>
#> 7 0 1999 1 REG <NA> <NA> <NA>
#> 8 0 1999 1 REG <NA> <NA> <NA>
#> 9 0 1999 2 REG <NA> <NA> <NA>
#> 10 0 1999 2 REG <NA> <NA> <NA>
#> # ℹ 486 more rows
#> # ℹ 25 more variables: position_group <chr>, headshot_url <chr>, team <chr>,
#> # def_tackles <int>, def_tackles_solo <int>, def_tackles_with_assist <int>,
#> # def_tackle_assists <int>, def_tackles_for_loss <int>,
#> # def_tackles_for_loss_yards <dbl>, def_fumbles_forced <int>,
#> # def_sacks <dbl>, def_sack_yards <dbl>, def_qb_hits <dbl>,
#> # def_interceptions <dbl>, def_interception_yards <dbl>, …
defenseStats_weekly %>%
group_by(player_id, season, week) %>%
filter(n() > 1, player_id != "0") # exclude player_id "0" (unclear why these IDs exist)
#> # A tibble: 296 × 32
#> # Groups: player_id, season, week [148]
#> player_id season week season_type player_name player_display_name position
#> <chr> <int> <int> <chr> <chr> <chr> <chr>
#> 1 00-0002919 1999 4 REG <NA> Corey Chavous SS
#> 2 00-0002919 1999 4 REG <NA> Corey Chavous SS
#> 3 00-0004543 1999 12 REG <NA> Shane Dronett DT
#> 4 00-0004543 1999 12 REG <NA> Shane Dronett DT
#> 5 00-0004915 1999 16 REG <NA> Bobby Engram WR
#> 6 00-0004915 1999 16 REG <NA> Bobby Engram WR
#> 7 00-0010668 1999 20 POST <NA> Keenan McCardell WR
#> 8 00-0010668 1999 20 POST <NA> Keenan McCardell WR
#> 9 00-0011392 1999 14 REG <NA> Basil Mitchell RB
#> 10 00-0011392 1999 14 REG <NA> Basil Mitchell RB
#> # ℹ 286 more rows
#> # ℹ 25 more variables: position_group <chr>, headshot_url <chr>, team <chr>,
#> # def_tackles <int>, def_tackles_solo <int>, def_tackles_with_assist <int>,
#> # def_tackle_assists <int>, def_tackles_for_loss <int>,
#> # def_tackles_for_loss_yards <dbl>, def_fumbles_forced <int>,
#> # def_sacks <dbl>, def_sack_yards <dbl>, def_qb_hits <dbl>,
#> # def_interceptions <dbl>, def_interception_yards <dbl>, …
# Kicking
kickingStats_weekly %>%
group_by(player_id, season, week) %>%
filter(n() > 1)
#> # A tibble: 4 × 44
#> # Groups: player_id, season, week [2]
#> player_id season week season_type team player_name player_display_name
#> <chr> <int> <int> <chr> <chr> <chr> <chr>
#> 1 00-0004811 2000 11 REG DEN <NA> Jason Elam
#> 2 00-0004811 2000 11 REG LV <NA> Jason Elam
#> 3 00-0012875 2002 4 REG PIT <NA> Todd Peterson
#> 4 00-0012875 2002 4 REG PIT <NA> Todd Peterson
#> # ℹ 37 more variables: position <chr>, position_group <chr>,
#> # headshot_url <chr>, fg_made <int>, fg_att <dbl>, fg_missed <int>,
#> # fg_blocked <int>, fg_long <dbl>, fg_pct <dbl>, fg_made_0_19 <int>,
#> # fg_made_20_29 <int>, fg_made_30_39 <int>, fg_made_40_49 <int>,
#> # fg_made_50_59 <int>, fg_made_60_ <int>, fg_missed_0_19 <int>,
#> # fg_missed_20_29 <int>, fg_missed_30_39 <int>, fg_missed_40_49 <int>,
#> # fg_missed_50_59 <int>, fg_missed_60_ <int>, fg_made_list <chr>, …
sessionInfo()
#> R version 4.3.1 (2023-06-16 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 11 x64 (build 22631)
#>
#> Matrix products: default
#>
#>
#> locale:
#> [1] LC_COLLATE=English_United States.utf8
#> [2] LC_CTYPE=English_United States.utf8
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.utf8
#>
#> time zone: America/Chicago
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] dplyr_1.1.4 nflreadr_1.4.1
#>
#> loaded via a namespace (and not attached):
#> [1] vctrs_0.6.5 cli_3.6.3 knitr_1.48 rlang_1.1.4
#> [5] xfun_0.46 generics_0.1.3 data.table_1.15.4 glue_1.7.0
#> [9] htmltools_0.5.8.1 fansi_1.0.6 rmarkdown_2.27 evaluate_0.24.0
#> [13] tibble_3.2.1 fastmap_1.2.0 yaml_2.3.10 lifecycle_1.0.4
#> [17] memoise_2.0.1 compiler_4.3.1 fs_1.6.4 pkgconfig_2.0.3
#> [21] rstudioapi_0.16.0 digest_0.6.36 R6_2.5.1 tidyselect_1.2.1
#> [25] reprex_2.1.1 utf8_1.2.4 pillar_1.9.0 magrittr_2.0.3
#> [29] tools_4.3.1 withr_3.0.0 cachem_1.1.0
Created on 2024-07-31 with reprex v2.1.1
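It may also help to know whether the repeated rows are exact copies or conflicting records. For example, the two Jason Elam rows in the kicking output above differ in team (DEN vs LV). The sketch below summarises each duplicated key; classify_duplicates() is a hypothetical helper written for illustration (not part of nflreadr), it assumes dplyr >= 1.1.0 for pick(), and the toy data is made up.

```r
library(dplyr)

# Hypothetical helper (not part of nflreadr): for every key combination that
# occurs more than once, report how many rows it has and how many of those
# rows are distinct across the remaining (non-key) columns.
classify_duplicates <- function(df, keys = c("player_id", "season", "week")) {
  df %>%
    group_by(across(all_of(keys))) %>%
    filter(n() > 1) %>%
    summarise(
      n_rows   = n(),
      # pick() returns the current group's columns, excluding grouping keys
      n_unique = nrow(distinct(pick(everything()))),
      .groups  = "drop"
    ) %>%
    mutate(type = if_else(n_unique == 1, "exact copy", "conflicting"))
}

# Toy data for illustration only: K1 has two rows that disagree on team,
# K2 has two identical rows, K3 is not duplicated.
toy_kicking <- tibble(
  player_id = c("K1", "K1", "K2", "K2", "K3"),
  season    = 2000L,
  week      = 1L,
  team      = c("AAA", "BBB", "CCC", "CCC", "DDD"),
  fg_made   = c(2L, 2L, 1L, 1L, 3L)
)
dup_summary <- classify_duplicates(toy_kicking)
print(dup_summary)
```

Running classify_duplicates(defenseStats_weekly) or classify_duplicates(kickingStats_weekly) on the data loaded in the reprex should show how many duplicated keys fall into each category.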
Expected Behavior
I expect each player (i.e., player_id) to have only one row for a given season-week combination.
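This expectation can be written as a check that runs before any merge. assert_unique_keys() below is a hypothetical base R helper (not part of nflreadr) shown on made-up toy data; judging by the reprex output, it would be expected to stop for defenseStats_weekly and kickingStats_weekly but pass for offenseStats_weekly.

```r
# Hypothetical helper (not part of nflreadr): stop with an informative
# message if any key combination appears more than once; otherwise return
# the data invisibly so the check can sit inside a pipeline.
assert_unique_keys <- function(df, keys = c("player_id", "season", "week")) {
  dup <- duplicated(df[keys])  # TRUE for every repeat after the first
  if (any(dup)) {
    stop(sprintf("%d row(s) repeat an earlier %s combination",
                 sum(dup), paste(keys, collapse = "-")),
         call. = FALSE)
  }
  invisible(df)
}

# Toy data for illustration: the third row repeats the first key
toy <- data.frame(player_id = c("A", "B", "A"), season = 2000L, week = 1L)
result <- tryCatch(assert_unique_keys(toy), error = conditionMessage)
print(result)
```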
nflverse_sitrep
> nflreadr::nflverse_sitrep()
── System Info ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
• R version 4.3.1 (2023-06-16 ucrt) • Running under: Windows 11 x64 (build 22631)
── Package Status ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
package installed cran dev behind
1 nfl4th 1.0.4 1.0.4 1.0.4.9002 dev
2 nflfastR 4.6.1 4.6.1 4.6.1.9010 dev
3 nflplotR 1.3.1 1.3.1 1.3.1
4 nflreadr 1.4.1 1.4.1 1.4.1.00
── Package Options ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
• No options set for above packages
── Package Dependencies ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
• askpass (1.2.0) • httr (1.4.7) • stringi (1.8.4)
• backports (1.5.0) • isoband (0.2.7) • stringr (1.5.1)
• base64enc (0.1-3) • janitor (2.2.0) • sys (3.4.2)
• bigD (0.2.0) • jquerylib (0.1.4) • tibble (3.2.1)
• bitops (1.0-8) • jsonlite (1.8.8) • tidyr (1.3.1)
• bslib (0.8.0) • juicyjuice (0.1.0) • tidyselect (1.2.1)
• cachem (1.1.0) • knitr (1.48) • timechange (0.3.0)
• cli (3.6.3) • labeling (0.4.3) • tinytex (0.52)
• colorspace (2.1-1) • lifecycle (1.0.4) • utf8 (1.2.4)
• commonmark (1.9.1) • listenv (0.9.1) • V8 (4.4.2)
• cpp11 (0.4.7) • lubridate (1.9.3) • vctrs (0.6.5)
• curl (5.2.1) • magick (2.8.4) • viridisLite (0.4.2)
• data.table (1.15.4) • magrittr (2.0.3) • withr (3.0.0)
• digest (0.6.36) • markdown (1.13) • xfun (0.46)
• dplyr (1.1.4) • Matrix (1.6-5) • xgboost (1.7.8.1)
• evaluate (0.24.0) • memoise (2.0.1) • xml2 (1.3.6)
• fansi (1.0.6) • mime (0.12) • yaml (2.3.10)
• farver (2.1.2) • munsell (0.5.1) • codetools (0.2-20)
• fastmap (1.2.0) • openssl (2.2.0) • compiler (4.3.1)
• fastrmodels (1.0.2) • parallelly (1.38.0) • graphics (4.3.1)
• fontawesome (0.5.2) • pillar (1.9.0) • grDevices (4.3.1)
• fs (1.6.4) • pkgconfig (2.0.3) • grid (4.3.1)
• furrr (0.3.1) • progressr (0.14.0) • lattice (0.22-6)
• future (1.34.0) • purrr (1.0.2) • MASS (7.3-60.0.1)
• generics (0.1.3) • R6 (2.5.1) • Matrix (1.6-5)
• ggpath (1.0.1) • rappdirs (0.3.3) • methods (4.3.1)
• ggplot2 (3.5.1) • RColorBrewer (1.1-3) • mgcv (1.9-1)
• globals (0.16.3) • Rcpp (1.0.13) • nlme (3.1-165)
• glue (1.7.0) • reactable (0.4.4) • parallel (4.3.1)
• gt (0.11.0) • reactR (0.6.0) • splines (4.3.1)
• gtable (0.3.5) • rlang (1.1.4) • stats (4.3.1)
• highr (0.11) • rmarkdown (2.27) • tools (4.3.1)
• hms (1.1.3) • sass (0.4.9) • utils (4.3.1)
• htmltools (0.5.8.1) • scales (1.3.0)
• htmlwidgets (1.6.4) • snakecase (0.11.1)
── Not Installed ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
• nflseedR ()
• nflverse ()
Screenshots
No response
Additional context
No response