Calculations for delta.pitch and delta.field differ from previous calculations

Open davidbmitchell opened this issue 8 years ago • 4 comments

I ran makeWAR() on the May data set and compared the result with the MayProcessed data set. The delta.field and delta.pitch columns in the new MayProcessed data differ from those in the original MayProcessed data set. They actually look transposed, as you can see below. I did this using dplyr 0.5.0, but I first noticed it when testing makeWAR() after refactoring for dplyr 0.7.0.

> NewMayProcessed <- makeWAR(May)
> head(NewMayProcessed$openWARPlays[,c(1:5,16, 19:23)])
  batterId start1B start2B start3B pitcherId                         gameId      delta delta.field delta.pitch    delta.br  delta.bat
1   476704    <NA>    <NA>    <NA>    450351 gid_2013_05_01_anamlb_oakmlb_1  0.3789624          NA  0.37896244          NA  0.3789624
2   519083  476704    <NA>    <NA>    450351 gid_2013_05_01_anamlb_oakmlb_1 -0.2055008 -0.04671768 -0.15878313  0.03238909 -0.2378899
3   452234    <NA>  476704    <NA>    450351 gid_2013_05_01_anamlb_oakmlb_1 -0.3296470          NA -0.32964703  0.04026076 -0.3699078
4   493316    <NA>  476704    <NA>    450351 gid_2013_05_01_anamlb_oakmlb_1  0.2032371  0.11692098  0.08631608 -0.53123407  0.7344711
5   518626  493316    <NA>  476704    450351 gid_2013_05_01_anamlb_oakmlb_1  0.1956572          NA  0.19565721 -0.01790497  0.2135622
6   474384  518626  493316  476704    450351 gid_2013_05_01_anamlb_oakmlb_1 -0.7097701 -0.36090191 -0.34886821  0.01234560 -0.7221157

> head(MayProcessed$openWARPlays[,c(1:5,16, 19:23)])
  batterId start1B start2B start3B pitcherId                         gameId      delta delta.field delta.pitch    delta.br  delta.bat
1   476704    <NA>    <NA>    <NA>    450351 gid_2013_05_01_anamlb_oakmlb_1  0.3789624          NA   0.3789624          NA  0.3789624
2   519083  476704    <NA>    <NA>    450351 gid_2013_05_01_anamlb_oakmlb_1 -0.2055008  -0.1588469  -0.0466539  0.03238909 -0.2378899
3   452234    <NA>  476704    <NA>    450351 gid_2013_05_01_anamlb_oakmlb_1 -0.3296470          NA  -0.3296470  0.04026076 -0.3699078
4   493316    <NA>  476704    <NA>    450351 gid_2013_05_01_anamlb_oakmlb_1  0.2032371   0.1169279   0.0863092 -0.53123407  0.7344711
5   518626  493316    <NA>  476704    450351 gid_2013_05_01_anamlb_oakmlb_1  0.1956572          NA   0.1956572 -0.01790497  0.2135622
6   474384  518626  493316  476704    450351 gid_2013_05_01_anamlb_oakmlb_1 -0.7097701  -0.3487953  -0.3609748  0.01234560 -0.7221157
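As a rough check of the "transposed" impression, the sketch below compares the two outputs row by row. It uses only the values printed above for rows 2, 4, and 6 (the rows where delta.field is not NA). The 1e-3 tolerance and the column names `new.field`, `old.pitch`, etc. are assumptions made for this illustration, not part of openWAR.

```r
# Values copied from rows 2, 4, and 6 of the two printouts above.
plays <- data.frame(
  row       = c(2, 4, 6),
  new.field = c(-0.04671768, 0.11692098, -0.36090191),
  new.pitch = c(-0.15878313, 0.08631608, -0.34886821),
  old.field = c(-0.1588469, 0.1169279, -0.3487953),
  old.pitch = c(-0.0466539, 0.0863092, -0.3609748)
)

tol <- 1e-3  # assumed tolerance; the two runs differ slightly in later decimals

# TRUE when the new field/pitch values match the old pitch/field values.
plays$swapped <- with(plays,
  abs(new.field - old.pitch) < tol & abs(new.pitch - old.field) < tol)

# TRUE when both columns are (approximately) unchanged between runs.
plays$same <- with(plays,
  abs(new.field - old.field) < tol & abs(new.pitch - old.pitch) < tol)

print(plays[, c("row", "swapped", "same")])
```

On these three rows, rows 2 and 6 come out as swapped while row 4 is essentially unchanged, so the swap does not appear to affect every play.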

The original MayProcessed data set was added over 2 years ago, and there have been quite a few changes to makeWAR() since then. I imagine this happened when openWAR was dplyrized. I'm pretty sure it has to do with [Line 140](https://github.com/beanumber/openWAR/blob/master/R/makeWAR.R#L140):

x$data <- mutate_(x$data, delta.pitch = ~ifelse(is.na(delta.field), delta, delta - delta.field))
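For reference, here is a minimal sketch of what that rule does, written in base R so it runs without dplyr. The data frame `plays` is a stand-in built from the first four rows of the NewMayProcessed output above; it is not the real `x$data` object.

```r
# Values copied from the first four rows of the NewMayProcessed output above.
plays <- data.frame(
  delta       = c(0.3789624, -0.2055008, -0.3296470, 0.2032371),
  delta.field = c(NA, -0.04671768, NA, 0.11692098)
)

# Same rule as the mutate_() call: with no fielding component the pitcher
# gets the whole delta; otherwise the pitcher gets whatever is left over.
plays$delta.pitch <- with(
  plays,
  ifelse(is.na(delta.field), delta, delta - delta.field)
)

print(plays)
```

Because delta.pitch is just the remainder after delta.field, this line by itself would only pass along whatever delta.field it is given.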

So I guess it boils down to which data set is correct? Is it the original MayProcessed data set?

davidbmitchell avatar Jul 09 '17 04:07 davidbmitchell

I actually think this is occurring in makeWARFielding, specifically Lines 365-366.

delta.field <- with(data, ifelse(endOuts == startOuts, 
                                  delta * p.hat, delta * (1 - p.hat)))
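A small sketch of why this split could produce the swap: since delta.pitch is the remainder `delta - delta.field`, exchanging `p.hat` and `1 - p.hat` in a branch exchanges the two columns for every play that hits that branch. The helper `split_delta()` and the value `p.hat = 0.25` are hypothetical and chosen only for illustration. This is not a claim about what the earlier version of the code actually did.

```r
# Hypothetical helper: split delta into a fielding share and a pitching remainder.
split_delta <- function(delta, field_share) {
  delta.field <- delta * field_share
  c(delta.field = delta.field, delta.pitch = delta - delta.field)
}

delta <- -0.2055008  # delta for row 2 in the output above
p.hat <- 0.25        # made-up value, for illustration only

a <- split_delta(delta, field_share = p.hat)      # fielders get delta * p.hat
b <- split_delta(delta, field_share = 1 - p.hat)  # fielders get delta * (1 - p.hat)

print(rbind(a, b))
```

In the printed data, rows 2 and 6 look swapped while row 4 matches in both data sets, which would be consistent with only one branch of the ifelse() having changed.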

davidbmitchell avatar Jul 10 '17 14:07 davidbmitchell

OK, thanks, I will take a look. The dplyr update broke all of my other packages too!

beanumber avatar Jul 10 '17 17:07 beanumber

I bet it's breaking a lot of packages in the R universe. Honestly, I don't think tidyeval is all that tidy. It makes things a lot more convoluted, but I'm also just not used to it yet.

davidbmitchell avatar Jul 10 '17 20:07 davidbmitchell

I think I found where the change could have occurred. The commit that closed issue #92 might have done it. That makes me believe the NewMayProcessed output above is actually correct, but I'm still not 100% sure.

davidbmitchell avatar Jul 10 '17 20:07 davidbmitchell