aRpsDCA
Significantly Different Results Based on R Version
First off, I just want to say thank you for putting this out there; it's fantastic. I built a script to run in R 4.0.2 and it works great. I then put it into Power BI and noticed that my answers were significantly different, like 30% lower, and the matches weren't very accurate visually. I plopped it out of Power BI and into RStudio and all of a sudden it was matching again without changing any code, which is when I realized Power BI was using R 4.2.2 instead of 4.0.2. Now that I've noticed this I can avoid issues going forward, but do you have any idea why the version change would affect the output so significantly?
I'm not sure what might be causing that. The first place I'd check is probably to make 100% sure that the version of aRpsDCA is the same in both cases, and that the exact same data is being provided to the relevant functions.
It's possible that the functionality of nlminb was changed between R versions in a way that would alter fit results. I have been out of the R loop for a few years now, but I don't see any obvious smoking gun from a quick search.
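Something along these lines, run in both environments, would make that check concrete (just a rough sketch; the input path is a placeholder for however the data actually gets loaded):

```r
# Quick sanity check for both Power BI and RStudio: confirm the R and aRpsDCA
# versions, then summarize the data frame handed to the fitting calls.
library(aRpsDCA)

print(R.version.string)
print(packageVersion("aRpsDCA"))

# 'decline_input.csv' is a placeholder for whatever actually feeds best.fit() etc.
data <- read.csv("decline_input.csv")
str(data)
print(summary(data))
```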
I'll look into the versions and see, but the data is exactly the same. When you pop it out of Power BI it creates the datasets for you in a temp file. At first I was dumbfounded, but realized it had to be a version thing because everything else was identical. Thank you for the quick response!
By the way, if I changed RStudio to run on 4.2.2 it matched the Power BI results, so it's something in aRpsDCA or nlminb.
That is just super weird... nlminb isn't a randomized algorithm as far as I know, and I don't see any changes to that code in the R stats version control history in the last ~8 years. To track this down would probably require a minimized example that reproduces the error, ideally without any proprietary data.
Version Data.xlsx
Here's anonymized public data as one example of where I observed the issue, along with the output best-fit gas rate and gas volume from each version. As I mentioned, 4.0.2 is much more accurate.
I'm assuming the different results are in a best-overall-fit based on raw rate/time data (i.e. the best.fit function)? Can you run the following script (call it "check.R", but GitHub doesn't allow .R attachments, because it's not like this is a PROGRAMMING SITE OR ANYTHING) and post the entire output, under both versions? This can be done e.g. by source('check.R', echo=TRUE).
library(aRpsDCA)
sessionInfo()
data <- read.delim(sep='\t', text='t\tq\n0\t15609\n0.235616438\t15609\n0.487671233\t17002\n0.734246575\t10167\n0.969863014\t7344\n1.221917808\t5256\n1.473972603\t4582\n1.583561644\t3959\n1.739726027\t4482\n1.915068493\t3705\n2.126027397\t3452\n2.375342466\t3637\n2.62739726\t3239\n2.824657534\t3517\n3.032876712\t2496\n3.282191781\t1729\n3.534246575\t1655\n\n')
best <- best.fit(q=data$q, t=data$t)
print(best)
best.h2e <- best.hyp2exp(q=data$q, t=data$t)
print(best.h2e)
Well this is interesting:
4.0.2:
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
other attached packages:
[1] magicaxis_2.0.10 rlang_0.4.6 sqldf_0.4-11 RSQLite_2.2.0 gsubfn_0.7
[6] proto_1.0.0 data.table_1.12.8 forcats_0.5.0 stringr_1.4.0 dplyr_1.0.0
[11] purrr_0.3.4 readr_1.3.1 tidyr_1.1.0 tibble_3.0.2 ggplot2_3.3.2
[16] tidyverse_1.3.0 readxl_1.3.1 aRpsDCA_1.1.1 RevoUtils_11.0.2 RevoUtilsMath_11.0.0
loaded via a namespace (and not attached):
[1] httr_1.4.1 maps_3.3.0 bit64_0.9-7 jsonlite_1.7.0 viridisLite_0.3.0 modelr_0.1.8
[7] sm_2.2-5.6 assertthat_0.2.1 blob_1.2.1 cellranger_1.1.0 pillar_1.4.6 backports_1.1.7
[13] glue_1.4.1 chron_2.3-55 digest_0.6.25 RColorBrewer_1.1-2 rvest_0.3.5 colorspace_1.4-1
[19] htmltools_0.5.0 pkgconfig_2.0.3 broom_0.7.0 haven_2.3.1 scales_1.1.1 RANN_2.6.1
[25] pracma_2.2.9 generics_0.0.2 ellipsis_0.3.1 withr_2.2.0 lazyeval_0.2.2 cli_2.0.2
[31] magrittr_1.5 crayon_1.3.4 memoise_1.1.0 fs_1.4.2 fansi_0.4.1 MASS_7.3-51.6
[37] xml2_1.3.2 tools_4.0.2 hms_0.5.3 lifecycle_0.2.0 plotly_4.9.2.1 munsell_0.5.0
[43] reprex_0.3.0 plotrix_3.7-8 NISTunits_1.0.1 compiler_4.0.2 grid_4.0.2 rstudioapi_0.11
[49] htmlwidgets_1.5.1 celestial_1.4.6 tcltk_4.0.2 gtable_0.3.0 DBI_1.1.0 R6_2.3.0
[55] lubridate_1.7.9 bit_1.1-15.2 stringi_1.4.6 Rcpp_1.0.5 vctrs_0.3.1 mapproj_1.2.7
[61] dbplyr_1.4.4 tidyselect_1.1.0
> data <- read.delim(sep='\t', text='t\tq\n0\t15609\n0.235616438\t15609\n0.487671233\t17002\n0.734246575\t10167\n0.969863014\t7344\n1.221917808\t5256\n1.473972603\t4582\n1.583561644\t3959\n1.739726027\t4482\n1.915068493\t3705\n2.126027397\t3452\n2.375342466\t3637\n2.62739726\t3239\n2.824657534\t3517\n3.032876712\t2496\n3.282191781\t1729\n3.534246575\t1655\n\n')
> best <- best.fit(q=data$q, t=data$t)
> print(best)
$decline
[1] "Arps hyperbolic decline: <qi = 17762.93, Di = 0.7742964, b = 0.04062814>"

$sse
[1] 39702258

> best.h2e <- best.hyp2exp(q=data$q, t=data$t)
> print(best.h2e)
$decline
[1] "Arps hyperbolic-to-exponential decline: <qi = 17762.93, Di = 0.7742965, b = 0.04062833, Df = 0.1>"

$sse
[1] 39702258
4.2.2:
source('check.R', echo=TRUE)
sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8    LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                           LC_TIME=English_United States.utf8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
other attached packages:
[1] magicaxis_2.2.14 forcats_1.0.0 stringr_1.5.0 dplyr_1.1.0 purrr_1.0.1 readr_2.1.3 tidyr_1.3.0
[8] tibble_3.1.8 ggplot2_3.4.0 tidyverse_1.3.2 rlang_1.0.6 sqldf_0.4-11 RSQLite_2.2.20 gsubfn_0.7
[15] proto_1.0.0 readxl_1.4.1 aRpsDCA_1.1.1
loaded via a namespace (and not attached):
[1] httr_1.4.4 maps_3.4.1 bit64_4.0.5 jsonlite_1.8.4 modelr_0.1.10 sm_2.2-5.7.1
[7] assertthat_0.2.1 blob_1.2.3 googlesheets4_1.0.1 cellranger_1.1.0 pillar_1.8.1 backports_1.4.1
[13] glue_1.6.2 chron_2.3-58 rvest_1.0.3 colorspace_2.1-0 pkgconfig_2.0.3 broom_1.0.3
[19] haven_2.5.1 scales_1.2.1 RANN_2.6.1 tzdb_0.3.0 pracma_2.4.2 timechange_0.2.0
[25] googledrive_2.0.0 generics_0.1.3 ellipsis_0.3.2 cachem_1.0.6 withr_2.5.0 cli_3.6.0
[31] magrittr_2.0.3 crayon_1.5.2 memoise_2.0.1 fs_1.6.0 fansi_1.0.4 MASS_7.3-58.1
[37] xml2_1.3.3 tools_4.2.2 hms_1.1.2 gargle_1.3.0 lifecycle_1.0.3 munsell_0.5.0
[43] reprex_2.0.2 plotrix_3.8-2 NISTunits_1.0.1 compiler_4.2.2 grid_4.2.2 rstudioapi_0.14
[49] celestial_1.4.6 tcltk_4.2.2 gtable_0.3.1 DBI_1.1.3 R6_2.5.1 lubridate_1.9.1
[55] fastmap_1.1.0 bit_4.0.5 utf8_1.2.2 stringi_1.7.12 Rcpp_1.0.10 vctrs_0.5.2
[61] mapproj_1.2.11 dbplyr_2.3.0 tidyselect_1.2.0
data <- read.delim(sep='\t', text='t\tq\n0\t15609\n0.235616438\t15609\n0.487671233\t17002\n0.734246575\t10167\n0.969863014\t7344\n1.221917808\t5256\ .... [TRUNCATED]
best <- best.fit(q=data$q, t=data$t)
print(best)
$decline
[1] "Arps hyperbolic-to-exponential decline: <qi = 17762.93, Di = 0.7742965, b = 0.04062814, Df = 0.1>"

$sse
[1] 39702258

best.h2e <- best.hyp2exp(q=data$q, t=data$t)
print(best.h2e)
$decline
[1] "Arps hyperbolic-to-exponential decline: <qi = 17762.93, Di = 0.7742965, b = 0.04062814, Df = 0.1>"

$sse
[1] 39702258
Yes, the difference is occurring with best.fit.from.Np.with.buildup, but I'm plotting the outputs from all three methods to see the difference, and it's not as though there's a better option that it's ignoring.
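(For reference, the visual comparison is roughly along these lines. This is only a sketch: it assumes a data frame with columns t, q, and np as in the scripts in this thread, and uses the package's arps.q()/arps.Np() evaluators on each fitted decline.)

```r
# Rough sketch of the visual comparison: overlay the rate-based fits on the
# observed rates, and the Np-based fit on the observed cumulative volumes.
library(aRpsDCA)

# 'data' is assumed to have columns t (time), q (rate), and np (cumulative volume)
fit.rate    <- best.fit(q=data$q, t=data$t)
fit.buildup <- best.fit.with.buildup(q=data$q, t=data$t)
fit.np      <- best.fit.from.Np.with.buildup(Np=data$np, t=data$t)

t.grid <- seq(0, max(data$t), length.out=200)
par(mfrow=c(1, 2))

# Rate-based fits vs. observed rates
plot(data$t, data$q, pch=19, xlab="t", ylab="rate")
lines(t.grid, arps.q(fit.rate$decline, t.grid), col="red")
lines(t.grid, arps.q(fit.buildup$decline, t.grid), col="blue")
legend("topright", c("best.fit", "best.fit.with.buildup"),
       col=c("red", "blue"), lty=1, bty="n")

# Np-based fit vs. observed cumulative volumes (note the different units)
plot(data$t, data$np, pch=19, xlab="t", ylab="cumulative")
lines(t.grid, arps.Np(fit.np$decline, t.grid), col="darkgreen")
legend("bottomright", "best.fit.from.Np.with.buildup",
       col="darkgreen", lty=1, bty="n")
```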
The "with buildup" part is suggestive. That part of the code has been something of a hotspot for errors. Would you mind running the example script above again, but edited as follows?
library(aRpsDCA)
sessionInfo()
data <- read.delim(sep='\t', text='t\tq\tnp\n0\t15609\t0\n0.235616438\t15609\t1.34\n0.487671233\t17002\t2.91\n0.734246575\t10167\t3.82\n0.969863014\t7344\t4.45\n1.221917808\t5256\t4.94\n1.473972603\t4582\t5.36\n1.583561644\t3959\t5.52\n1.739726027\t4482\t5.77\n1.915068493\t3705\t6.01\n2.126027397\t3452\t6.28\n2.375342466\t3637\t6.61\n2.62739726\t3239\t6.9\n2.824657534\t3517\t7.16\n3.032876712\t2496\t7.35\n3.282191781\t1729\t7.5\n3.534246575\t1655\t7.66\n\n')
best.q <- best.fit.with.buildup(q=data$q, t=data$t)
print(best.q)
best.Np <- best.fit.from.Np.with.buildup(Np=data$np, t=data$t)
print(best.Np)
For what it's worth, on my machine, I get:
4.0.2:
> library(aRpsDCA)
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] aRpsDCA_1.1.1
loaded via a namespace (and not attached):
[1] compiler_4.0.2
> data <- read.delim(sep='\t', text='t\tq\tnp\n0\t15609\t0\n0.235616438\t15609\t1.34\n0.487671233\t17002\t2.91\n0.734246575\t10167\t3.82\n0.969863014\ .... [TRUNCATED]
> best.q <- best.fit.with.buildup(q=data$q, t=data$t)
> print(best.q)
$decline
[1] "Arps hyperbolic decline: <qi = 85010, Di = 7.013908, b = 0.7773333> with buildup: <initial rate = 15609, time to peak = 0.4876712, peak rate = 16023.42>"
$sse
[1] 6037814
> best.Np <- best.fit.from.Np.with.buildup(Np=data$np, t=data$t)
> print(best.Np)
$decline
[1] "Arps hyperbolic decline: <qi = 27.63629, Di = 10, b = 0.9849721> with buildup: <initial rate = 5.687209, time to peak = 0.3616438, peak rate = 5.919143>"
$sse
[1] 0.08332251
4.2.2:
> library(aRpsDCA)
> sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] aRpsDCA_1.1.1
loaded via a namespace (and not attached):
[1] compiler_4.2.2
> data <- read.delim(sep='\t', text='t\tq\tnp\n0\t15609\t0\n0.235616438\t15609\t1.34\n0.487671233\t17002\t2.91\n0.734246575\t10167\t3.82\n0.969863014\ .... [TRUNCATED]
> best.q <- best.fit.with.buildup(q=data$q, t=data$t)
> print(best.q)
$decline
[1] "Arps hyperbolic-to-exponential decline: <qi = 85010, Di = 7.013882, b = 0.7773306, Df = 0.1> with buildup: <initial rate = 15609, time to peak = 0.4876712, peak rate = 16023.43>"
$sse
[1] 6037814
> best.Np <- best.fit.from.Np.with.buildup(Np=data$np, t=data$t)
> print(best.Np)
$decline
[1] "Arps hyperbolic-to-exponential decline: <qi = 27.63629, Di = 10, b = 0.9849721, Df = 0.1> with buildup: <initial rate = 5.687209, time to peak = 0.3616438, peak rate = 5.919143>"
$sse
[1] 0.08332251
I also get the same rate forecasts out of both environments for the resulting decline.
I got the same results you posted.
That would seem to imply that the issue lies elsewhere. I think the next step would be to verify that the exact same data is being sent to the exact same functions in each environment, and to arrive at a script that reliably reproduces the issue.
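One way to make that verification concrete (a rough sketch; the paths and the capture.inputs/compare.inputs helper names are placeholders, not part of aRpsDCA):

```r
# Rough sketch: capture exactly what each environment hands to the fitting
# functions, then diff the two captures in a single session.

capture.inputs <- function(q, t, out.path) {
    inputs <- list(
        q = q,
        t = t,
        r.version = R.version.string,
        pkg.version = as.character(packageVersion("aRpsDCA"))
    )
    saveRDS(inputs, out.path)
    invisible(inputs)
}

compare.inputs <- function(path.a, path.b) {
    a <- readRDS(path.a)
    b <- readRDS(path.b)
    list(
        same.q   = identical(a$q, b$q),   # byte-identical rates?
        same.t   = identical(a$t, b$t),   # byte-identical times?
        q.diff   = all.equal(a$q, b$q),   # tolerant comparison with details
        versions = c(a = a$r.version, b = b$r.version)
    )
}

# In Power BI, just before the fit:
#   capture.inputs(data$q, data$t, "C:/temp/arps_inputs_powerbi.rds")
# In RStudio, just before the fit:
#   capture.inputs(data$q, data$t, "C:/temp/arps_inputs_rstudio.rds")
# Then, in any one session:
#   compare.inputs("C:/temp/arps_inputs_powerbi.rds", "C:/temp/arps_inputs_rstudio.rds")
```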