aRpsDCA
Significantly Different Results Based on R Version
First off, I just want to say thank you for putting this out there; it's fantastic. I built a script to run in R 4.0.2 and it works great. I then put it into Power BI and noticed that my answers were significantly different, like 30% lower, and the matches weren't very accurate visually. I plopped it out of Power BI and into RStudio and all of a sudden it was matching again without changing any code, which is when I realized Power BI was using R 4.2.2 instead of 4.0.2. Now that I've noticed this I can avoid issues going forward, but do you have any idea why the version change would affect the output so significantly?
I'm not sure what might be causing that. The first place I'd check is probably to make 100% sure that the version of aRpsDCA is the same in both cases, and that the exact same data is being provided to the relevant functions.
It's possible that the functionality of nlminb was changed between R versions in a way that would alter fit results. I have been out of the R loop for a few years now, but I don't see any obvious smoking gun from a quick search.
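Something along these lines, run in both environments, would make that check concrete (just a rough sketch; the input path is a placeholder for however the data actually gets loaded):

```r
# Quick sanity check for both Power BI and RStudio: confirm the R and aRpsDCA
# versions, then summarize the data frame handed to the fitting calls.
library(aRpsDCA)

print(R.version.string)
print(packageVersion("aRpsDCA"))

# 'decline_input.csv' is a placeholder for whatever actually feeds best.fit() etc.
data <- read.csv("decline_input.csv")
str(data)
print(summary(data))
```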
I'll look into the versions and see, but the data is exactly the same. When you pop it out of Power BI it creates the datasets for you in a temp file. At first I was dumbfounded, but realized it had to be a version thing because everything else was identical. Thank you for the quick response!
By the way, if I changed RStudio to run on 4.2.2 it matched the Power BI results, so it's something in aRpsDCA or nlminb.
That is just super weird... nlminb isn't a randomized algorithm as far as I know, and I don't see any changes to that code in the R stats version control history in the last ~8 years. To track this down would probably require a minimized example that reproduces the error, ideally without any proprietary data.
Version Data.xlsx
Here's anonymized public data as one example of where I observed the issue, along with the output best-fit gas rate and gas volume from each version. As I mentioned, 4.0.2 is much more accurate.
I'm assuming the different results are in a best-overall-fit based on raw rate/time data (i.e. the best.fit function)? Can you run the following script (call it "check.R", but GitHub doesn't allow .R attachments, because it's not like this is a PROGRAMMING SITE OR ANYTHING) and post the entire output, under both versions? This can be done e.g. by source('check.R', echo=TRUE).
library(aRpsDCA)
sessionInfo()
data <- read.delim(sep='\t', text='t\tq\n0\t15609\n0.235616438\t15609\n0.487671233\t17002\n0.734246575\t10167\n0.969863014\t7344\n1.221917808\t5256\n1.473972603\t4582\n1.583561644\t3959\n1.739726027\t4482\n1.915068493\t3705\n2.126027397\t3452\n2.375342466\t3637\n2.62739726\t3239\n2.824657534\t3517\n3.032876712\t2496\n3.282191781\t1729\n3.534246575\t1655\n\n')
best <- best.fit(q=data$q, t=data$t)
print(best)
best.h2e <- best.hyp2exp(q=data$q, t=data$t)
print(best.h2e)
Well this is interesting:
4.0.2:
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
other attached packages:
[1] magicaxis_2.0.10 rlang_0.4.6 sqldf_0.4-11 RSQLite_2.2.0 gsubfn_0.7
[6] proto_1.0.0 data.table_1.12.8 forcats_0.5.0 stringr_1.4.0 dplyr_1.0.0
[11] purrr_0.3.4 readr_1.3.1 tidyr_1.1.0 tibble_3.0.2 ggplot2_3.3.2
[16] tidyverse_1.3.0 readxl_1.3.1 aRpsDCA_1.1.1 RevoUtils_11.0.2 RevoUtilsMath_11.0.0
loaded via a namespace (and not attached):
[1] httr_1.4.1 maps_3.3.0 bit64_0.9-7 jsonlite_1.7.0 viridisLite_0.3.0 modelr_0.1.8
[7] sm_2.2-5.6 assertthat_0.2.1 blob_1.2.1 cellranger_1.1.0 pillar_1.4.6 backports_1.1.7
[13] glue_1.4.1 chron_2.3-55 digest_0.6.25 RColorBrewer_1.1-2 rvest_0.3.5 colorspace_1.4-1
[19] htmltools_0.5.0 pkgconfig_2.0.3 broom_0.7.0 haven_2.3.1 scales_1.1.1 RANN_2.6.1
[25] pracma_2.2.9 generics_0.0.2 ellipsis_0.3.1 withr_2.2.0 lazyeval_0.2.2 cli_2.0.2
[31] magrittr_1.5 crayon_1.3.4 memoise_1.1.0 fs_1.4.2 fansi_0.4.1 MASS_7.3-51.6
[37] xml2_1.3.2 tools_4.0.2 hms_0.5.3 lifecycle_0.2.0 plotly_4.9.2.1 munsell_0.5.0
[43] reprex_0.3.0 plotrix_3.7-8 NISTunits_1.0.1 compiler_4.0.2 grid_4.0.2 rstudioapi_0.11
[49] htmlwidgets_1.5.1 celestial_1.4.6 tcltk_4.0.2 gtable_0.3.0 DBI_1.1.0 R6_2.3.0
[55] lubridate_1.7.9 bit_1.1-15.2 stringi_1.4.6 Rcpp_1.0.5 vctrs_0.3.1 mapproj_1.2.7
[61] dbplyr_1.4.4 tidyselect_1.1.0
> data <- read.delim(sep='\t', text='t\tq\n0\t15609\n0.235616438\t15609\n0.487671233\t17002\n0.734246575\t10167\n0.969863014\t7344\n1.221917808\t5256\n1.473972603\t4582\n1.583561644\t3959\n1.739726027\t4482\n1.915068493\t3705\n2.126027397\t3452\n2.375342466\t3637\n2.62739726\t3239\n2.824657534\t3517\n3.032876712\t2496\n3.282191781\t1729\n3.534246575\t1655\n\n')
> best <- best.fit(q=data$q, t=data$t)
> print(best)
$decline
[1] "Arps hyperbolic decline: <qi = 17762.93, Di = 0.7742964, b = 0.04062814>"

$sse
[1] 39702258

> best.h2e <- best.hyp2exp(q=data$q, t=data$t)
> print(best.h2e)
$decline
[1] "Arps hyperbolic-to-exponential decline: <qi = 17762.93, Di = 0.7742965, b = 0.04062833, Df = 0.1>"

$sse
[1] 39702258
4.2.2:
source('check.R', echo=TRUE)
sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8    LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                           LC_TIME=English_United States.utf8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
other attached packages:
[1] magicaxis_2.2.14 forcats_1.0.0 stringr_1.5.0 dplyr_1.1.0 purrr_1.0.1 readr_2.1.3 tidyr_1.3.0
[8] tibble_3.1.8 ggplot2_3.4.0 tidyverse_1.3.2 rlang_1.0.6 sqldf_0.4-11 RSQLite_2.2.20 gsubfn_0.7
[15] proto_1.0.0 readxl_1.4.1 aRpsDCA_1.1.1
loaded via a namespace (and not attached):
[1] httr_1.4.4 maps_3.4.1 bit64_4.0.5 jsonlite_1.8.4 modelr_0.1.10 sm_2.2-5.7.1
[7] assertthat_0.2.1 blob_1.2.3 googlesheets4_1.0.1 cellranger_1.1.0 pillar_1.8.1 backports_1.4.1
[13] glue_1.6.2 chron_2.3-58 rvest_1.0.3 colorspace_2.1-0 pkgconfig_2.0.3 broom_1.0.3
[19] haven_2.5.1 scales_1.2.1 RANN_2.6.1 tzdb_0.3.0 pracma_2.4.2 timechange_0.2.0
[25] googledrive_2.0.0 generics_0.1.3 ellipsis_0.3.2 cachem_1.0.6 withr_2.5.0 cli_3.6.0
[31] magrittr_2.0.3 crayon_1.5.2 memoise_2.0.1 fs_1.6.0 fansi_1.0.4 MASS_7.3-58.1
[37] xml2_1.3.3 tools_4.2.2 hms_1.1.2 gargle_1.3.0 lifecycle_1.0.3 munsell_0.5.0
[43] reprex_2.0.2 plotrix_3.8-2 NISTunits_1.0.1 compiler_4.2.2 grid_4.2.2 rstudioapi_0.14
[49] celestial_1.4.6 tcltk_4.2.2 gtable_0.3.1 DBI_1.1.3 R6_2.5.1 lubridate_1.9.1
[55] fastmap_1.1.0 bit_4.0.5 utf8_1.2.2 stringi_1.7.12 Rcpp_1.0.10 vctrs_0.5.2
[61] mapproj_1.2.11 dbplyr_2.3.0 tidyselect_1.2.0
data <- read.delim(sep='\t', text='t\tq\n0\t15609\n0.235616438\t15609\n0.487671233\t17002\n0.734246575\t10167\n0.969863014\t7344\n1.221917808\t5256\ .... [TRUNCATED]
best <- best.fit(q=data$q, t=data$t)
print(best)
$decline
[1] "Arps hyperbolic-to-exponential decline: <qi = 17762.93, Di = 0.7742965, b = 0.04062814, Df = 0.1>"

$sse
[1] 39702258

best.h2e <- best.hyp2exp(q=data$q, t=data$t)
print(best.h2e)
$decline
[1] "Arps hyperbolic-to-exponential decline: <qi = 17762.93, Di = 0.7742965, b = 0.04062814, Df = 0.1>"

$sse
[1] 39702258
Yes, the difference is occurring with best.fit.from.Np.with.buildup, but I'm plotting the outputs from all three methods to see the difference, and it's not as though there's a better option that it's ignoring.
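(For reference, the visual comparison is roughly along these lines. This is only a sketch: it assumes a data frame with columns t, q, and np as in the scripts in this thread, and uses the package's arps.q()/arps.Np() evaluators on each fitted decline.)

```r
# Rough sketch of the visual comparison: overlay the rate-based fits on the
# observed rates, and the Np-based fit on the observed cumulative volumes.
library(aRpsDCA)

# 'data' is assumed to have columns t (time), q (rate), and np (cumulative volume)
fit.rate    <- best.fit(q=data$q, t=data$t)
fit.buildup <- best.fit.with.buildup(q=data$q, t=data$t)
fit.np      <- best.fit.from.Np.with.buildup(Np=data$np, t=data$t)

t.grid <- seq(0, max(data$t), length.out=200)
par(mfrow=c(1, 2))

# Rate-based fits vs. observed rates
plot(data$t, data$q, pch=19, xlab="t", ylab="rate")
lines(t.grid, arps.q(fit.rate$decline, t.grid), col="red")
lines(t.grid, arps.q(fit.buildup$decline, t.grid), col="blue")
legend("topright", c("best.fit", "best.fit.with.buildup"),
       col=c("red", "blue"), lty=1, bty="n")

# Np-based fit vs. observed cumulative volumes (note the different units)
plot(data$t, data$np, pch=19, xlab="t", ylab="cumulative")
lines(t.grid, arps.Np(fit.np$decline, t.grid), col="darkgreen")
legend("bottomright", "best.fit.from.Np.with.buildup",
       col="darkgreen", lty=1, bty="n")
```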
The "with buildup" part is suggestive. That part of the code has been something of a hotspot for errors. Would you mind running the example script above again, but edited as follows?
library(aRpsDCA)
sessionInfo()
data <- read.delim(sep='\t', text='t\tq\tnp\n0\t15609\t0\n0.235616438\t15609\t1.34\n0.487671233\t17002\t2.91\n0.734246575\t10167\t3.82\n0.969863014\t7344\t4.45\n1.221917808\t5256\t4.94\n1.473972603\t4582\t5.36\n1.583561644\t3959\t5.52\n1.739726027\t4482\t5.77\n1.915068493\t3705\t6.01\n2.126027397\t3452\t6.28\n2.375342466\t3637\t6.61\n2.62739726\t3239\t6.9\n2.824657534\t3517\t7.16\n3.032876712\t2496\t7.35\n3.282191781\t1729\t7.5\n3.534246575\t1655\t7.66\n\n')
best.q <- best.fit.with.buildup(q=data$q, t=data$t)
print(best.q)
best.Np <- best.fit.from.Np.with.buildup(Np=data$np, t=data$t)
print(best.Np)
For what it's worth, on my machine, I get:
4.0.2:
> library(aRpsDCA)
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] aRpsDCA_1.1.1
loaded via a namespace (and not attached):
[1] compiler_4.0.2
> data <- read.delim(sep='\t', text='t\tq\tnp\n0\t15609\t0\n0.235616438\t15609\t1.34\n0.487671233\t17002\t2.91\n0.734246575\t10167\t3.82\n0.969863014\ .... [TRUNCATED]
> best.q <- best.fit.with.buildup(q=data$q, t=data$t)
> print(best.q)
$decline
[1] "Arps hyperbolic decline: <qi = 85010, Di = 7.013908, b = 0.7773333> with buildup: <initial rate = 15609, time to peak = 0.4876712, peak rate = 16023.42>"
$sse
[1] 6037814
> best.Np <- best.fit.from.Np.with.buildup(Np=data$np, t=data$t)
> print(best.Np)
$decline
[1] "Arps hyperbolic decline: <qi = 27.63629, Di = 10, b = 0.9849721> with buildup: <initial rate = 5.687209, time to peak = 0.3616438, peak rate = 5.919143>"
$sse
[1] 0.08332251
4.2.2:
> library(aRpsDCA)
> sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] aRpsDCA_1.1.1
loaded via a namespace (and not attached):
[1] compiler_4.2.2
> data <- read.delim(sep='\t', text='t\tq\tnp\n0\t15609\t0\n0.235616438\t15609\t1.34\n0.487671233\t17002\t2.91\n0.734246575\t10167\t3.82\n0.969863014\ .... [TRUNCATED]
> best.q <- best.fit.with.buildup(q=data$q, t=data$t)
> print(best.q)
$decline
[1] "Arps hyperbolic-to-exponential decline: <qi = 85010, Di = 7.013882, b = 0.7773306, Df = 0.1> with buildup: <initial rate = 15609, time to peak = 0.4876712, peak rate = 16023.43>"
$sse
[1] 6037814
> best.Np <- best.fit.from.Np.with.buildup(Np=data$np, t=data$t)
> print(best.Np)
$decline
[1] "Arps hyperbolic-to-exponential decline: <qi = 27.63629, Di = 10, b = 0.9849721, Df = 0.1> with buildup: <initial rate = 5.687209, time to peak = 0.3616438, peak rate = 5.919143>"
$sse
[1] 0.08332251
I also get the same rate forecasts out of both environments for the resulting decline.
I got the same results you posted.
That would seem to imply that the issue lies elsewhere. I think the next step would be to verify that the exact same data is being sent to the exact same functions in each environment, and to arrive at a script that reliably reproduces the issue.
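One way to make that verification concrete (a rough sketch; the paths and the capture.inputs/compare.inputs helper names are placeholders, not part of aRpsDCA):

```r
# Rough sketch: capture exactly what each environment hands to the fitting
# functions, then diff the two captures in a single session.

capture.inputs <- function(q, t, out.path) {
    inputs <- list(
        q = q,
        t = t,
        r.version = R.version.string,
        pkg.version = as.character(packageVersion("aRpsDCA"))
    )
    saveRDS(inputs, out.path)
    invisible(inputs)
}

compare.inputs <- function(path.a, path.b) {
    a <- readRDS(path.a)
    b <- readRDS(path.b)
    list(
        same.q   = identical(a$q, b$q),   # byte-identical rates?
        same.t   = identical(a$t, b$t),   # byte-identical times?
        q.diff   = all.equal(a$q, b$q),   # tolerant comparison with details
        versions = c(a = a$r.version, b = b$r.version)
    )
}

# In Power BI, just before the fit:
#   capture.inputs(data$q, data$t, "C:/temp/arps_inputs_powerbi.rds")
# In RStudio, just before the fit:
#   capture.inputs(data$q, data$t, "C:/temp/arps_inputs_rstudio.rds")
# Then, in any one session:
#   compare.inputs("C:/temp/arps_inputs_powerbi.rds", "C:/temp/arps_inputs_rstudio.rds")
```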