anytime icon indicating copy to clipboard operation
anytime copied to clipboard

R session crashes with nonsensical values on Windows

Open christophsax opened this issue 7 years ago • 37 comments

This crashes the R session on Windows:

anytime::anytime(c("2.343423423", "3.435435345"))

It works fine with useR = TRUE:

anytime::anytime(c("2.343423423", "3.435435345"), useR = TRUE)
# [1] NA NA

and also on other platforms.

Using R 3.5 on Windows 7, with latest anytime.

christophsax avatar May 12 '18 11:05 christophsax

Arrggh. If it changes with useR=TRUE then it is clearly the Boost parser, and there is likely little to nothing we can do :-/.

FWIW, on Linux,

R> anytime::anytime(c("2.343423423", "3.435435345"))
[1] NA NA
R> 

and I presume your macOS box is fine too.

eddelbuettel avatar May 12 '18 13:05 eddelbuettel

Try adding anytime::setDebug(1) to see how 'far' we get. Here:

R> anytime:::setDebug(1)
[1] TRUE
R> anytime::anytime(c("2.343423423", "3.435435345"))
before tests: 2.343423423
In: 2.343423423 out: 2.343423423 and 
s: 2.343423423 len: 11 res: 0
s:  len: 0 res: 0
One: 2.343423423 two: 
before parse: 2.343423423
before tests: 3.435435345
In: 3.435435345 out: 3.435435345 and 
s: 3.435435345 len: 11 res: 0
s:  len: 0 res: 0
One: 3.435435345 two: 
before parse: 3.435435345
[1] NA NA
R> 

eddelbuettel avatar May 12 '18 13:05 eddelbuettel

Crashing Example, next try

Sorry, I messed something up with the 'reproducible' example, because it is not. Here is a reproducible one.

This crashes on Win 7, I checked on two machines:

x <- c("1.26453318526932")
anytime::anytime(x)

The same machines produce an NA with useR=TRUE

x <- c("1.26453318526932")
anytime::anytime(x, useR=TRUE)
# [1] NA

However on the Mac (and probably on Linux, I didn't try) I get a result for with the Boost parser.

> x <- c("1.26453318526932")
> anytime::anytime(x)
[1] "5331-01-28 05:33:00 CET"

Debug

Now, for the debugging, I get on Windows:

> anytime:::setDebug(1)
[1] TRUE
> x <- "1.26453318526932"
> anytime::anytime(x)
before tests: 1.26453318526932
In: 1.26453318526932 out: 1.26453318526932 and 
s: 1.26453318526932 len: 16 res: 0
s:  len: 0 res: 0
One: 1.26453318526932 two: 
before parse: 1.26453318526932

which looks similar to the Mac, where it doesn't crash:

> anytime:::setDebug(1)
[1] TRUE
> x <- "1.26453318526932"
> anytime::anytime(x)
before tests: 1.26453318526932
In: 1.26453318526932 out: 1.26453318526932 and 
s: 1.26453318526932 len: 16 res: 0
s:  len: 0 res: 0
One: 1.26453318526932 two: 
before parse: 1.26453318526932
[1] "5331-01-28 05:33:00 CET"

christophsax avatar May 12 '18 14:05 christophsax

Thanks. I later realized the debug is not that useful -- that I just the string parsing.

Linux result is the same as the macOS result.

eddelbuettel avatar May 12 '18 14:05 eddelbuettel

R Session is also crashing in this case

anytime::anytime("9999-01-01")

With anytime:::setDebug(1) I get sinking output in a file

before tests: 9999-01-01
In: 9999-01-01 out: 9999-01-01 and 
s: 9999-01-01 len: 10 res: 0
s:  len: 0 res: 0
One: 9999-01-01 two: 
before parse: 9999-01-01

However, this is working

anytime::anytime("1990-02-03")
#> [1] "1990-02-03 CET"

is this related or should I open a new issue on this one ?

Session info
devtools::session_info()
#> Session info -------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.5.1 (2018-07-02)
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  French_France.1252          
#>  tz       Europe/Paris                
#>  date     2018-09-29
#> Packages -----------------------------------------------------------------
#>  package      * version date       source                               
#>  anytime        0.3.1.1 2018-09-26 Github (eddelbuettel/anytime@841d7fd)
#>  backports      1.1.2   2017-12-13 CRAN (R 3.5.0)                       
#>  base         * 3.5.1   2018-07-02 local                                
#>  compiler       3.5.1   2018-07-02 local                                
#>  datasets     * 3.5.1   2018-07-02 local                                
#>  devtools       1.13.6  2018-06-27 CRAN (R 3.5.1)                       
#>  digest         0.6.17  2018-09-12 CRAN (R 3.5.1)                       
#>  evaluate       0.11    2018-07-17 CRAN (R 3.5.1)                       
#>  graphics     * 3.5.1   2018-07-02 local                                
#>  grDevices    * 3.5.1   2018-07-02 local                                
#>  htmltools      0.3.6   2017-04-28 CRAN (R 3.5.1)                       
#>  knitr          1.20    2018-02-20 CRAN (R 3.5.1)                       
#>  magrittr       1.5     2014-11-22 CRAN (R 3.5.1)                       
#>  memoise        1.1.0   2017-04-21 CRAN (R 3.5.1)                       
#>  methods      * 3.5.1   2018-07-02 local                                
#>  RApiDatetime   0.0.3   2017-04-02 CRAN (R 3.5.0)                       
#>  Rcpp           0.12.18 2018-07-23 CRAN (R 3.5.1)                       
#>  rmarkdown      1.10    2018-06-11 CRAN (R 3.5.1)                       
#>  rprojroot      1.3-2   2018-01-03 CRAN (R 3.5.1)                       
#>  stats        * 3.5.1   2018-07-02 local                                
#>  stringi        1.2.4   2018-07-20 CRAN (R 3.5.1)                       
#>  stringr        1.3.1   2018-05-10 CRAN (R 3.5.1)                       
#>  tools          3.5.1   2018-07-02 local                                
#>  utils        * 3.5.1   2018-07-02 local                                
#>  withr          2.1.2   2018-03-15 CRAN (R 3.5.1)                       
#>  yaml           2.2.0   2018-07-25 CRAN (R 3.5.1)

cderv avatar Sep 29 '18 15:09 cderv

That was already documented in https://github.com/eddelbuettel/anytime/issues/78, and is likely also a Boost issue. Try with useR=TRUE.

eddelbuettel avatar Sep 29 '18 16:09 eddelbuettel

yes, It is working using RApiDatetime. It always surprising to have crash when using a tool. If the bug is in Boost there is nothing that can really be done I guess. Also, this is kind of an edge case to have data with year 9999. Thanks for the quick answer and all the great 📦

cderv avatar Sep 29 '18 16:09 cderv

I wish this was fixed, but given that it only happens on Windows ... We could report it to the Boost date_library maintainer. If you have a few minutes, I could even send you some old standalone code...

eddelbuettel avatar Sep 29 '18 16:09 eddelbuettel

Sorry for the delay. Yes you can send me some infos on this and I can see where to report it. However, I have never look at Boost code and I hope I can help

cderv avatar Sep 29 '18 20:09 cderv

I presume you have C++ compiler handly on Windows, at a mininum Rtools to work with R, and to install packages like this one from source, correct?

What I have done in the past may have required linking (as Boost is easily available on Linux), but let me see if I can cook something up just using Boost headers and eg the BH package just like anytime does.

eddelbuettel avatar Sep 29 '18 21:09 eddelbuettel

Yes it is correct. I have Rtools to build package from source.

cderv avatar Sep 29 '18 21:09 cderv

See program below can be build with just g++. You need to find where you have BH and set -I on your box. Here I can also rely on system headers and just dp:

edd@rob:/tmp$ g++ -Wall -o boostex boost_date_parser_cmdline.cpp
edd@rob:/tmp$ ./boostex "2018-09-10 10:11:12" "20180910"
Parsed ptime is 2018-Sep-10 10:11:12 -- seconds from epoch are 1536574272 from input 2018-09-10 10:11:12
Parsed ptime is 2018-Sep-10 00:00:00 -- seconds from epoch are 1536537600 from input 20180910
edd@rob:/tmp$ 

It should crash for you and show a Boost bug. If it doesn't then the issue is R interacting with Boost. Possibly, but now something I expect.


// cf http://stackoverflow.com/a/3787188/143305 and extended
#include <iostream>
#include <boost/date_time.hpp>

namespace bt = boost::posix_time;

const std::locale formats[] = {
    std::locale(std::locale::classic(),new bt::time_input_facet("%Y-%m-%d %H:%M:%S")),
    std::locale(std::locale::classic(),new bt::time_input_facet("%Y%m%d %H%M%S")),
    std::locale(std::locale::classic(),new bt::time_input_facet("%m-%d-%Y %H:%M:%S")),
    std::locale(std::locale::classic(),new bt::time_input_facet("%d.%m.%Y %H:%M:%S")),
    std::locale(std::locale::classic(),new bt::time_input_facet("%Y-%m-%d")),
    std::locale(std::locale::classic(),new bt::time_input_facet("%Y%m%d"))
};
const size_t formats_n = sizeof(formats)/sizeof(formats[0]);

std::time_t pt_to_time_t(const bt::ptime& pt) {
    bt::ptime timet_start(boost::gregorian::date(1970,1,1));
    bt::time_duration diff = pt - timet_start;
    return diff.ticks()/bt::time_duration::rep_type::ticks_per_second;
}

void seconds_from_epoch(const std::string& s) {
    bt::ptime pt, ptbase;
    for (size_t i=0; pt == ptbase && i < formats_n; ++i) {
        std::istringstream is(s);
        is.imbue(formats[i]);
        is >> pt;
    }
    if (pt == bt::ptime()) {
        std::cerr << "Parse error for " << s << '\n';
        exit(-1);
    }
    std::cout << "Parsed ptime is " << pt << " -- seconds from epoch are " 
              << pt_to_time_t(pt) << " from input " << s << '\n';
}

int main(int argc, char *argv[]) {
    if (argc < 2) {
        std::cerr << "Usage: " << argv[0] << " arg1 [arg2...]\n";
        exit(-1);
    }
    for (int i=1; i<argc; i++) {
        seconds_from_epoch(argv[i]);
    }
    exit(0);
}

eddelbuettel avatar Sep 29 '18 21:09 eddelbuettel

It works without any Boost bug

$g++ -Wall -IC:/Users/chris/Documents/R/win-library/3.5/BH/include -o boostex boost_date_parser_cmdline.cpp
$ ./boostex.exe "2018-09-10 10:11:12" "20180910"
Parsed ptime is 2018-Sep-10 10:11:12 -- seconds from epoch are 1536574272 from input 2018-09-10 10:11:12
Parsed ptime is 2018-Sep-10 00:00:00 -- seconds from epoch are 1536537600 from input 20180910

This means it is on the R side ? Something off during compilation when installing anytime with Rtools ?

cderv avatar Sep 30 '18 12:09 cderv

Interesting. And yes, something else creeps in from somewhere.

If you could, now please modify you modify your R package to log ie add some printf statements. The key of the conversion is the same. Maybe the issue is somewhere else, and we have a chance to get to it?

eddelbuettel avatar Sep 30 '18 13:09 eddelbuettel

Start with something like this from R side, setting the debug boolean in the C++ code:

R> library(anytime)
R> anytime:::setDebug(TRUE)
[1] TRUE
R> anydate("2018-09-30")
before tests: 2018-09-30
In: 2018-09-30 out: 2018-09-30 and 
s: 2018-09-30 len: 10 res: 0
s:  len: 0 res: 0
One: 2018-09-30 two: 
before parse: 2018-09-30
days 17804
[1] "2018-09-30"
R> 

Most of this, though, is from the string splitting before we call the Boost routine I suspected, maybe wrongly. But you see the pattern: try to add more conditional output like I had done to trace where it blows up. We may get to it...

eddelbuettel avatar Sep 30 '18 13:09 eddelbuettel

OK, I did add some lines like if (debug) Rcpp::Rcout << "some text" << std::endl; everywhere in the code of anytime.cpp, and I was able to find the line that makes R crash https://github.com/eddelbuettel/anytime/blob/841d7fd046aea953214d8e66fd90fb0b7e5a07dc/src/anytime.cpp#L172 If I put a debug print log just before and just after, the second one do not print. I think it means the error is in localAsTm->tm_isdst*60*60.

If I try to activate one line you have commented https://github.com/eddelbuettel/anytime/blob/841d7fd046aea953214d8e66fd90fb0b7e5a07dc/src/anytime.cpp#L162 In this case, the crash happens at this line.

Is this helping you identify what could be the cause ?

cderv avatar Sep 30 '18 16:09 cderv

That is awesome work, and quite possible! The tm_isdst is probably not set, or not accessible. If we ifdef that away things will not crash. They may be wrong, though.

So some thinking time required. When/where/how does this become invalid? Years after 2038 maybe?

(Fixed initial typo. 2038 is the end of the 'since epoch counter' under current variable sizes.)

eddelbuettel avatar Sep 30 '18 17:09 eddelbuettel

Ie could you test with some reasonable guesses? We year 9999 is bad. Can you triage down? Would 5000 work? 3000? 2038, as suspect?

eddelbuettel avatar Sep 30 '18 17:09 eddelbuettel

Sorry for the delay. I made a small for loops to test all the year. I can say for sure that the first year with error is 3002. 2038 is working fine;

The testing code:

years <- seq(3000, 9999, by = 1)

try_conversion <- function(years) {
    error <- FALSE
    for(year in years) {
        message("Year: ", year)
        res <- tryCatch(
            callr::r(function(year) anytime::anytime(paste0(year, "-01-01")), args = list(year = year)),
            error = function(cnd) {
                error <<- TRUE
                message("Error for year ", year)
            })
        if (error) {
                stop("Error found for year : ", year, call. = FALSE)
        }
    }
    "no error"
}

try_conversion(years)

cderv avatar Sep 30 '18 18:09 cderv

We are actually in the wrong issue ticket in a way. What about the values @christophsax reported above, ie

anytime::anytime(c("2.343423423", "3.435435345"))

Would be good to have the "smallest" possible set of exclusion. "On Windows" && year >= 3002 works foe me, but may need a little more on the other with years below "some" (to be found) value. Can you loop down too?

eddelbuettel avatar Sep 30 '18 18:09 eddelbuettel

About this issue ticket, i get this

anytime::anytime(c("2.343423423", "3.435435345"))
#> [1] NA NA

So it seems to work. About the loop, I tried from 2008 to 9999 currently. Error start at 3002. I will run the loop down too.

We are actually in the wrong issue ticket in a way.

Oh I forgot. Please open a new one if you think it is better for follow up.

cderv avatar Sep 30 '18 18:09 cderv

Please open a new one if you think it is better for follow up.

I guess the better way would have been to go there and reopen, but it's that we have thousands of people following. And for now we are getting (good!) work done here so let's keep at it.

As for the error @christophsax reported and you no longer see: good. One less error then? I have not made a BH release so it it a little strange that it blew up for him but not for you. Oh well. Maybe a different test. We are getting this "max year on Windows" sorted out. Maybe add a min year too.

eddelbuettel avatar Sep 30 '18 18:09 eddelbuettel

No error when I go down but NA value from year 1399.

cderv avatar Sep 30 '18 18:09 cderv

So currently we have something that already tries to avoid a seg.fault on Windows:

#if defined(_WIN32)
    if (totsec > 0) {           // on Windows, for dates before 1970-01-01: segfault
        dstadj = localAsTm->tm_isdst*60*60;
    }
#else
    dstadj = localAsTm->tm_isdst*60*60;
#endif

I guess I should try something like

#if defined(_WIN32)
    if (totsec > 0  &&                   // on Windows before 1970-01-01: segfault
       localAsTm->tm_year >= 1102)  {    // tm_year is year-1900, so year 3002 is it
        dstadj = localAsTm->tm_isdst*60*60;
    }
#else
    dstadj = localAsTm->tm_isdst*60*60;
#endif

Can you test that, please?

eddelbuettel avatar Sep 30 '18 18:09 eddelbuettel

Comparison is wrong, of course. < 1102:

#if defined(_WIN32)
    if (totsec > 0  &&                   // on Windows before 1970-01-01: segfault
       localAsTm->tm_year < 1102)  {     // tm_year is year-1900, so year 3002 is it
        dstadj = localAsTm->tm_isdst*60*60;
    }
#else
    dstadj = localAsTm->tm_isdst*60*60;
#endif

eddelbuettel avatar Sep 30 '18 19:09 eddelbuettel

I am currently trying but there still an error when I try. I'll keep looking where it comes from and if I am doing everything right.

cderv avatar Sep 30 '18 19:09 cderv

I built it here. There may have been a copy and paste error. Do you want me to push as branch?

eddelbuettel avatar Sep 30 '18 19:09 eddelbuettel

OK. I think localAsTm->tm_year is getting R crashed now. Is there a way to debug by looking inside localAsTm ? I tried if (debug) Rcpp::Rcout << "Before fix - tm_year: " << localAsTm->tm_year << std::endl; and this does not log before crash.

cderv avatar Sep 30 '18 19:09 cderv

(And I tested it on Windows per rhub. Something changed in R with tz info now progressing better so I need to check that -- but build is fine, and the new test I added for this is fine. I think :) One can hope.)

As for tm_year etc, man localtime for me says:

       Broken-down time is stored in the structure tm, which is defined in <time.h> as follows:

           struct tm {
               int tm_sec;    /* Seconds (0-60) */
               int tm_min;    /* Minutes (0-59) */
               int tm_hour;   /* Hours (0-23) */
               int tm_mday;   /* Day of the month (1-31) */
               int tm_mon;    /* Month (0-11) */
               int tm_year;   /* Year - 1900 */
               int tm_wday;   /* Day of the week (0-6, Sunday = 0) */
               int tm_yday;   /* Day in the year (0-365, 1 Jan = 0) */
               int tm_isdst;  /* Daylight saving time */
           };

You need to find the header file R uses on Windows and look. Appreciate all your diligence and help!!

eddelbuettel avatar Sep 30 '18 19:09 eddelbuettel

if there is a way for me to test the version you've build, I can try it and report. Also, I'll try to find the header file and tell you. Not sure what could be going on here but happy to help !

cderv avatar Sep 30 '18 19:09 cderv

I am currently tied up with something else but all I did so far (besides incrementing dev version and NEWS.Rd) was this:

diff --git a/src/anytime.cpp b/src/anytime.cpp
index d1a075b..157f13f 100644
--- a/src/anytime.cpp
+++ b/src/anytime.cpp
@@ -168,7 +168,8 @@ double ptToDouble(const bt::ptime & pt, const bool asDate=false) {
     // contain the additional DST adjustment
     double totsec = tdiff.total_microseconds()/1.0e6, dstadj = 0;
 #if defined(_WIN32)
-    if (totsec > 0) {           // on Windows, for dates before 1970-01-01: segfault
+    if (totsec > 0  &&                  // on Windows before 1970-01-01: segfault
+       localAsTm->tm_year < 1102) {     // tm_year is year-1900, so year 3002 is it
         dstadj = localAsTm->tm_isdst*60*60;
     }
 #else

I also added this test which git diff does not yet know about as the file tests/gh_issue_78.R isn't staged yet:

## https://github.com/eddelbuettel/anytime/issues/78 as well as
## https://github.com/eddelbuettel/anytime/issues/76

library(anytime)

anydate("3003-01-01")    # crashed on Windows before tm_isdst disable for post 3002 years

stopifnot(anydate("3003-01-01") == as.Date("3003-01-01"))

eddelbuettel avatar Sep 30 '18 19:09 eddelbuettel

This is what I also modified in anytime.cpp. I build then installed but I still have an error. 🤔

I found time.h header that I think R use in C:\Rtools\mingw_64\x86_64-w64-mingw32\include\time.h Not sure what to look for in it - I put the file in this gist

I'll leave it for now as I am not sure I can do more by myself. If you still need my help further, please tell me.

Thank you for your assistance helping me help you.

cderv avatar Sep 30 '18 20:09 cderv

We can let it rest. I will put this into a proper branch, try win-builder etc pp. Right now I know too little about your error (you gave no details) but sometimes ... sleep and a little distance helps.

We made huge leaps today. I think we got this. So a big Thank you! to you for your patient trials.

eddelbuettel avatar Sep 30 '18 20:09 eddelbuettel

But I still get Windows errors when testing (via winbuilder). This is not done.

Maybe I misunderstood you in terms of having run-time vs build-time errors. I think we are at run-time now.

eddelbuettel avatar Sep 30 '18 21:09 eddelbuettel

I added another commit to the new branch. It now passes for me on Windows (apart from a vignette issue which is unrelated).

Try this when you have a chance.

eddelbuettel avatar Oct 01 '18 00:10 eddelbuettel

Thanks for the branch - I tested it but there is still an issue.

In fact, the issue is from the beginning with anytime::anytime and not with anytime::anydate. At run time, when we try a big year like anytime::anytime("2999-01-01") it works but, then with something like anytime::anytime("3002-01-01"), it crashes.

When anytime::anydate is used, the faulty part of the code I linked in my previous comment is not pass through because asDate argument is TRUE and code returns before in ptToDouble https://github.com/eddelbuettel/anytime/blob/841d7fd046aea953214d8e66fd90fb0b7e5a07dc/src/anytime.cpp#L154-L157 I think this is why your tests are passing because tests/gh_issue_78.R is using anytime::anydate.

Also, I am a french computer, so default timezone is CET:

anytime:::getTZ()
#> [1] "Europe/Paris"

Sorry if all that was not clear from the start.

more tests

As time is count in seconds, I dig a little further to exact date for which there is a crash between 3001-01-01 and 3002-01-01.

With default timezone (Europe/Paris)

  • anytime::anytime("3001-01-01 08:59:59") this works
  • anytime::anytime("3001-01-01 09:00:00") this crashes

If I use asUTC=TRUE, no crash so ptToDoubleUTC in anytime.cpp is working fine.

If I force UTC for R with Sys.setenv(TZ="UTC"), as expected

  • anytime::anytime("3001-01-01 07:59:59") is ok
  • anytime::anytime("3001-01-01 08:00:00") makes R crash

It seems the datetime in seconds that causes crash is 32535244800. I tried to use anytime with numeric to see if it crashes too. anytime::anytime(32535244800) does not crash. So, it crashes when given as character but not as numeric.

I hope this new detailed tests can help find the part that makes R crash.

cderv avatar Oct 01 '18 06:10 cderv

That is a lot to chew over. Point taken re anytime() not anydate(). Will update the test.

anytime::anytime(32535244800) does not parse as text but it caught earlier and converted directly as a type, so it does not apply.

I will set the year comparison to 3000 then, ie 3000-1900 = 1100 as the cutoff.

eddelbuettel avatar Oct 01 '18 10:10 eddelbuettel