lubridate

inconsistent behavior of mdy()

Open dereksonderegger opened this issue 4 years ago • 4 comments

When the month name is not given as the standard three-letter abbreviation, mdy() behaves differently when applied to a vector of date strings than when applied to a single string. In the following example, mdy() fails to find the %Om %d, %Y format when given the date by itself, but finds it when other date strings are passed in the same call.

library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date
options(lubridate.verbose = TRUE)  # report which format parsed each element
mdy(c('Feb 13, 2012'))
#>  1 parsed with %b %d, %Y
#> [1] "2012-02-13"
mdy(c('Sept 13, 1978'))
#> Warning: All formats failed to parse. No formats found.
#> [1] NA
mdy(c('Sept 13, 1978', 'Feb 13, 2012', 'Feb 13, 2012'))
#>  2 parsed with %b %d, %Y
#>  1 parsed with %Om %d, %Y
#> [1] "1978-09-13" "2012-02-13" "2012-02-13"

Created on 2020-03-12 by the reprex package (v0.3.0)

dereksonderegger, Mar 12 '20

The issue comes down to guess_formats(), which is unable to detect the non-standard abbreviation:

> guess_formats(c('Sept 13, 1978'), "mdy")
NULL
> guess_formats(c('Sept 13, 1978', 'Feb 13, 2012', 'Feb 13, 2012'), "mdy")
        Omdy         Omdy          mdy          mdy 
"%Om %d, %Y" "%Om %d, %Y"  "%b %d, %Y"  "%b %d, %Y" 

The non-standard abbreviation is parsed by the internal parser, which, for English, would also parse other non-standard abbreviations.

This will be fixed once I have rewritten the parser to handle all locales without relying on strptime.
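Until the parser is rewritten, one possible stopgap is to normalize the input before calling mdy(). The sketch below is an assumption, not a lubridate feature: normalize_sept() is a hypothetical helper that rewrites the four-letter "Sept" to the standard "Sep", so that the ordinary %b %d, %Y format can match even a single string.

```r
# Hypothetical helper (not part of lubridate): rewrite the non-standard
# four-letter "Sept" abbreviation to the standard three-letter "Sep".
normalize_sept <- function(x) {
  # \\b requires word boundaries, so "September" is left untouched;
  # ignore.case also catches "sept" and "SEPT".
  gsub("\\bsept\\b", "Sep", x, ignore.case = TRUE, perl = TRUE)
}

normalize_sept(c("Sept 13, 1978", "Feb 13, 2012", "September 1, 2000"))
# returns c("Sep 13, 1978", "Feb 13, 2012", "September 1, 2000")
```

With lubridate loaded, mdy(normalize_sept('Sept 13, 1978')) should then go through the standard %b %d, %Y path. This only sidesteps the guess_formats() behavior for one known abbreviation; it does not fix it.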

vspinu, Mar 12 '20

I have found a similar problem, but I don't know whether it is the same one. I am new to GitHub and wasn't sure whether to post this as a separate issue, so apologies in advance. mdy() (and dmy()) sometimes fail to recognize strings containing "mar" or "Mar". As with @dereksonderegger's problem, the behavior is inconsistent: a single string fails, but a call with several strings parses.

library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
mdy('Mar 13, 1999')
#> Warning: All formats failed to parse. No formats found.
#> [1] NA
mdy('mar 13, 1999')
#> Warning: All formats failed to parse. No formats found.
#> [1] NA
dmy('09 mar 2050')
#> Warning: All formats failed to parse. No formats found.
#> [1] NA
dmy('01 Mar 2020')
#> Warning: All formats failed to parse. No formats found.
#> [1] NA
mdy('Mar 13, 1963','mar 10, 1920', 'apr 29, 2012', 'dec 16, 2012')
#> [1] "1963-03-13" "1920-03-10" "2012-04-29" "2012-12-16"

I know next to nothing about coding, so I don't know whether this is an issue with the function itself or a problem with my system. I am using R version 3.6.3 (2020-02-29), Platform: x86_64-w64-mingw32/x64 (64-bit). Again, sorry for any mistakes I made. If additional information is required, please let me know.

pirx90, Apr 25 '20

@pirx90, which locale are you in?

It's probably the same issue. See also #881 and the workaround there for the time being.
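The #881 workaround itself is not reproduced here. As a separate sketch, assuming the strings use English month abbreviations while the session's LC_TIME locale is non-English, base R's as.Date() can parse them with LC_TIME temporarily switched to the "C" locale. This swaps in base R's strptime-based parsing instead of lubridate, and parse_english_mdy() is a hypothetical name.

```r
# Hypothetical helper (base R only, not lubridate's fix or the #881
# workaround): parse English "Mon DD, YYYY" strings under the "C" locale,
# whose month abbreviations are the English ones.
parse_english_mdy <- function(x) {
  old <- Sys.getlocale("LC_TIME")
  # Restore the caller's LC_TIME even if parsing throws an error.
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  Sys.setlocale("LC_TIME", "C")
  # %b matches the month abbreviation; R's strptime compares names
  # case-insensitively, so "mar" and "Mar" both match.
  as.Date(x, format = "%b %d, %Y")
}

parse_english_mdy(c("Mar 13, 1999", "mar 10, 1920", "apr 29, 2012"))
```

Because the locale is restored in on.exit(), the rest of the session keeps its Spanish (or other) LC_TIME settings.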

vspinu, Apr 30 '20

@vspinu, I'm using "Spanish Mexico".

pirx90, May 03 '20