mapvizieR function parsing some dates incorrectly

Open chrishaid opened this issue 7 years ago • 2 comments

I've got dates such as 01/12/2017 (January 12th, 2017) getting parsed as 2017-12-01 (Dec 1st, 2017), but most dates are fine. Our dates come in as text fields and are unadulterated from how NWEA puts them in the CDF.
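To make the ambiguity concrete, here is a minimal base R sketch (it does not use mapvizieR or lubridate; `as_mdy` and `as_dmy` are hypothetical helper names used only for illustration). A date whose day is 12 or less is valid under both orderings, while a day above 12 only fits one:

```r
# Hypothetical helpers: parse a string under one explicit ordering.
as_mdy <- function(x) as.Date(x, format = "%m/%d/%Y")
as_dmy <- function(x) as.Date(x, format = "%d/%m/%Y")

ambiguous   <- "01/12/2017"  # day <= 12, so both orderings are valid
unambiguous <- "01/25/2017"  # there is no month 25, so only mdy fits

mdy_result <- format(as_mdy(ambiguous))    # "2017-01-12"
dmy_result <- format(as_dmy(ambiguous))    # "2017-12-01"
dmy_fails  <- is.na(as_dmy(unambiguous))   # TRUE
```

Any parser that has to guess the ordering per value can land on either reading of the first string.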

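This bit of code (https://github.com/almartin82/mapvizieR/blob/551c01fc7c9ac10ff4fdcb1c987dbd9484e0bfd6/R/util.R#L441-L445, i.e., the munge_startdate() function) is the culprit. From the docs: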

When several format-orders are specified parse_date_time sorts the supplied format-orders based on a training set and then applies them recursively on the input vector.

I get the sense it's guessing wrong on those ambiguous dates. Like, what is this so-called "training set"?

Possible remedy: we sample the teststartdate column, infer the format for each sampled date, then take the most common format in the sample and explicitly set that date order format (ymd, mdy, etc.) for the parser to use?

Thoughts @almartin82?

chrishaid avatar May 17 '17 17:05 chrishaid

Does MAP always do this consistently? Or (ugh) does it change across CDFs?

almartin82 avatar May 17 '17 17:05 almartin82

It's consistent across CDFs. My guess is that when you pull data down from your DB and you stored it as a date in your DB, then you are getting ymd back. Which makes sense.

Literally the problem was with two ambiguous dates out of hundreds.

So in my case, say you sample 20 dates. All 20 will be mdy, but only 19 might be parsed that way, with one ambiguous date (12/01/2017) getting parsed as dmy. But a 19-1 vote of the sample for mdy would then set the parser to explicitly use the mdy ordering for every instance.
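A rough sketch of that vote in base R, under some assumptions: `infer_date_order` and `candidate_formats` are hypothetical names, not part of mapvizieR, and instead of guessing one format per date it counts how many sampled dates each candidate format can parse, which is a slightly different tally that leads to the same winner. Ties go to whichever format is listed first.

```r
# Candidate orderings to vote on (an assumption; extend as needed).
candidate_formats <- c(mdy = "%m/%d/%Y", dmy = "%d/%m/%Y", ymd = "%Y-%m-%d")

# Hypothetical helper: sample up to n dates and return the name of the
# format that successfully parses the most of them.
infer_date_order <- function(x, formats = candidate_formats, n = 20) {
  x <- x[!is.na(x)]
  sampled <- if (length(x) > n) sample(x, n) else x
  votes <- vapply(
    formats,
    function(f) sum(!is.na(as.Date(sampled, format = f))),
    integer(1)
  )
  names(votes)[which.max(votes)]  # first format wins a tie
}

dates <- c("01/12/2017", "01/25/2017", "12/01/2017", "02/14/2017")

best   <- infer_date_order(dates)
parsed <- as.Date(dates, format = candidate_formats[[best]])
```

Here mdy parses all four dates and dmy only the two ambiguous ones, so mdy wins and is then applied to every value, including the ambiguous ones. If a sample happens to contain only ambiguous dates the vote ties, so a larger sample is safer.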

Does that make sense?

chrishaid avatar May 17 '17 17:05 chrishaid