readr

Writing `POSIXlt` columns loses timezones

Open tohka opened this issue 1 year ago • 3 comments

write_csv() drops the time zone of POSIXlt values stored in a tibble: the local clock time is written unchanged, with a Z (UTC) suffix appended.

> version
               _                                
platform       x86_64-w64-mingw32               
arch           x86_64                           
os             mingw32                          
crt            ucrt                             
system         x86_64, mingw32                  
status                                          
major          4                                
minor          2.2                              
year           2022                             
month          10                               
day            31                               
svn rev        83211                            
language       R                                
version.string R version 4.2.2 (2022-10-31 ucrt)
nickname       Innocent and Trusting            

> library(readr)
> packageVersion("readr")
[1] ‘2.1.4’
> library(tibble)
> packageVersion("tibble")
[1] ‘3.1.8’

> Sys.timezone()
[1] "Asia/Tokyo"
> dt <- "2000/01/01 09:00:00"
> dt.ct <- as.POSIXct(dt, tz=Sys.timezone())
> dt.ct
[1] "2000-01-01 09:00:00 JST"
> dt.lt <- as.POSIXlt(dt, tz=Sys.timezone())
> dt.lt
[1] "2000-01-01 09:00:00 JST"
> df <- data.frame(ct=dt.ct, lt=dt.lt)
> df
                   ct                  lt
1 2000-01-01 09:00:00 2000-01-01 09:00:00
> tbl <- tibble(ct=dt.ct, lt=dt.lt)
> tbl
# A tibble: 1 × 2
  ct                  lt                 
  <dttm>              <dttm>             
1 2000-01-01 09:00:00 2000-01-01 09:00:00

> write_csv(df, "write_csv_df.csv")
> readLines("write_csv_df.csv")
[1] "ct,lt"                                    
[2] "2000-01-01T00:00:00Z,2000-01-01T00:00:00Z"
> write_csv(tbl, "write_csv_tbl.csv")
> readLines("write_csv_tbl.csv")
[1] "ct,lt"                                    
[2] "2000-01-01T00:00:00Z,2000-01-01T09:00:00Z"

"2000-01-01 09:00:00 JST" equals "2000-01-01 00:00:00Z".

However, when using a tibble, the POSIXlt value "2000-01-01 09:00:00 JST" is written as "2000-01-01T09:00:00Z", which is nine hours off.
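The equivalence claimed above can be checked in base R alone. This is a small sketch rather than part of the original report; the variable names (`jst`, `utc_string`, `same_instant`) are made up for illustration, and it assumes the system time zone database knows "Asia/Tokyo".

```r
# 09:00 in Tokyo (UTC+9) is the same instant as 00:00 UTC.
jst <- as.POSIXct("2000-01-01 09:00:00", tz = "Asia/Tokyo")

# Render that instant in UTC using the ISO 8601 layout that write_csv() emits.
utc_string <- format(jst, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
print(utc_string)
#> [1] "2000-01-01T00:00:00Z"

# A POSIXlt built from the same string and zone refers to the same instant,
# so a correct writer should produce the same UTC string for it.
lt <- as.POSIXlt("2000-01-01 09:00:00", tz = "Asia/Tokyo")
same_instant <- as.numeric(as.POSIXct(lt)) == as.numeric(jst)
print(same_instant)
#> [1] TRUE
```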

tohka, Feb 27 '23 14:02

Can you please provide a minimal reprex (reproducible example)? The goal of a reprex is to make it as easy as possible for me to recreate your problem so that I can fix it: please help me help you! If you've never heard of a reprex before, start by reading about the reprex package, including the advice further down the page. Please make sure your reprex is created with the reprex package as it gives nicely formatted output and avoids a number of common pitfalls.

hadley, Jul 31 '23 22:07

Hi @hadley

The reprex code is presented below.

version
#>                _                                
#> platform       x86_64-w64-mingw32               
#> arch           x86_64                           
#> os             mingw32                          
#> crt            ucrt                             
#> system         x86_64, mingw32                  
#> status                                          
#> major          4                                
#> minor          3.1                              
#> year           2023                             
#> month          06                               
#> day            16                               
#> svn rev        84548                            
#> language       R                                
#> version.string R version 4.3.1 (2023-06-16 ucrt)
#> nickname       Beagle Scouts

library(readr)
packageVersion("readr")
#> [1] '2.1.4'
library(tibble)
packageVersion("tibble")
#> [1] '3.2.1'

dt <- "2000/01/01 09:00:00"
tz <- "Asia/Tokyo"

(dt.ct <- as.POSIXct(dt, tz=tz))
#> [1] "2000-01-01 09:00:00 JST"
(dt.lt <- as.POSIXlt(dt, tz=tz))
#> [1] "2000-01-01 09:00:00 JST"

(df <- data.frame(ct=dt.ct, lt=dt.lt))
#>                    ct                  lt
#> 1 2000-01-01 09:00:00 2000-01-01 09:00:00
(tbl <- tibble(ct=dt.ct, lt=dt.lt))
#> # A tibble: 1 × 2
#>   ct                  lt                 
#>   <dttm>              <dttm>             
#> 1 2000-01-01 09:00:00 2000-01-01 09:00:00

write_csv(df, "write_csv_df.csv")
readLines("write_csv_df.csv")
#> [1] "ct,lt"                                    
#> [2] "2000-01-01T00:00:00Z,2000-01-01T00:00:00Z"

write_csv(tbl, "write_csv_tbl.csv")
readLines("write_csv_tbl.csv")
#> [1] "ct,lt"                                    
#> [2] "2000-01-01T00:00:00Z,2000-01-01T09:00:00Z"

Please let me know if there is any other information I am missing.

tohka, Aug 01 '23 14:08

Here's a somewhat more minimal reprex:

library(readr)
lt <- as.POSIXlt("2000/01/01 09:00:00", tz = "Asia/Tokyo")

df <- data.frame(lt = lt)
str(df)
#> 'data.frame':    1 obs. of  1 variable:
#>  $ lt: POSIXct, format: "2000-01-01 09:00:00"

tbl <- tibble::tibble(lt = lt)
str(tbl)
#> tibble [1 × 1] (S3: tbl_df/tbl/data.frame)
#>  $ lt: POSIXlt[1:1], format: "2000-01-01 09:00:00"
cat(format_csv(tbl))
#> lt
#> 2000-01-01T09:00:00Z

Created on 2023-08-01 with reprex v2.0.2

There are two issues. First, data.frame() automatically converts POSIXlt to POSIXct, so there's no POSIXlt column in the data frame example. Second, write_csv()/format_csv() appears to always lose the time zone of POSIXlt variables.
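Until this is fixed, one possible workaround is to convert POSIXlt columns to POSIXct before writing, which is what data.frame() does implicitly. The following is only a sketch: `lt_to_ct()` is a hypothetical helper, not a readr or tibble function, and the approach assumes that replacing the column type is acceptable for your data.

```r
library(readr)
library(tibble)

# Hypothetical helper (not part of readr): replace every POSIXlt column
# with its POSIXct equivalent and leave all other columns untouched.
lt_to_ct <- function(df) {
  for (nm in names(df)) {
    if (inherits(df[[nm]], "POSIXlt")) {
      df[[nm]] <- as.POSIXct(df[[nm]])
    }
  }
  df
}

tbl <- tibble(lt = as.POSIXlt("2000/01/01 09:00:00", tz = "Asia/Tokyo"))
converted <- lt_to_ct(tbl)

# The column is now POSIXct, so it is written like the `ct` column in the report.
class(converted$lt)
cat(format_csv(converted))
```

In the original report the POSIXct column of the tibble was written as 2000-01-01T00:00:00Z, so the converted column should be written the same way.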

hadley, Aug 01 '23 17:08