
Feature request: warning message for numeric columns larger than 2^53

Open huisunsh opened this issue 1 year ago • 0 comments

I know this issue has been raised before, for example in #976. When users call read_csv to import data without specifying column types, read_csv automatically guesses the type of each column. This is wonderful; the only caveat is that numeric values larger than 2^53-1 lose precision when the column is read as a double.

> original_data <- tibble(project = c('10GS','23AS','11SG'), id = c(13101201211316084,13101200510130349,15103200910645447))
> write_csv(original_data, 'data.csv')
                                                                      
> data <- read_csv('data.csv')
Rows: 3 Columns: 2                                                      
── Column specification ────────────────────────────────────────────────
Delimiter: ","
chr (1): project
dbl (1): id

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
> print(as.character(data$id))
[1] "13101201211316084" "13101200510130348" "15103200910645448"

As one may observe, the last digits of the second and third values differ from the original data: ...349 became ...348 and ...447 became ...448.
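The cause can be reproduced in base R without readr. The sketch below assumes the usual IEEE 754 double, whose 53-bit significand leaves gaps of 2 between representable integers just above 2^53:

```r
# A double has a 53-bit significand, so integers are exact only up to 2^53.
limit <- 2^53

# Just below the limit, neighbouring integers are still distinct.
below_is_exact <- (limit - 1) != limit

# At the limit, adding 1 gives a value that cannot be stored;
# the result is rounded back to 2^53.
above_collapses <- (limit + 1) == limit

print(below_is_exact)
print(above_collapses)

# The ids in the example are above 2^53, so an odd id cannot be stored exactly.
print(13101200510130349 > limit)
```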

A cautious coder will know how many digits a numeric variable requires and will specify col_types when reading the file. But we are often not that careful, and the silent loss of precision can cause serious problems down the road.
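For reference, a minimal sketch of that workaround, assuming it is acceptable to keep the id column as text. It recreates the CSV from the example above in a temporary file and overrides only the id column; the other columns are still guessed:

```r
library(readr)

# Recreate the CSV from the example in a temporary file.
path <- tempfile(fileext = ".csv")
writeLines(
  c(
    "project,id",
    "10GS,13101201211316084",
    "23AS,13101200510130349",
    "11SG,15103200910645447"
  ),
  path
)

# Read `id` as character so every digit is preserved.
# Columns not named in cols() keep the default type guessing.
data <- read_csv(path, col_types = cols(id = col_character()))

print(data$id)
```

Reading the column as character keeps the identifiers intact but means they can no longer be used in arithmetic, which is usually fine for ids.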

Would it be possible to add a warning message during read_csv import?
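To illustrate the kind of check I have in mind, here is a rough user-side sketch. The helper warn_unsafe_integers is hypothetical and not part of readr; it assumes the data has first been read with every column as character:

```r
# Hypothetical helper, not part of readr: warn about character columns whose
# values look like integers too large to be stored exactly in a double.
warn_unsafe_integers <- function(df, limit = 2^53) {
  flagged <- character(0)
  for (name in names(df)) {
    values <- df[[name]]
    if (!is.character(values)) next
    values <- values[!is.na(values)]
    # Only consider columns made up entirely of optionally signed digits.
    if (length(values) == 0 || !all(grepl("^[+-]?[0-9]+$", values))) next
    # as.numeric() is itself imprecise here, but it is accurate enough to
    # compare magnitudes; >= is used so values just above the limit are caught.
    if (any(abs(as.numeric(values)) >= limit)) flagged <- c(flagged, name)
  }
  if (length(flagged) > 0) {
    warning(
      "Columns with integers at or above 2^53: ",
      paste(flagged, collapse = ", ")
    )
  }
  invisible(flagged)
}

# Example input: all columns held as character, as if read with no guessing.
raw <- data.frame(
  project = c("10GS", "23AS", "11SG"),
  id = c("13101201211316084", "13101200510130349", "15103200910645447"),
  stringsAsFactors = FALSE
)

# Emits a warning that names the `id` column.
warn_unsafe_integers(raw)
```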

huisunsh · Sep 27 '24 11:09