
Why does "1176413S03" get converted to numeric when using `readr::type_convert()`?

Status: Open · shahronak47 opened this issue 1 year ago · 5 comments

Why do these values get converted to numeric when using `readr::type_convert()`? I would expect them to stay character.

```r
x <- c("1176413S03", "1176413S06", "1176413S02", "1176413S08", "1176413S05", "1176413S04")
df <- data.frame(x)
str(df)
```

```
'data.frame':	6 obs. of  1 variable:
 $ x: chr  "1176413S03" "1176413S06" "1176413S02" "1176413S08" ...
```

```r
df1 <- readr::type_convert(df)
str(df1)
```

```
'data.frame':	6 obs. of  1 variable:
 $ x: num  1.18e+09 1.18e+12 1.18e+08 1.18e+14 1.18e+11 ...
```

shahronak47 · Nov 14 '24
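The converted values in the report are consistent with the letter being read as a decimal exponent marker: "1176413S03" becomes 1176413 * 10^3 ≈ 1.18e+09, and "1176413S08" becomes ≈ 1.18e+14. The base R sketch below checks that reading by swapping the letter for `e`; it illustrates the hypothesis only and is not readr's actual code path.

```r
x <- c("1176413S03", "1176413S06", "1176413S02", "1176413S08", "1176413S05", "1176413S04")

# Hypothesis (not taken from readr's source): the letter acts as an exponent
# marker, so "1176413S03" means 1176413 * 10^3.
# Swap the first letter for "e" and let base R parse the result.
as_exponent <- as.numeric(sub("[A-Za-z]", "e", x))

# Round to 3 significant digits to compare with the str() output above.
signif(as_exponent, 3)
```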

This behavior appears to happen only for the letters D, E, F, L and S (upper or lower case). E makes sense, since it is the usual exponent marker.

```r
res <- purrr::map_chr(
  .x = LETTERS,
  .f = \(x) {
    d <- readr::type_convert(
      df = data.frame(
        x = glue::glue("1176413{letter}03", letter = x)
      )
    )
    class(d$x)
  }
)

setNames(res, LETTERS)
```

```
          A           B           C           D           E           F 
"character" "character" "character"   "numeric"   "numeric"   "numeric" 
          G           H           I           J           K           L 
"character" "character" "character" "character" "character"   "numeric" 
          M           N           O           P           Q           R 
"character" "character" "character" "character" "character" "character" 
          S           T           U           V           W           X 
  "numeric" "character" "character" "character" "character" "character" 
          Y           Z 
"character" "character" 
```

joranE · Nov 14 '24
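Until the guessing changes, one way to keep such IDs as character is to stop readr from guessing that column at all. A minimal sketch, assuming readr's `cols()` / `col_character()` column specifications passed through the `col_types` argument of `type_convert()`; the example values are illustrative:

```r
library(readr)

df <- data.frame(x = c("1176413S03", "1176413D03", "1176413F03", "1176413A03"))

# Pin column x to character so type_convert() never guesses a numeric type for it.
df_x_chr <- type_convert(df, col_types = cols(x = col_character()))

# Alternatively, keep every column as character unless stated otherwise.
df_all_chr <- type_convert(df, col_types = cols(.default = col_character()))

str(df_x_chr)
```

The trade-off is that the column type is no longer inferred, so any columns that really are numeric must be listed explicitly in `cols()`.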

https://github.com/tidyverse/readr/blob/96ddac314b47402bc63e1f81c149c463cf58e3da/src/QiParsers.h#L157-L181

And vroom doesn't use these Qi parsers.

chainsawriot · Jan 05 '25
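To see the difference on your own installation, the sketch below runs the same strings through `read_csv()`, which in readr 2.x (edition 2) parses and guesses types with vroom, and through `type_convert()`. It assumes readr 2.2 or later, where literal data is wrapped in `I()`; the resulting column classes are deliberately not asserted here.

```r
library(readr)

values <- c("1176413S03", "1176413S06")

# read_csv() in readr 2.x is backed by vroom; feed it the values as literal CSV text.
via_read_csv <- read_csv(I(paste(c("x", values), collapse = "\n")), show_col_types = FALSE)

# type_convert() on the same strings, held in a character column.
via_type_convert <- type_convert(data.frame(x = values))

# Compare the guessed column classes side by side.
c(read_csv = class(via_read_csv$x), type_convert = class(via_type_convert$x))
```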

I think these may come from C floating-point constants: https://en.cppreference.com/w/c/language/floating_constant

- `e` is the exponent marker for decimal floating point
- `p` is the exponent marker for hexadecimal floating point (rare)
- `f` is the suffix for `float`
- `l` is the suffix for `long double` (rare)

I don't know what `s` is for, and a number parser should really accept only `e` (or no letter at all).

jxu · Mar 03 '25
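For comparison, base R's own string-to-number conversion accepts only `e`/`E` as a decimal exponent marker, which matches the behavior argued for above. A quick check:

```r
# "e" is read as a decimal exponent: 1176413 * 10^3.
as.numeric("1176413e03")

# The other letters readr accepts are not numbers to base R, so as.numeric()
# returns NA for each (the coercion warning is suppressed here).
suppressWarnings(as.numeric(c("1176413D03", "1176413F03", "1176413L03", "1176413S03")))
```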

At the very least, this should be documented in `?type_convert`, and `?read_csv` and friends should also mention it.

joranE · Mar 03 '25

I agree, but given the state of the issue tracker and the other reports listed, I don't think it will be addressed anytime soon.

jxu · Mar 04 '25