Why are all the values "NA" when using the read_csv function, while they are correct when using read.csv?
Brief description of the problem: I read a CSV file using the read_csv function, but all values of the EV100/CI variable come back as NA, while reading the same file with read.csv gives the correct values. I have uploaded my data file (CSV format) as an attachment.
library(readr)

dd <- read_csv("C:\\Users\\rootdata\\17.csv")
View(dd)

# read.csv, by contrast, returns the correct values:
ddt <- read.csv("C:\\Users\\rootdata\\17.csv")
View(ddt)
The issue is the heuristic read_csv uses to infer column types. By default it inspects 1000 values, evenly spaced through the file. If a column has many, many missing values, and in particular if there is a general pattern of alternating values and NAs that lines up with the sampling, the guesser can easily conclude that the column is entirely NA.
If you set guess_max = Inf, every row is inspected and the file will be read in correctly. (You could also specify the column types manually, but in practice that's often a nuisance.)
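A self-contained sketch of the failure mode and both fixes (the file layout below is invented for illustration; whether the default guess actually fails depends on your readr version's sampling strategy):

```r
library(readr)

# Column x: many leading empty fields, so the default type guesser
# may see only NAs, followed by real numeric values.
path <- tempfile(fileext = ".csv")
writeLines(c("x", rep("", 1500), as.character(1:500)), path)

d1 <- read_csv(path)                   # x may be guessed logical -> all NA
d2 <- read_csv(path, guess_max = Inf)  # inspect every row; x parses as numeric

# Or state the type up front and skip guessing for that column:
d3 <- read_csv(path, col_types = cols(x = col_double()))
```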
The fact that read.csv doesn't do this makes it more "careful", but also much, much slower, since (I believe) it re-coerces columns on the fly as many times as necessary.
In general, this is an area where I have found readr::read_csv does much worse (in a relative sense; obviously these edge cases are quite rare overall) than data.table::fread, which by default looks at 10000 rows in equally spaced chunks of 100. I have wondered why readr didn't simply adopt that as its default.
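For comparison, a sketch of the same kind of file read with data.table::fread (assuming data.table is installed; the file layout is invented for illustration). Because fread samples rows throughout the file, the trailing numeric values inform the type guess:

```r
library(data.table)

# Same shape of file: 1500 empty fields, then numeric values.
path <- tempfile(fileext = ".csv")
writeLines(c("x", rep("", 1500), as.character(1:500)), path)

# fread treats empty fields as NA by default and samples widely
# when guessing types, so x should come back numeric.
ddt <- fread(path)
```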
If you have a large number of variables and don't know much about them, this approach can silently produce missing data. Very dangerous. In a dataset of 4k rows, only the last 1k values of one variable were non-NA; readr::read_csv lost those values and read the variable as all missing. If you are not aware of this behaviour and do not set guess_max = Inf, you lose the information entirely. I think this approach is not safe.
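One partial mitigation: readr records values that could not be parsed under the guessed type, so you can check for silent loss after reading. A sketch (the file below is invented for illustration; the exact contents of the problems report vary by readr version):

```r
library(readr)

path <- tempfile(fileext = ".csv")
writeLines(c("x", rep("", 1500), as.character(1:500)), path)

d <- read_csv(path)

# If the guessed type could not hold some values, readr records them here;
# a non-empty result is a red flag that data may have been dropped.
problems(d)
```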