sampleCSV not reading header
I downloaded “flights14.csv” from the following site: https://github.com/arunsrinivasan/flights/wiki/NYC-Flights-2014-data
I am working on macOS High Sierra Version 10.13.3. I am using R version 3.5.0 (2018-04-23) -- "Joy in Playing" with DMwR2_0.0.2. As an example, I created “test2” as a subdirectory of “Users”: /Users/test2
The column names are not displayed in the tibble when I run the following line. Instead, it seems to use one of the rows of flight data as the column names.
flights1000_2 <- sampleCSV(file = "/Users/test2/flights14.csv", percORn = 1000, header = T)
The following message is displayed in the console.
Parsed with column specification:
cols(
2014 = col_integer(),
1 = col_integer(),
1_1 = col_integer(),
847 = col_integer(),
-3 = col_integer(),
1036 = col_integer(),
1_2 = col_integer(),
0 = col_integer(),
AA = col_character(),
N553AA = col_character(),
313 = col_integer(),
LGA = col_character(),
ORD = col_character(),
139 = col_integer(),
733 = col_integer(),
8 = col_integer(),
47 = col_integer()
)
Warning message:
Duplicated column names deduplicated: '1' => '1_1' [3], '1' => '1_2' [7]
I'm experiencing the same problem.
Same for me
The cause seems to be that the header line may be sampled out of the temporary file that sampleCSV creates:
https://github.com/ltorgo/DMwR2/blob/c19cb08742040b245b1c5c03069ccbe5643aff72/R/utils.R#L213
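Until this is fixed, one possible workaround is to do the sampling in R and keep the first line aside so it can never be dropped. The sketch below is only an illustration under a few assumptions: `sample_csv_keep_header` is a hypothetical helper (not part of DMwR2), it uses base R only, and it reads the whole file into memory, so it is suitable for files the size of flights14.csv but not for the very large files sampleCSV is aimed at. The small demo file is made up so the example runs on its own.

```r
# Hypothetical helper, not part of DMwR2: sample n data rows from a CSV
# while always keeping the first line as the header.
sample_csv_keep_header <- function(file, n) {
  lines  <- readLines(file)        # assumption: the file fits in memory
  header <- lines[1]               # the header is set aside, never sampled
  body   <- lines[-1]
  n      <- min(n, length(body))   # cannot sample more rows than exist
  keep   <- sort(sample.int(length(body), n))  # keep original row order
  read.csv(text = c(header, body[keep]),
           header = TRUE, stringsAsFactors = FALSE)
}

# Made-up demo data written to a temporary file.
tmp  <- tempfile(fileext = ".csv")
demo <- data.frame(year = 2014L, month = 1L, day = 1:20, carrier = "AA",
                   stringsAsFactors = FALSE)
write.csv(demo, tmp, row.names = FALSE)

sampled <- sample_csv_keep_header(tmp, n = 5)
print(names(sampled))
print(nrow(sampled))
```

With the real file, the call would presumably look like sample_csv_keep_header("/Users/test2/flights14.csv", n = 1000). The result is a plain data.frame rather than a tibble.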