
sampleCSV not reading header

Open WarrenC opened this issue 7 years ago • 3 comments

I downloaded “flights14.csv” from the following site. https://github.com/arunsrinivasan/flights/wiki/NYC-Flights-2014-data

I am working on macOS High Sierra version 10.13.3, using R version 3.5.0 (2018-04-23) -- "Joy in Playing" with DMwR2_0.0.2. As an example, I created "test2" as a subdirectory of "Users": /Users/test2

The column names are not displayed in the tibble when I run the following line. Instead, one of the rows of flight data seems to be used as the column names.

flights1000_2 <- sampleCSV(file = "/Users/test2/flights14.csv", percORn = 1000, header = T)

The following message is displayed in the console.

Parsed with column specification:
cols(
  2014 = col_integer(),
  1 = col_integer(),
  1_1 = col_integer(),
  847 = col_integer(),
  -3 = col_integer(),
  1036 = col_integer(),
  1_2 = col_integer(),
  0 = col_integer(),
  AA = col_character(),
  N553AA = col_character(),
  313 = col_integer(),
  LGA = col_character(),
  ORD = col_character(),
  139 = col_integer(),
  733 = col_integer(),
  8 = col_integer(),
  47 = col_integer()
)
Warning message:
Duplicated column names deduplicated: '1' => '1_1' [3], '1' => '1_2' [7]

WarrenC, Jul 12 '18 18:07

I'm experiencing the same problem.

smottaghinejad, Feb 13 '20 22:02

Same for me

CeliaZhu, May 23 '20 03:05

The cause seems to be that the header line can itself be sampled out when the temporary file is created, so a data row ends up being read as the header:

https://github.com/ltorgo/DMwR2/blob/c19cb08742040b245b1c5c03069ccbe5643aff72/R/utils.R#L213

danielmitre, Mar 15 '21 21:03
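
Until this is fixed in the package, one possible workaround is to keep the header line out of the sampling step and put it back before parsing. The following is only a minimal base R sketch of that idea, not the DMwR2 implementation: `sample_csv_keep_header` is a hypothetical helper, and it assumes the file is small enough to read fully into memory with `readLines()`, which may not hold for the very large files `sampleCSV` is aimed at.

```r
# Hypothetical helper (not part of DMwR2): draw n data rows from a CSV file
# while always keeping the first line as the header.
sample_csv_keep_header <- function(file, n) {
  lines  <- readLines(file)
  header <- lines[1]      # the header is set aside and never sampled
  body   <- lines[-1]     # only the data rows take part in the sampling
  n      <- min(n, length(body))
  # Sample row positions (not values) and keep the original row order
  picked <- body[sort(sample.int(length(body), n))]
  read.csv(text = c(header, picked), header = TRUE, stringsAsFactors = FALSE)
}

# Small self-contained demonstration on a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("year,month,carrier",
             "2014,1,AA",
             "2014,1,UA",
             "2014,2,DL",
             "2014,3,AA"), tmp)

s <- sample_csv_keep_header(tmp, 2)
names(s)   # column names come from the header line, not from a data row
```

For the file in this issue, the call would look like `sample_csv_keep_header("/Users/test2/flights14.csv", 1000)`. Unlike `sampleCSV`, this sketch accepts only a row count, not a percentage.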