GBM-tune

Could you share a link to where the data can be downloaded?

Status: Open. Sandy4321 opened this issue 4 years ago • 8 comments

As usual, great code and ideas. But where does the data in this line come from?

d0_train <- fread(paste0("/var/data/airline/",yr-1,".csv"))

Could you please share a link where I can get it?

Sandy4321, May 24 '20 19:05

Is it one of these datasets?
https://www.kaggle.com/usdot/flight-delays
https://www.kaggle.com/giovamata/airlinedelaycauses
https://openflights.org/data.html
https://www.stat.purdue.edu/~sguha/rhipe/doc/html/airline.html
http://web.mit.edu/airlinedata/www/default.html

Sandy4321, May 24 '20 19:05

here:

https://github.com/szilard/GBM-tune/blob/7047ae1d8a133aefcc338665ffbee3864cc529ff/1-train_test-same_yr/run-tuning.R#L9-L24

(see the commented-out wget lines for the URLs)

szilard, May 24 '20 20:05

Thanks for the quick answer. Something goes wrong with this code, though; could you please clarify what can be done?

set.seed(123)
for yr in 1990 1991; do
Error: unexpected symbol in "for yr"
  wget http://stat-computing.org/dataexpo/2009/$yr.csv.bz2
Error: unexpected symbol in "  wget http"
  bunzip2 $yr.csv.bz2
Error: object 'bunzip2' not found
wget http://stat-computing.org/dataexpo/2009/$1990.csv.bz2
Error: unexpected symbol in "wget http"
yr <- 1990
wget http://stat-computing.org/dataexpo/2009/$yr.csv.bz2
Error: unexpected symbol in "wget http"

install.packages("wget")
WARNING: Rtools is required to build R packages but is not currently installed. Please download and install the appropriate version of Rtools before proceeding:

https://cran.rstudio.com/bin/windows/Rtools/
Installing package into ‘C:/Users/sndr/Documents/R/win-library/3.5’
(as ‘lib’ is unspecified)
Warning in install.packages :
  package ‘wget’ is not available (for R version 3.5.1)

install.packages("Rtools ")
WARNING: Rtools is required to build R packages but is not currently installed. Please download and install the appropriate version of Rtools before proceeding:

https://cran.rstudio.com/bin/windows/Rtools/
Installing package into ‘C:/Users/sndr/Documents/R/win-library/3.5’
(as ‘lib’ is unspecified)
Warning in install.packages :
  package ‘Rtools ’ is not available (for R version 3.5.1)

install.packages("Rtools")
WARNING: Rtools is required to build R packages but is not currently installed. Please download and install the appropriate version of Rtools before proceeding:

https://cran.rstudio.com/bin/windows/Rtools/
Installing package into ‘C:/Users/sndr/Documents/R/win-library/3.5’
(as ‘lib’ is unspecified)
Warning in install.packages :
  package ‘Rtools’ is not available (for R version 3.5.1)

Sandy4321, May 25 '20 14:05

Well, that's not R code; it's a bash (Unix) script.

You can also just download those files manually, e.g. http://stat-computing.org/dataexpo/2009/1991.csv.bz2
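A minimal sketch of that loop as a standalone bash script, reconstructed from the commands in the error transcript above rather than copied from run-tuning.R. It has to run in a Unix shell with wget and bunzip2 available (e.g. Linux, or WSL on Windows), not in the R console. The function name download_years and the DRY_RUN switch are hypothetical additions for illustration, and the stat-computing.org host may no longer serve these files:

```shell
#!/usr/bin/env bash
# Sketch of the year-by-year download loop (reconstructed, not the repo's exact lines).
# Run in a Unix shell with wget and bunzip2 installed, NOT in the R console.
# Caveat: the stat-computing.org files may no longer be available.
set -eu

BASE_URL="http://stat-computing.org/dataexpo/2009"

# Download and decompress <year>.csv.bz2 for every year passed as an argument.
# Hypothetical switch: with DRY_RUN=1 the commands are only printed, not executed.
download_years() {
  local yr url
  for yr in "$@"; do
    url="${BASE_URL}/${yr}.csv.bz2"
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "wget ${url}"
      echo "bunzip2 ${yr}.csv.bz2"
    else
      wget "${url}"             # fetch the compressed file
      bunzip2 "${yr}.csv.bz2"   # leaves ${yr}.csv in the current directory
    fi
  done
}

# Only run when called with arguments, e.g.: bash download.sh 1990 1991
if [ "$#" -gt 0 ]; then
  download_years "$@"
fi
```

Saved as, say, download.sh, it would be run as bash download.sh 1990 1991. Prefixing it with DRY_RUN=1 only prints the wget and bunzip2 commands, which is a cheap way to check the URLs before downloading anything.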

szilard, May 25 '20 14:05

Great, thanks for the quick answer, but I get this: [screenshot]

and this:

curl("http://stat-computing.org/dataexpo/2009/$yr.csv.bz2")
A connection with
description "http://stat-computing.org/dataexpo/2009/$yr.csv.bz2"
class       "curl"
mode        "r"
text        "text"
opened      "closed"
can read    "yes"
can write   "no"
yr
[1] 1990

and this: [screenshot]

Sandy4321, May 25 '20 16:05

Yeah, I see. It seems the provider has deleted the data.

http://stat-computing.org/dataexpo/2009/the-data.html

You might be able to find a copy somewhere else, though.

szilard, May 25 '20 16:05

E.g. here: https://github.com/h2oai/h2o-2/wiki/Hacking-Airline-DataSet-with-H2O

Airlines all years 1987-2008: https://s3.amazonaws.com/h2o-airlines-unpacked/allyears.csv (12 GB)

Though I'm not 100% sure it is exactly the same data (that is, the same rows and the same columns).
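If the combined file is used instead, the R code in the repo still reads one file per year via fread(paste0("/var/data/airline/",yr-1,".csv")). A hypothetical helper to split it by year is sketched below. It assumes, without having been checked against the 12 GB file, that line 1 is a header, the first column is the year, and no field contains a quoted comma. awk keeps one output file open per year, which should be fine for 22 years but may run into open-file limits in some awk implementations:

```shell
#!/usr/bin/env bash
# Hypothetical helper: split one combined airline CSV into <outdir>/<Year>.csv,
# repeating the header row in every per-year file.
# Assumptions (unverified for allyears.csv): header on line 1, year in the
# first column, plain comma-separated fields without quoted commas.
set -eu

split_by_year() {
  local input="$1" outdir="$2"
  mkdir -p "$outdir"
  awk -F',' -v dir="$outdir" '
    NR == 1 { header = $0; next }      # keep the header row for every output file
    {
      f = dir "/" $1 ".csv"            # e.g. /var/data/airline/1990.csv
      if (!(f in seen)) {              # first row of this year: write the header once
        print header > f
        seen[f] = 1
      }
      print > f                        # later prints append while the file stays open
    }
  ' "$input"
}

# Usage: bash split.sh allyears.csv /var/data/airline
if [ "$#" -eq 2 ]; then
  split_by_year "$1" "$2"
fi
```

If the assumptions hold, bash split.sh allyears.csv /var/data/airline would leave one <Year>.csv per year in that directory, matching the paths the R script reads. Whether the columns match the original per-year files is exactly the open question raised above.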

szilard, May 25 '20 16:05

Szilard, thanks a lot for the help, very kind of you. I will try to download this data.

Sandy4321, May 25 '20 16:05