More data sources
- [ ] http://academictorrents.com/browse.php
- [ ] http://archive.ics.uci.edu/ml/index.php
- [ ] http://atlas.ca.gov/download.html
- [ ] http://data.europa.eu/euodp/en/data/ EU Open Data Portal
- [ ] http://data.un.org/
- [ ] http://www.naturalearthdata.com/downloads/ - small, medium, large scale earth data
- [ ] http://www.ncaa.org/about/resources/research/shared-ncaa-research-data
- [ ] http://www.samhsa.gov/data/report/2015-2016-nsduh-state-estimates-individual-excel-and-csv-files-outcome
- [ ] https://cloud.google.com/bigquery/public-data/ Public Data by Google
- [ ] https://corgis-edu.github.io/corgis/csv/
- [ ] https://data.chhs.ca.gov/
- [ ] https://data.fivethirtyeight.com/ (datasets related to news and sports)
- [ ] https://data.gov/
- [ ] https://data.unicef.org/resources/resource-type/datasets/ UNICEF Data Resources
- [ ] https://data.worldbank.org/
- [ ] https://datacatalog.worldbank.org/ data from the World Bank
- [ ] https://datasetsearch.research.google.com/ Google public datasets
- [ ] https://freegisdata.rtwilson.com/ (GIS Datasets)
- [ ] https://gisgeography.com/free-satellite-imagery-data-list/ - list of Satellite Image sources
- [ ] https://github.com/BuzzFeedNews
- [ ] https://github.com/caesar0301/awesome-public-datasets
- [ ] https://github.com/datasets/world-cities
- [ ] https://github.com/emorisse/FBI-Hate-Crime-Statistics/tree/master/2013 There are two files, table13.csv and table14.csv
- [ ] https://github.com/turalus/encycloDB/tree/master/Countries
- [ ] https://hub.arcgis.com/pages/open-data - data is available in form of spreadsheets, shapefiles, GeoJSON, etc
- [ ] https://www.mrlc.gov/ The Multi-Resolution Land Characteristics (MRLC) consortium
- [ ] https://www.mrlc.gov/data
  ref https://www.mrlc.gov/nlcd2011.php
  citation: """ Preferred NLCD 2011 citation: Homer, C.G., Dewitz, J.A., Yang, L., Jin, S., Danielson, P., Xian, G., Coulston, J., Herold, N.D., Wickham, J.D., and Megown, K., 2015, Completion of the 2011 National Land Cover Database for the conterminous United States-Representing a decade of land cover change information. Photogrammetric Engineering and Remote Sensing, v. 81, no. 5, p. 345-354"""
  Also ref https://www.mrlc.gov/nlcd06_data.php
- [ ] https://open.canada.ca/en/open-data Canada open data
- [ ] https://opendata.socrata.com/
- [ ] https://piktochart.com/blog/8-useful-databases-to-dig-for-data/ (reference point)
- [ ] https://planet.openstreetmap.org/ - historical shapefiles, planet data, etc.
- [ ] https://registry.opendata.aws/
- [ ] https://uclspp.github.io/datasets/
- [ ] https://vectorsurv.org/
- [ ] https://wiki.dbpedia.org/develop/datasets Wikipedia Dump
- [ ] https://world.openfoodfacts.org/data
- [ ] https://www.dataone.org/ Data-One
- [ ] https://www.epa.gov/enviroatlas/forms/enviroatlas-data-download
- [ ] https://www.figure-eight.com/data-for-everyone/
- [ ] https://www.kaggle.com/datasets
- [ ] https://www.kaggle.com/mylesoneill/world-university-rankings/download
- [ ] https://www.ncdc.noaa.gov/cdo-web/datasets climate data
- [ ] https://www.nsf.gov/statistics/data.cfm, https://nsf.gov/statistics/herd/pub_data.cfm
- [ ] https://www.srs.fs.usda.gov/sustain/data/fia/ewplot/index.htm? (FIA plot-level)
- [ ] https://www.zillow.com/research/data/ housing data
- [ ] Re-add eBird EOD. Please refer to #1396 when working on this.
- [ ] https://www.dol.gov/agencies/eta/foreign-labor/performance
- [ ] https://www.farmers.gov/data
Hey @henrykironde! Would love to start contributing, and I believe adding datasets might be a good place to start. Could I pick one up from the lot or would you be assigning any particular one?
Hi @pri1311, Feel free to pick any data source. Let me know in case you need any clarification.
> Let me know in case you need any clarification.

I have added a simple dataset for now to get a basic idea of the repository. If the PR is merged/approved, I will move on to more datasets. I am particularly interested in a separate open issue: adding support for sequence data.
Also, I had one small question. I was going through some of the JSON files in the retriever-recipes repository, and a lot of the Kaggle datasets were included. But since Kaggle allows downloading test and train data all at once as a zip file, how will those be added to this package? (I ask because I saw Kaggle mentioned as one of the data sources here.)
@pri1311 for sequence data, I have not found suitable sources yet, but you can go for it.

> since Kaggle allows downloading test and train data all at once as a zip file,

That is a good case, since we download all the data using one URL. We can then extract all the files, or just a particular file. Check out the JSON files that use extract for some examples: https://github.com/weecology/retriever-recipes/search?q=extract
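To make the idea concrete, here is a minimal sketch in plain Python of "one URL, then extract everything or just one file". It is not the retriever's own code: the function names `fetch_archive`, `extract_members`, and `build_demo_zip` are hypothetical, and the demo uses a tiny in-memory archive with made-up `train.csv` and `test.csv` files in place of a real download.

```python
import io
import os
import tempfile
import urllib.request
import zipfile


def fetch_archive(url):
    """Download the whole archive from a single URL and return its bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def extract_members(zip_bytes, dest_dir, wanted=None):
    """Extract all members, or only those named in `wanted`, into dest_dir."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # With no selection, fall back to every file listed in the archive.
        members = archive.namelist() if wanted is None else list(wanted)
        archive.extractall(path=dest_dir, members=members)
    return members


def build_demo_zip():
    """Build a tiny in-memory archive standing in for a train/test download."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("train.csv", "id,label\n1,a\n")
        archive.writestr("test.csv", "id\n2\n")
    return buffer.getvalue()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as dest:
        # Extract only one file from the archive, leaving the rest untouched.
        extract_members(build_demo_zip(), dest, wanted=["train.csv"])
        print(sorted(os.listdir(dest)))
```

In a real run, the bytes would come from `fetch_archive(url)` instead of `build_demo_zip()`. Leaving out `wanted` extracts every file in the archive.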
Let me know in case you have more issues or need clarification.