awesome-public-datasets
An awesome list of high-quality open datasets in public domains (on-going).
Public Datasets
===============

This list of public data sources was originally compiled `here <https://github.com/caesar0301/awesome-public-datasets>`_.
Sources are collected and tidied from blogs, answers, and user responses.
Most of the data sets listed below are free; however, some are not.

.. contents:: Table of Contents

Agriculture
-----------

- `U.S. Department of Agriculture's PLANTS Database <http://www.plants.usda.gov/dl_all.html>`_

Biology
-------

- `1000 Genomes <http://www.1000genomes.org/data>`_
- `American Gut (Microbiome Project) <https://github.com/biocore/American-Gut>`_
- `Broad Cancer Cell Line Encyclopedia (CCLE) <http://www.broadinstitute.org/ccle/home>`_
- `Broad Bioimage Benchmark Collection (BBBC) <https://www.broadinstitute.org/bbbc>`_
- `Cell Image Library <http://www.cellimagelibrary.org>`_
- `Complete Genomics Public Data <http://www.completegenomics.com/public-data/69-genomes/>`_
- `EBI ArrayExpress <http://www.ebi.ac.uk/arrayexpress/>`_
- `EBI Protein Data Bank in Europe <http://www.ebi.ac.uk/pdbe/emdb/index.html/>`_
- `Electron Microscopy Pilot Image Archive (EMPIAR) <http://www.ebi.ac.uk/pdbe/emdb/empiar/>`_
- `ENCODE project <https://www.encodeproject.org>`_
- `Ensembl Genomes <http://ensemblgenomes.org/info/genomes>`_
- `Gene Expression Omnibus (GEO) <http://www.ncbi.nlm.nih.gov/geo/>`_
- `Gene Ontology (GO) <http://geneontology.org/page/download-annotations>`_
- `Global Biotic Interactions (GloBI) <https://github.com/jhpoelen/eol-globi-data/wiki#accessing-species-interaction-data>`_
- `Harvard Medical School (HMS) LINCS Project <http://lincs.hms.harvard.edu>`_
- `Human Genome Diversity Project <http://www.hagsc.org/hgdp/files.html>`_
- `Human Microbiome Project (HMP) <http://www.hmpdacc.org/reference_genomes/reference_genomes.php>`_
- `ICOS PSP Benchmark <http://ico2s.org/datasets/psp_benchmark.html>`_
- `International HapMap Project <http://hapmap.ncbi.nlm.nih.gov/downloads/index.html.en>`_
- `Journal of Cell Biology DataViewer <http://jcb-dataviewer.rupress.org>`_
- `MIT Cancer Genomics Data <http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi>`_
- `NCBI Proteins <http://www.ncbi.nlm.nih.gov/guide/proteins/#databases>`_
- `NCBI Taxonomy <http://www.ncbi.nlm.nih.gov/taxonomy>`_
- `NIH Microarray data <http://bit.do/VVW6>`_ or `FTP <ftp://ftp.ncbi.nih.gov/pub/geo/DATA/supplementary/series/GSE6532/>`_ (see FTP link on `RAW <https://raw.githubusercontent.com/caesar0301/awesome-public-datasets/master/README.rst>`_)
- `OpenSNP genotypes data <https://opensnp.org/>`_
- `Pathguide - Protein-Protein Interactions Catalog <http://www.pathguide.org/>`_
- `Protein Data Bank <http://www.rcsb.org/>`_
- `Psychiatric Genomics Consortium <https://www.med.unc.edu/pgc/downloads>`_
- `PubChem Project <https://pubchem.ncbi.nlm.nih.gov/>`_
- `PubGene (now Coremine Medical) <http://www.pubgene.org/>`_
- `Sanger Catalogue of Somatic Mutations in Cancer (COSMIC) <http://cancer.sanger.ac.uk/cosmic>`_
- `Sanger Genomics of Drug Sensitivity in Cancer Project (GDSC) <http://www.cancerrxgene.org/>`_
- `Sequence Read Archive (SRA) <http://www.ncbi.nlm.nih.gov/Traces/sra/>`_
- `Stanford Microarray Data <http://smd.stanford.edu/>`_
- `Stowers Institute Original Data Repository <http://www.stowers.org/research/publications/odr>`_
- `Systems Science of Biological Dynamics (SSBD) Database <http://ssbd.qbic.riken.jp>`_
- `The Cancer Genome Atlas (TCGA), available via Broad GDAC <https://gdac.broadinstitute.org/>`_
- `The Catalogue of Life <http://www.catalogueoflife.org/content/annual-checklist-archive>`_
- `The Personal Genome Project <http://www.personalgenomes.org/>`_ or `PGP <https://my.pgp-hms.org/public_genetic_data>`_
- `UCSC Public Data <http://hgdownload.soe.ucsc.edu/downloads.html>`_
- `Universal Protein Resource (UniProt) <http://www.uniprot.org/downloads>`_
- `UniGene <http://www.ncbi.nlm.nih.gov/unigene>`_

Climate/Weather
---------------

- `Australian Weather <http://www.bom.gov.au/climate/dwo/>`_
- `Aviation Weather Center - Consistent, timely and accurate weather information for the world airspace system <https://aviationweather.gov/adds/dataserver>`_
- `Brazilian Weather - Historical data (In Portuguese) <http://sinda.crn2.inpe.br/PCD/SITE/novo/site/>`_
- `Canadian Meteorological Centre <http://weather.gc.ca/grib/index_e.html>`_
- `Climate Data from UEA (updated monthly) <https://crudata.uea.ac.uk/cru/data/temperature/#datter and ftp://ftp.cmdl.noaa.gov/>`_
- `European Climate Assessment & Dataset <http://eca.knmi.nl/>`_
- `Global Climate Data Since 1929 <http://en.tutiempo.net/climate>`_
- `NASA Global Imagery Browse Services <https://wiki.earthdata.nasa.gov/display/GIBS>`_
- `NOAA Bering Sea Climate <http://www.beringclimate.noaa.gov/>`_
- `NOAA Climate Datasets <http://www.ncdc.noaa.gov/data-access/quick-links>`_
- `NOAA Realtime Weather Models <http://www.ncdc.noaa.gov/data-access/model-data/model-datasets/numerical-weather-prediction>`_
- `The World Bank Open Data Resources for Climate Change <http://data.worldbank.org/developers/climate-data-api>`_
- `UEA Climatic Research Unit <http://www.cru.uea.ac.uk/data>`_
- `WorldClim - Global Climate Data <http://www.worldclim.org>`_
- `WU Historical Weather Worldwide <https://www.wunderground.com/history/index.html>`_

Complex Networks
----------------

- `AMiner Citation Network Dataset <http://aminer.org/citation>`_
- `CrossRef DOI URLs <https://archive.org/details/doi-urls>`_
- `DBLP Citation dataset <https://kdl.cs.umass.edu/display/public/DBLP>`_
- `NBER Patent Citations <http://nber.org/patents/>`_
- `Network Repository with Interactive Exploratory Analysis Tools <http://networkrepository.com/>`_
- `NIST complex networks data collection <http://math.nist.gov/~RPozo/complex_datasets.html>`_
- `Protein-protein interaction network <http://vlado.fmf.uni-lj.si/pub/networks/data/bio/Yeast/Yeast.htm>`_
- `PyPI and Maven Dependency Network <https://ogirardot.wordpress.com/2013/01/31/sharing-pypimaven-dependency-data/>`_
- `Scopus Citation Database <https://www.elsevier.com/solutions/scopus>`_
- `Small Network Data <http://www-personal.umich.edu/~mejn/netdata/>`_
- `Stanford GraphBase (Steven Skiena) <http://www3.cs.stonybrook.edu/~algorith/implement/graphbase/implement.shtml>`_
- `Stanford Large Network Dataset Collection <http://snap.stanford.edu/data/>`_
- `Stanford Longitudinal Network Data Sources <http://stanford.edu/group/sonia/dataSources/index.html>`_
- `The Koblenz Network Collection <http://konect.uni-koblenz.de/>`_
- `The Laboratory for Web Algorithmics (UNIMI) <http://law.di.unimi.it/datasets.php>`_
- `The Nexus Network Repository <http://nexus.igraph.org/>`_
- `UCI Network Data Repository <https://networkdata.ics.uci.edu/resources.php>`_
- `UFL sparse matrix collection <http://www.cise.ufl.edu/research/sparse/matrices/>`_
- `WSU Graph Database <http://www.eecs.wsu.edu/mgd/gdb.html>`_
- `DIMACS Road Networks Collection <http://www.dis.uniroma1.it/challenge9/download.shtml>`_

Computer Networks
-----------------

- `3.5B Web Pages from CommonCrawl 2012 <http://www.bigdatanews.com/profiles/blogs/big-data-set-3-5-billion-web-pages-made-available-for-all-of-us>`_
- `53.5B Web clicks of 100K users in Indiana Univ. <http://cnets.indiana.edu/groups/nan/webtraffic/click-dataset/>`_
- `CAIDA Internet Datasets <http://www.caida.org/data/overview/>`_
- `ClueWeb09 - 1B web pages <http://lemurproject.org/clueweb09/>`_
- `ClueWeb12 - 733M web pages <http://lemurproject.org/clueweb12/>`_
- `CommonCrawl Web Data over 7 years <http://commoncrawl.org/the-data/get-started/>`_
- `CRAWDAD Wireless datasets from Dartmouth Univ. <https://crawdad.cs.dartmouth.edu/>`_
- `Criteo click-through data <http://labs.criteo.com/2015/03/criteo-releases-its-new-dataset/>`_
- `Open Mobile Data by MobiPerf <https://console.developers.google.com/storage/openmobiledata_public/>`_
- `Rapid7 Sonar Internet Scans <https://sonar.labs.rapid7.com/>`_
- `UCSD Network Telescope, IPv4 /8 net <http://www.caida.org/projects/network_telescope/>`_

Contextual Data
---------------

- `Context-aware data sets from five domains <http://students.depaul.edu/~yzheng8/DataSets.html#Data>`_ or `GitHub <https://github.com/irecsys/CARSKit/tree/master/context-aware_data_sets>`_

Data Challenges
---------------

- `Challenges in Machine Learning <http://www.chalearn.org/>`_
- `CrowdANALYTIX dataX <http://data.crowdanalytix.com>`_
- `D4D Challenge of Orange <http://www.d4d.orange.com/en/home>`_
- `DrivenData Competitions for Social Good <http://www.drivendata.org/>`_
- `ICWSM Data Challenge (since 2009) <http://icwsm.cs.umbc.edu/>`_
- `Kaggle Competition Data <https://www.kaggle.com/>`_
- `KDD Cup by Tencent 2012 <http://www.kddcup2012.org/>`_
- `Localytics Data Visualization Challenge <https://github.com/localytics/data-viz-challenge>`_
- `Netflix Prize <http://netflixprize.com/leaderboard.html>`_
- `Space Apps Challenge <https://2015.spaceappschallenge.org>`_
- `Telecom Italia Big Data Challenge <https://dandelion.eu/datamine/open-big-data/>`_
- `Yelp Dataset Challenge <http://www.yelp.com/dataset_challenge>`_
- `Bruteforce Database <https://github.com/duyetdev/bruteforce-database>`_

Earth Science
-------------

- `AQUASTAT - Global water resources and uses <http://www.fao.org/nr/water/aquastat/data/query/index.html?lang=en>`_
- `BODC - marine data of ~22K vars <http://www.bodc.ac.uk/data/where_to_find_data/>`_
- `Earth Models <http://www.earthmodels.org/>`_
- `EOSDIS - NASA's earth observing system data <http://sedac.ciesin.columbia.edu/data/sets/browse>`_
- `Integrated Marine Observing System (IMOS) - roughly 30TB of ocean measurements <https://imos.aodn.org.au>`_ or `on S3 <http://imos-data.s3-website-ap-southeast-2.amazonaws.com/>`_
- `Marinexplore - Open Oceanographic Data <http://marinexplore.org/>`_
- `Smithsonian Institution Global Volcano and Eruption Database <http://volcano.si.edu/>`_
- `USGS Earthquake Archives <http://earthquake.usgs.gov/earthquakes/search/>`_

Economics
---------

- `American Economic Association (AEA) <https://www.aeaweb.org/resources/data>`_
- `EconData from UMD <http://inforumweb.umd.edu/econdata/econdata.html>`_
- `Economic Freedom of the World Data <http://www.freetheworld.com/datasets_efw.html>`_
- `Historical MacroEconomic Statistics <http://www.historicalstatistics.org/>`_
- `International Economics Database <http://widukind.cepremap.org/>`_ and `various data tools <https://github.com/Widukind>`_
- `International Trade Statistics <http://www.econostatistics.co.za/>`_
- `Internet Product Code Database <http://www.upcdatabase.com/>`_
- `Joint External Debt Data Hub <http://www.jedh.org/>`_
- `Jon Haveman International Trade Data Links <http://www.macalester.edu/research/economics/PAGE/HAVEMAN/Trade.Resources/TradeData.html>`_
- `OpenCorporates Database of Companies in the World <https://opencorporates.com/>`_
- `Our World in Data <http://ourworldindata.org/>`_
- `SciencesPo World Trade Gravity Datasets <http://econ.sciences-po.fr/thierry-mayer/data>`_
- `The Atlas of Economic Complexity <http://atlas.cid.harvard.edu>`_
- `The Center for International Data <http://cid.econ.ucdavis.edu>`_
- `The Observatory of Economic Complexity <http://atlas.media.mit.edu/en/>`_
- `UN Commodity Trade Statistics <http://comtrade.un.org/db/>`_
- `UN Human Development Reports <http://hdr.undp.org/en>`_

Education
---------

- `Student Data from Free Code Camp <http://academictorrents.com/details/030b10dad0846b5aecc3905692890fb02404adbf>`_

Energy
------

- `AMPds <http://ampds.org/>`_
- `BLUEd <http://nilm.cmubi.org/>`_
- `COMBED <http://combed.github.io/>`_
- `Dataport <https://dataport.pecanstreet.org/>`_
- `DRED <http://www.st.ewi.tudelft.nl/~akshay/dred/>`_
- `ECO <http://www.vs.inf.ethz.ch/res/show.html?what=eco-data>`_
- `EIA <http://www.eia.gov/electricity/data/eia923/>`_
- `HES <http://randd.defra.gov.uk/Default.aspx?Menu=Menu&Module=More&Location=None&ProjectID=17359&FromSearch=Y&Publisher=1&SearchText=EV0702&SortString=ProjectCode&SortOrder=Asc&Paging=10#Description>`_ - Household Electricity Study, UK
- `HFED <http://hfed.github.io/>`_
- `iAWE <http://iawe.github.io/>`_
- `PLAID <http://plaidplug.com/>`_ - the Plug Load Appliance Identification Dataset
- `REDD <http://redd.csail.mit.edu/>`_
- `Tracebase <https://www.tracebase.org>`_
- `UK-DALE <http://www.doc.ic.ac.uk/~dk3810/data/>`_ - UK Domestic Appliance-Level Electricity
- `WHITED <http://nilmworkshop.org/2016/proceedings/Poster_ID18.pdf>`_

Finance
-------

- `CBOE Futures Exchange <http://cfe.cboe.com/Data/>`_
- `Google Finance <https://www.google.com/finance>`_
- `Google Trends <http://www.google.com/trends?q=google&ctab=0&geo=all&date=all&sort=0>`_
- `NASDAQ <https://data.nasdaq.com/>`_
- `OANDA <http://www.oanda.com/>`_
- `OSU Financial data <http://fisher.osu.edu/fin/fdf/osudata.htm>`_
- `Quandl <https://www.quandl.com/>`_
- `St Louis Federal <https://research.stlouisfed.org/fred2/>`_
- `Yahoo Finance <http://finance.yahoo.com/>`_
- `NYSE Market Data <ftp://ftp.nyxdata.com>`_ (see FTP link on `RAW <https://raw.githubusercontent.com/caesar0301/awesome-public-datasets/master/README.rst>`_)

GIS
---

- `Planet.Parts: List of Near-Realtime Earth Observation Data Sources <https://planet.parts/>`_
- `Global Landcover Data Time Series (1992-2015) <http://maps.elie.ucl.ac.be/CCI/viewer/>`_
- `Worldwide data discovery portal <http://opendatadiscovery.org/>`_
- `Cambridge, MA, US, GIS data on GitHub <http://cambridgegis.github.io/gisdata.html>`_
- `Factual Global Location Data <https://www.factual.com/>`_
- `Geo Spatial Data from ASU <http://geodacenter.asu.edu/datalist/>`_
- `Geo Wiki Project - Citizen-driven Environmental Monitoring <http://geo-wiki.org/>`_
- `GeoFabrik - OSM data extracted to a variety of formats and areas <http://download.geofabrik.de/>`_
- `GeoNames Worldwide <http://www.geonames.org/>`_
- `Global Administrative Areas Database (GADM) <http://www.gadm.org/>`_
- `Homeland Infrastructure Foundation-Level Data <https://hifld-dhs-gii.opendata.arcgis.com/>`_
- `HydroSHEDS: Global hydrographic data at 90m resolution <http://www.hydrosheds.org/>`_
- `Landsat 8 on AWS <https://aws.amazon.com/public-data-sets/landsat/>`_
- `Data browser for Landsat 8 and Sentinel 2 <https://remotepixel.ca/projects/satellitesearch.html>`_
- `List of all countries in all languages <https://github.com/umpirsky/country-list>`_
- `National Weather Service GIS Data Portal <http://www.nws.noaa.gov/gis/>`_
- `Natural Earth - vectors and rasters of the world, including elevation <http://www.naturalearthdata.com/>`_
- `Additional sources of elevation data <http://vterrain.org/Elevation/global.html>`_
- `OpenAddresses <http://openaddresses.io/>`_
- `OpenStreetMap (OSM) <http://wiki.openstreetmap.org/wiki/Downloading_data>`_
- `Pleiades - Gazetteer and graph of ancient places <http://pleiades.stoa.org/>`_
- `Reverse Geocoder using OSM data <https://github.com/kno10/reversegeocode>`_ & `additional high-resolution data files <http://data.ub.uni-muenchen.de/61/>`_
- `TwoFishes - Foursquare's coarse geocoder <https://github.com/foursquare/twofishes>`_
- `TZ Timezones shapefiles <http://efele.net/maps/tz/world/>`_
- `UN Environmental Data <http://geodata.grid.unep.ch/>`_
- `World boundaries from the U.S. Department of State <https://hiu.state.gov/data/data.aspx>`_
- `World countries in multiple formats <https://github.com/mledoze/countries>`_

GIS - Regional
--------------

GIS - United States
"""""""""""""""""""

- `US Hydrography (Rivers, Lakes, etc) - NHDPlus <http://www.horizon-systems.com/nhdplus/>`_
- `TIGER/Line - U.S. boundaries and roads <http://www.census.gov/geo/maps-data/data/tiger-line.html>`_
- `National Land Cover Dataset <http://www.mrlc.gov/finddata.php>`_
- `Protected Areas Dataset (PAD-US) <http://gapanalysis.usgs.gov/padus/>`_
- `Estimated Private Domestic Wells <https://geodata.epa.gov/arcgis/rest/services/ORD/Estimated_Private_Domestic_Wells/MapServer>`_
- `USGS Water Flow Data <http://waterdata.usgs.gov/nwis>`_
- `USGS National Map Viewer (Find other data) <https://viewer.nationalmap.gov/viewer/>`_
- `American Factfinder: Census data (demographics, economic, etc) <http://factfinder.census.gov/faces/nav/jsf/pages/index.xhtml>`_
- `NRCS Geospatial Data Gateway (climate, geology, soils, imagery, and more) <https://gdg.sc.egov.usda.gov/GDGOrder.aspx>`_
- `eDNA Species Occurrence Database <https://www.fs.fed.us/rm/boise/AWAE/projects/the-aquatic-eDNAtlas-project.html>`_

GIS - California
""""""""""""""""

Much of this section originally came from `this gist <https://gist.github.com/nickrsan/958cd0471c4612ec6fba86ae7aeb3c7a>`_.

Portals with many data themes
+++++++++++++++++++++++++++++

- `California Open Data Portal <https://data.ca.gov/>`_
- `Natural Resources Agency Data Portal <https://data.cnra.ca.gov/>`_
- `Department of Conservation Map Viewer <https://maps.conservation.ca.gov/>`_

Agriculture
+++++++++++

- `DWR Crop Land Use Maps <https://data.cnra.ca.gov/dataset/statewide-crop-mapping>`_

Environment, Ecology, and Human Interactions
++++++++++++++++++++++++++++++++++++++++++++

- `CalEnviroScreen <https://oehha.ca.gov/calenviroscreen/report/calenviroscreen-30>`_

Energy
++++++

- `CEC all power plants <http://energyalmanac.ca.gov/electricity/>`_
- `CEC Hydropower <http://www.energyalmanac.ca.gov/renewables/hydro/>`_
- `Historical (2000-) wholesale electricity prices <http://www.eia.gov/electricity/wholesale/>`_
- `Generation sources <https://www.eia.gov/electricity/data/state/>`_

High-Res Elevation and Bathymetry
+++++++++++++++++++++++++++++++++

- `Seafloor mapping lab: bathymetry portal <http://seafloor.csumb.edu/csmp/csmp_datacatalog.html>`_

Hydrology
+++++++++

- `CDEC <http://cdec.water.ca.gov/>`_: historical and real-time reservoir storage, inflows, outflows, stream gauges, snowpack
- `USGS daily streamflow <http://waterdata.usgs.gov/ca/nwis/current/?type=dailydischarge&group_key=county_cd>`_
- `USGS all sites <http://maps.waterdata.usgs.gov/mapper/?state=ca>`_
- `Gage Gap: An analysis of California's stream gage network <https://gagegap.codefornature.org/>`_
- `NRCS snotel site data <http://www.wcc.nrcs.usda.gov/snow/snotel-wedata.html>`_
- `NOAA Historical precip/temp <http://www.ncdc.noaa.gov/cdo-web/>`_: at individual stations
- `DWR Full Natural Flows 1922-2003 <http://www.waterboards.ca.gov/waterrights/water_issues/programs/bay_delta/bay_delta_plan/water_quality_control_planning/docs/sjrf_spprtinfo/dwr_2007a.pdf>`_
- `NOAA/NWS RFC Archive <http://www.cnrfc.noaa.gov/arc_search.php>`_
- `Paleo Reconstruction <http://treeflow.info/california>`_
- `Dayflow (Delta outflows) <http://www.water.ca.gov/dayflow/output/Output.cfm>`_
- `MOPEX <http://tdwg.catchment.org/datasets.html>`_
- `Groundwater <http://www.water.ca.gov/waterdatalibrary/groundwater/>`_
- `NOAA ClimDiv <http://www1.ncdc.noaa.gov/pub/data/cirs/climdiv/>`_: historical drought indices
- `DWR precip indices <http://cdec.water.ca.gov/snow_rain.html>`_: 8-station index, etc.
- `CA Dams <http://www.water.ca.gov/damsafety/damlisting/index.cfm>`_
- `Sierra Nevada Meadows Dataset <https://meadows.ucdavis.edu>`_

Consumption
+++++++++++

- `USGS water use, 5-year timestep <http://waterdata.usgs.gov/ca/nwis/water_use/>`_
- `DWR land use <http://www.water.ca.gov/landwateruse/anlwuest.cfm>`_: Ag demand and land use estimates by county/DAU from 1998-2010 (annual)
- `DWR UWMP data <http://www.water.ca.gov/urbanwatermanagement/2010_Urban_Water_Management_Plan_Data.cfm>`_: Urban water management plan (`2015 update <https://wuedata.water.ca.gov/>`_)
- `SWRCB conservation reports <http://www.waterboards.ca.gov/water_issues/programs/conservation_portal/conservation_reporting.shtml>`_
- `California Water Network <http://hobbes.ucdavis.edu/node>`_
- `SWRCB eWRIMS <https://ciwqs.waterboards.ca.gov/ciwqs/ewrims/EWMenuPublic.jsp>`_
- `SWRCB WRUDS <http://www.waterboards.ca.gov/waterrights/water_issues/programs/drought/analysis/>`_: "Average Demand"

Boundaries
++++++++++

- `DWR Planning areas, DAUs, watersheds, etc <http://www.waterplan.water.ca.gov/maps/>`_
- `CEHTP Public Water System boundaries <http://www.ehib.org/page.jsp?page_key=762>`_

Water Rights
++++++++++++

- `New CA Water Atlas <https://github.com/NewCaliforniaWaterAtlas/data-water-rights>`_

USBR CVP Mid-Pacific
++++++++++++++++++++

- `Operations, contractors, deliveries <http://www.usbr.gov/mp/PA/water/>`_
- `Monthly reports <http://www.usbr.gov/mp/cvo/Mo_Rpts_Prev.html>`_
- `2008 report on CVP/SWP water yield <http://www.usbr.gov/mp/cvp/docs/Water%20Supply%20and%20Yield%20Study.pdf>`_
- `Irrigation district WMPs <http://www.usbr.gov/mp/watershare/wcplans/>`_

SWP Operations
++++++++++++++

- `Project-wide <http://water.ca.gov/swp/operationscontrol/projectwide.cfm>`_
- `Monthly operations <http://www.water.ca.gov/swp/operationscontrol/monthly.cfm>`_
- `Contractors list <http://www.swc.org/about-us/member-agencies-list>`_
- `Reliability report (CALSIM II) <http://baydeltaoffice.water.ca.gov/swpreliability/>`_
- `SWP analysis office <http://www.water.ca.gov/swpao/>`_: contractors, maximum allocations
- `2013 Water Plan Update <http://www.waterplan.water.ca.gov/cwpu2013/final/index.cfm>`_
- `Water plan technical details <http://www.waterplan.water.ca.gov/technical/cwpu2013/index.cfm>`_

Agriculture
+++++++++++

- `County commissioners' reports <http://www.nass.usda.gov/Statistics_by_State/California/Publications/AgComm/Detail/index.asp>`_
- `Land cover by crop type <http://nassgeodata.gmu.edu/CropScape/>`_
- `Crop prices <http://faostat3.fao.org/download/P/PA/E>`_
- `Search by crop type <http://quickstats.nass.usda.gov/#7C3B21B8-CCBE-3D0B-BBD5-2F057870134F>`_
- `Irrigation ET <http://www.itrc.org/etdata/waterbal.htm>`_
- `Farm production expenses <http://www.ers.usda.gov/data-products/farm-income-and-wealth-statistics/production-expenses.aspx#P06e29ac16a7244cfa6bd81eb5a712540_2_151iT0R0x5>`_
- `DWR Land Use (Crop) data <http://www.water.ca.gov/landwateruse/lusrvymain.cfm>`_

Species
+++++++

- `DFW Abundance Surveys <http://www.dfg.ca.gov/delta/data/>`_
- `PISCES fish database <https://pisces.ucdavis.edu>`_
- `California Freshwater Species Database (Aquarius) <https://www.scienceforconservation.org/products/california-freshwater-species-database>`_

GIS - Australia
"""""""""""""""

- `CSIRO Data Portal <https://data.csiro.au>`_

Government
----------

- `OpenDataSoft's list of 1,600 open data portals <https://www.opendatasoft.com/a-comprehensive-list-of-all-open-data-portals-around-the-world/>`_
- `A list of cities and countries contributed by community <https://github.com/caesar0301/awesome-public-datasets/blob/master/Government.rst>`_

Healthcare
----------

- `EHDP Large Health Data Sets <http://www.ehdp.com/vitalnet/datasets.htm>`_
- `Gapminder World demographic databases <http://www.gapminder.org/data/>`_
- `Medicare Coverage Database (MCD), U.S. <https://www.cms.gov/medicare-coverage-database/>`_
- `Medicare Data Engine of medicare.gov Data <https://data.medicare.gov/>`_
- `Medicare Data File <http://go.cms.gov/19xxPN4>`_
- `MeSH, the vocabulary thesaurus used for indexing articles for PubMed <https://www.nlm.nih.gov/mesh/filelist.html>`_
- `Number of Ebola Cases and Deaths in Affected Countries (2014) <https://data.hdx.rwlabs.org/dataset/ebola-cases-2014>`_
- `Open-ODS (structure of the UK NHS) <http://www.openods.co.uk>`_
- `OpenPaymentsData, Healthcare financial relationship data <https://openpaymentsdata.cms.gov>`_
- `The Cancer Genome Atlas project (TCGA) <https://tcga-data.nci.nih.gov/tcga/tcgaDownload.jsp>`_ and `BigQuery table <http://google-genomics.readthedocs.org/en/latest/use_cases/discover_public_data/isb_cgc_data.html>`_
- `World Health Organization Global Health Observatory <http://www.who.int/gho/en/>`_

Image Processing
----------------

- `10k US Adult Faces Database <http://wilmabainbridge.com/facememorability2.html>`_
- `2GB of Photos of Cats <http://137.189.35.203/WebUI/CatDatabase/catData.html>`_ or `Archive version <https://web.archive.org/web/20150520175645/http://137.189.35.203/WebUI/CatDatabase/catData.html>`_
- `Affective Image Classification <http://www.imageemotion.org/>`_
- `Animals with attributes <http://attributes.kyb.tuebingen.mpg.de/>`_
- `Face Recognition Benchmark <http://www.face-rec.org/databases/>`_
- `ImageNet (in WordNet hierarchy) <http://www.image-net.org/>`_
- `Indoor Scene Recognition <http://web.mit.edu/torralba/www/indoor.html>`_
- `International Affective Picture System, UFL <http://csea.phhp.ufl.edu/media/iapsmessage.html>`_
- `Massive Visual Memory Stimuli, MIT <http://cvcl.mit.edu/MM/stimuli.html>`_
- `Several Shape-from-Silhouette Datasets <http://kaiwolf.no-ip.org/3d-model-repository.html>`_
- `Stanford Dogs Dataset <http://vision.stanford.edu/aditya86/ImageNetDogs/>`_
- `SUN database, MIT <http://groups.csail.mit.edu/vision/SUN/hierarchy.html>`_
- `The Oxford-IIIT Pet Dataset <http://www.robots.ox.ac.uk/~vgg/data/pets/>`_
- `YouTube Faces Database <http://www.cs.tau.ac.il/~wolf/ytfaces/>`_
- `Adience Unfiltered faces for gender and age classification <http://www.openu.ac.il/home/hassner/Adience/data.html>`_
- `The Action Similarity Labeling (ASLAN) Challenge <http://www.openu.ac.il/home/hassner/data/ASLAN/ASLAN.html>`_
- `Violent-Flows - Crowd Violence / Non-violence Database and benchmark <http://www.openu.ac.il/home/hassner/data/violentflows/>`_

Machine Learning
----------------

- `Delve Datasets for classification and regression (Univ. of Toronto) <http://www.cs.toronto.edu/~delve/data/datasets.html>`_
- `Discogs Monthly Data <http://data.discogs.com/>`_
- `eBay Online Auctions (2012) <http://www.modelingonlineauctions.com/datasets>`_
- `IMDb Database <http://www.imdb.com/interfaces>`_
- `Keel Repository for classification, regression and time series <http://sci2s.ugr.es/keel/datasets.php>`_
- `Labeled Faces in the Wild (LFW) <http://vis-www.cs.umass.edu/lfw/>`_
- `Lending Club Loan Data <https://www.lendingclub.com/info/download-data.action>`_
- `Machine Learning Data Set Repository <http://mldata.org/>`_
- `Million Song Dataset <http://labrosa.ee.columbia.edu/millionsong/>`_
- `More Song Datasets <http://labrosa.ee.columbia.edu/millionsong/pages/additional-datasets>`_
- `New Yorker caption contest ratings <https://github.com/nextml/caption-contest-data>`_
- `MovieLens Data Sets <http://grouplens.org/datasets/movielens/>`_
- `RDataMining - "R and Data Mining" ebook data <http://www.rdatamining.com/data>`_
- `Registered Meteorites on Earth <http://healthintelligence.drupalgardens.com/content/registered-meteorites-has-impacted-earth-visualized>`_
- `Restaurants Health Score Data in San Francisco <http://missionlocal.org/san-francisco-restaurant-health-inspections/>`_
- `UCI Machine Learning Repository <http://archive.ics.uci.edu/ml/>`_
- `Yahoo! Ratings and Classification Data <http://webscope.sandbox.yahoo.com/catalog.php?datatype=r>`_

Museums
-------

- `Canada Science and Technology Museums Corporation's Open Data <http://techno-science.ca/en/data.php>`_
- `Cooper-Hewitt's Collection Database <https://github.com/cooperhewitt/collection>`_
- `Minneapolis Institute of Arts metadata <https://github.com/artsmia/collection>`_
- `Natural History Museum (London) Data Portal <http://data.nhm.ac.uk/>`_
- `Rijksmuseum Historical Art Collection <https://www.rijksmuseum.nl/en/api>`_
- `Tate Collection metadata <https://github.com/tategallery/collection>`_
- `The Getty vocabularies <http://vocab.getty.edu>`_

Natural Language
----------------

- `Blogger Corpus <http://u.cs.biu.ac.il/~koppel/BlogCorpus.htm>`_
- `CLiPS Stylometry Investigation Corpus <http://www.clips.uantwerpen.be/datasets/csi-corpus>`_
- `ClueWeb09 FACC <http://lemurproject.org/clueweb09/FACC1/>`_
- `ClueWeb12 FACC <http://lemurproject.org/clueweb12/FACC1/>`_
- `DBpedia - 4.58M things with 583M facts <http://wiki.dbpedia.org/Datasets>`_
- `Flickr Personal Taxonomies <http://www.isi.edu/~lerman/downloads/flickr/flickr_taxonomies.html>`_
- `Freebase.com of people, places, and things <http://www.freebase.com/>`_
- `Google Books Ngrams (2.2TB) <https://aws.amazon.com/datasets/google-books-ngrams/>`_
- `Google Web 5gram (1TB, 2006) <https://catalog.ldc.upenn.edu/LDC2006T13>`_
- `Gutenberg eBooks List <http://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs>`_
- `Hansards text chunks of Canadian Parliament <http://www.isi.edu/natural-language/download/hansard/>`_
- `Machine Comprehension Test (MCTest) of text from Microsoft Research <http://research.microsoft.com/en-us/um/redmond/projects/mctest/index.html>`_
- `Machine Translation of European languages <http://statmt.org/wmt11/translation-task.html#download>`_
- `Personae Corpus <http://www.clips.uantwerpen.be/datasets/personae-corpus>`_
- `SaudiNewsNet Collection of Saudi Newspaper Articles (Arabic, 30K articles) <https://github.com/ParallelMazen/SaudiNewsNet>`_
- `SMS Spam Collection in English <http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/>`_
- `USENET postings corpus of 2005~2011 <http://www.psych.ualberta.ca/~westburylab/downloads/usenetcorpus.download.html>`_
- `Wikidata - Wikipedia databases <https://www.wikidata.org/wiki/Wikidata:Database_download>`_
- `Wikipedia Links data - 40 Million Entities in Context <https://code.google.com/p/wiki-links/downloads/list>`_
- `Universal Dependencies <http://universaldependencies.org>`_
- `WordNet databases and tools <http://wordnet.princeton.edu/wordnet/download/>`_
- `Open Multilingual Wordnet <http://compling.hss.ntu.edu.sg/omw/>`_

Neuroscience
------------

- `Allen Institute Datasets <http://www.brain-map.org/>`_
- `Brain Catalogue <http://braincatalogue.org/>`_
- `Brainomics <http://brainomics.cea.fr/localizer>`_
- `CodeNeuro Datasets <http://datasets.codeneuro.org/>`_
- `Collaborative Research in Computational Neuroscience (CRCNS) <http://crcns.org/data-sets>`_
- `FCP-INDI <http://fcon_1000.projects.nitrc.org/index.html>`_
- `Human Connectome Project <http://www.humanconnectome.org/data/>`_
- `NDAR <https://ndar.nih.gov/>`_
- `NIMH Data Archive <http://data-archive.nimh.nih.gov/>`_
- `NeuroData <http://neurodata.io>`_
- `OASIS <http://www.oasis-brains.org/>`_
- `OpenfMRI <https://openfmri.org/>`_
- `Neuroelectro <http://neuroelectro.org/>`_
- `Study Forrest <http://studyforrest.org>`_

Physics
-------

- `CERN Open Data Portal <http://opendata.cern.ch/>`_
- `Crystallography Open Database <http://www.crystallography.net/>`_
- `NASA Exoplanet Archive <http://exoplanetarchive.ipac.caltech.edu/>`_
- `NSSDC (NASA) data of 550 spacecraft <http://nssdc.gsfc.nasa.gov/nssdc/obtaining_data.html>`_
- `Sloan Digital Sky Survey (SDSS) - Mapping the Universe <http://www.sdss.org/>`_

Psychology/Cognition
--------------------

- `OSU Cognitive Modeling Repository Datasets <http://www.cmr.osu.edu/browse/datasets>`_

Public Domains
--------------

- `Amazon <http://aws.amazon.com/datasets/>`_
- `Archive-it from Internet Archive <https://www.archive-it.org/explore?show=Collections>`_
- `Archive.org Datasets <https://archive.org/details/datasets>`_
- `CMU JASA data archive <http://lib.stat.cmu.edu/jasadata/>`_
- `CMU StatLab collections <http://lib.stat.cmu.edu/datasets/>`_
- `Data360 <http://www.data360.org/index.aspx>`_
- `Datamob.org <http://datamob.org/datasets>`_
- `Google <http://www.google.com/publicdata/directory>`_
- `Infochimps <http://www.infochimps.com/>`_
- `KDNuggets Data Collections <http://www.kdnuggets.com/datasets/index.html>`_
- `Microsoft Azure Data Market Free DataSets <http://datamarket.azure.com/browse/data?price=free>`_
- `Numbrary <http://numbrary.com/>`_
- `Open Library Data Dumps <https://openlibrary.org/developers/dumps>`_
- `Reddit Datasets <https://www.reddit.com/r/datasets>`_
- `RevolutionAnalytics Collection <http://packages.revolutionanalytics.com/datasets/>`_
- `Sample R data sets <http://stat.ethz.ch/R-manual/R-patched/library/datasets/html/00Index.html>`_
- `Stats4Stem R data sets <http://www.stats4stem.org/data-sets.html>`_
- `StatSci.org <http://www.statsci.org/datasets.html>`_
- `The Washington Post List <http://www.washingtonpost.com/wp-srv/metro/data/datapost.html>`_
- `UCLA SOCR data collection <http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data>`_
- `UFO Reports <http://www.nuforc.org/webreports.html>`_
- `Wikileaks 911 pager intercepts <https://911.wikileaks.org/files/index.html>`_
- `Yahoo Webscope <http://webscope.sandbox.yahoo.com/catalog.php>`_

Search Engines
-
Academic Torrents of data sharing from UMB <http://academictorrents.com/>
_ -
Datahub.io <https://datahub.io/dataset>
_ -
DataMarket (Qlik) <https://datamarket.com/data/list/?q=all>
_ -
Harvard Dataverse Network of scientific data <https://dataverse.harvard.edu/>
_ -
ICPSR (UMICH) <http://www.icpsr.umich.edu/icpsrweb/ICPSR/index.jsp>
_ -
Institute of Education Sciences <http://eric.ed.gov>
_ -
National Technical Reports Library <http://www.ntis.gov/products/ntrl/>
_ -
Open Data Certificates (beta) <https://certificates.theodi.org/en/datasets>
_ -
OpenDataNetwork - A search engine of all Socrata powered data portals <http://www.opendatanetwork.com/>
_ -
Statista.com - Statistics and Studies <http://www.statista.com/>
_ -
Zenodo - An open dependable home for the long-tail of science <https://zenodo.org/collection/datasets>
_
Social Networks
-
72 hours #gamergate Twitter Scrape <http://waxy.org/random/misc/gamergate_tweets.csv>
_ -
Ancestry.com Forum Dataset over 10 years <http://www.cs.cmu.edu/~jelsas/data/ancestry.com/>
_ -
Cheng-Caverlee-Lee September 2009 - January 2010 Twitter Scrape <https://archive.org/details/twitter_cikm_2010>
_ -
CMU Enron Email of 150 users <http://www.cs.cmu.edu/~enron/>
_ -
EDRM Enron Email of 151 users, hosted on S3 <https://aws.amazon.com/datasets/enron-email-data/>
_ -
Facebook Data Scrape (2005) <https://archive.org/details/oxford-2005-facebook-matrix>
_ -
Facebook Social Networks from LAW (since 2007) <http://law.di.unimi.it/datasets.php>
_ -
Foursquare from UMN/Sarwat (2013) <https://archive.org/details/201309_foursquare_dataset_umn>
_ -
GetGlue - users rating TV shows <http://getglue-data.s3.amazonaws.com/getglue_sample.tar.gz>
_ -
GitHub Collaboration Archive <https://www.githubarchive.org/>
_ -
Google Scholar citation relations <http://www3.cs.stonybrook.edu/~leman/data/gscholar.db>
_ -
High-Resolution Contact Networks from Wearable Sensors <http://www.sociopatterns.org/datasets/>
_ -
Mobile Social Networks from UMASS <https://kdl.cs.umass.edu/display/public/Mobile+Social+Networks>
_ -
Network Twitter Data <http://snap.stanford.edu/data/higgs-twitter.html>
_ -
Reddit Comments <https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_every_publicly_available_reddit_comment/>
_ -
Skytrax' Air Travel Reviews Dataset <https://github.com/quankiquanki/skytrax-reviews-dataset>
_ -
Social Twitter Data <http://snap.stanford.edu/data/egonets-Twitter.html>
_ -
SourceForge.net Research Data <http://www3.nd.edu/~oss/Data/data.html>
_ -
Twitter Data for Sentiment Analysis <http://help.sentiment140.com/for-students/>
_ -
Twitter Data for Online Reputation Management <http://nlp.uned.es/replab2013/>
_ -
Twitter Graph of entire Twitter site <http://an.kaist.ac.kr/traces/WWW2010.html>
_ -
Twitter Scrape Calufa May 2011 <http://archive.org/details/2011-05-calufa-twitter-sql>
_ -
UNIMI/LAW Social Network Datasets <http://law.di.unimi.it/datasets.php>
_ -
Yahoo! Graph and Social Data <http://webscope.sandbox.yahoo.com/catalog.php?datatype=g>
_ -
YouTube Video Social Graph in 2007, 2008 <http://netsg.cs.sfu.ca/youtubedata/>
_
Social Sciences
-
ACLED (Armed Conflict Location & Event Data Project) <http://www.acleddata.com/>
_ -
Canadian Legal Information Institute <https://www.canlii.org/en/index.php>
_ -
Center for Systemic Peace Datasets - Conflict Trends, Polities, State Fragility, etc. <http://www.systemicpeace.org/>
_ -
Correlates of War Project <http://www.correlatesofwar.org/>
_ -
Cryptome Conspiracy Theory Items <http://cryptome.org>
_ -
Datacards <http://datacards.org>
_ -
European Social Survey <http://www.europeansocialsurvey.org/data/>
_ -
FBI Hate Crime 2013 - aggregated data <https://github.com/emorisse/FBI-Hate-Crime-Statistics/tree/master/2013>
_ -
GDELT Global Events Database <http://gdeltproject.org/data.html>
_ -
General Social Survey (GSS) since 1972 <http://gss.norc.org>
_ -
German Social Survey <http://www.gesis.org/en/home/>
_ -
Global Religious Futures Project <http://www.globalreligiousfutures.org/>
_ -
Humanitarian Data Exchange <https://data.hdx.rwlabs.org/>
_ -
Institute for Demographic Studies <http://www.ined.fr/en/>
_ -
International Networks Archive <http://www.princeton.edu/~ina/>
_ -
International Social Survey Program ISSP <http://www.issp.org>
_ -
International Studies Compendium Project <http://www.isacompendium.com/public/>
_ -
James McGuire Cross National Data <http://jmcguire.faculty.wesleyan.edu/welcome/cross-national-data/>
_ -
MacroData Guide by Norsk samfunnsvitenskapelig datatjeneste <http://nsd.uib.no>
_ -
Minnesota Population Center <https://www.ipums.org/>
_ -
MIT Reality Mining Dataset <http://realitycommons.media.mit.edu/realitymining.html>
_ -
Open Crime and Policing Data in England, Wales and Northern Ireland <https://data.police.uk/data/>
_ -
Paul Hensel General International Data Page <http://www.paulhensel.org/dataintl.html>
_ -
PewResearch Internet Survey Project <http://www.pewinternet.org/datasets/pages/2/>
_ -
PewResearch Society Data Collection <http://www.pewresearch.org/data/download-datasets/>
_ -
Political Polarity Data <http://www3.cs.stonybrook.edu/~leman/data/14-icwsm-political-polarity-data.zip>
_ -
StackExchange Data Explorer <http://data.stackexchange.com/help>
_ -
Terrorism Research and Analysis Consortium <http://www.trackingterrorism.org/>
_ -
Texas Inmates Executed Since 1984 <http://www.tdcj.state.tx.us/death_row/dr_executed_offenders.html>
_ -
Titanic Survival Data Set <https://github.com/caesar0301/awesome-public-datasets/tree/master/Datasets>
_ -
UCB's Archive of Social Science Data (D-Lab) <http://ucdata.berkeley.edu/>
_ -
Uppsala Conflict Data Program <http://ucdp.uu.se/>
_ -
UCLA Social Sciences Data Archive <http://dataarchives.ss.ucla.edu/Home.DataPortals.htm>
_ -
UN Civil Society Database <http://esango.un.org/civilsociety/>
_ -
Universities Worldwide <http://univ.cc/>
_ -
UPJOHN for Labor Employment Research <http://www.upjohn.org/services/resources/employment-research-data-center>
_ -
World Bank Data <http://data.worldbank.org/>
_ -
WorldPop project - Worldwide human population distributions <http://www.worldpop.org.uk/data/get_data/>
_
Software
-
FLOSSmole data about free, libre, and open source software development <http://flossdata.syr.edu/data/>
_
Sports
-
Basketball (NBA/NCAA/Euro) Player Database and Statistics <http://www.draftexpress.com/stats.php>
_ -
Betfair Historical Exchange Data <http://data.betfair.com/>
_ -
Cricsheet Matches (cricket) <http://cricsheet.org/>
_ -
Ergast Formula 1, from 1950 up to date (API) <http://ergast.com/mrd/db>
_ -
Football/Soccer resources (data and APIs) <http://www.jokecamp.com/blog/guide-to-football-and-soccer-data-and-apis/>
_ -
Lahman's Baseball Database <http://www.seanlahman.com/baseball-archive/statistics/>
_ -
Pinhooker: Thoroughbred Bloodstock Sale Data <https://github.com/phillc73/pinhooker>
_ -
Retrosheet Baseball Statistics <http://www.retrosheet.org/game.htm>
_
Time Series
-
Databanks International Cross National Time Series Data Archive <http://www.cntsdata.com>
_ -
Hard Drive Failure Rates <https://www.backblaze.com/hard-drive-test-data.html>
_ -
Heart Rate Time Series from MIT <http://ecg.mit.edu/time-series/>
_ -
Time Series Data Library (TSDL) from MU <https://datamarket.com/data/list/?q=provider:tsdl>
_ -
UC Riverside Time Series Dataset <http://www.cs.ucr.edu/~eamonn/time_series_data/>
_
Transportation
-
Airlines OD Data 1987-2008 <http://stat-computing.org/dataexpo/2009/the-data.html>
_ -
Bay Area Bike Share Data <http://www.bayareabikeshare.com/open-data>
_ -
Bike Share Systems (BSS) collection <https://github.com/BetaNYC/Bike-Share-Data-Best-Practices/wiki/Bike-Share-Data-Systems>
_ -
GeoLife GPS Trajectory from Microsoft Research <http://research.microsoft.com/en-us/downloads/b16d359d-d164-469e-9fd4-daa38f2b2e13/>
_ -
German train system by Deutsche Bahn <http://data.deutschebahn.com/datasets/>
_ -
Hubway Million Rides in MA <http://hubwaydatachallenge.org/trip-history-data/>
_ -
Marine Traffic - ship tracks, port calls and more <http://www.marinetraffic.com/de/ais-api-services>
_ -
Montreal BIXI Bike Share <https://montreal.bixi.com/donn%C3%A9es-libre-service>
_ -
NYC Taxi Trip Data 2009- <http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml>
_ -
NYC Taxi Trip Data 2013 (FOIA/FOILed) <https://archive.org/details/nycTaxiTripData2013>
_ -
NYC Uber trip data April 2014 to September 2014 <https://github.com/fivethirtyeight/uber-tlc-foil-response>
_ -
Open Traffic collection <https://github.com/graphhopper/open-traffic-collection>
_ -
OpenFlights - airport, airline and route data <http://openflights.org/data.html>
_ -
Philadelphia Bike Share Stations (JSON) <https://www.rideindego.com/stations/json/>
_ -
Plane Crash Database, since 1920 <http://www.planecrashinfo.com/database.htm>
_ -
RITA Airline On-Time Performance data <http://www.transtats.bts.gov/Tables.asp?DB_ID=120>
_ -
RITA/BTS transport data collection (TranStats) <http://www.transtats.bts.gov/DataIndex.asp>
_ -
Toronto Bike Share Stations (XML file) <http://www.bikesharetoronto.com/data/stations/bikeStations.xml>
_ -
Transport for London (TFL) <https://tfl.gov.uk/info-for/open-data-users/data-feeds>
_ -
Travel Tracker Survey (TTS) for Chicago <http://www.cmap.illinois.gov/data/transportation/travel-tracker-survey>
_ -
U.S. Bureau of Transportation Statistics (BTS) <http://www.rita.dot.gov/bts/>
_ -
U.S. Domestic Flights 1990 to 2009 <http://academictorrents.com/details/a2ccf94bbb4af222bf8e69dad60a68a29f310d9a>
_ -
U.S. Freight Analysis Framework since 2007 <http://ops.fhwa.dot.gov/freight/freight_analysis/faf/index.htm>
_
Complementary Collections
-
Data Packaged Core Datasets <https://github.com/datasets/>
_ -
Database of Scientific Code Contributions <https://mozillascience.org/collaborate>
_ - DataWrangling:
Some Datasets Available on the Web <http://www.datawrangling.com/some-datasets-available-on-the-web>
_ - Inside-r:
Finding Data on the Internet <http://www.inside-r.org/howto/finding-data-internet>
_ - OpenDataMonitor:
An overview of available open data resources in Europe <http://opendatamonitor.eu>
_ - Quora:
Where can I find large datasets open to the public? <http://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public>
_ - RS.io:
100+ Interesting Data Sets for Statistics <http://rs.io/100-interesting-data-sets-for-statistics/>
_ - StaTrek:
Leveraging open data to understand urban lives <http://xiaming.me/posts/2014/10/23/leveraging-open-data-to-understand-urban-lives/>
_