awesome-public-datasets

Requests for new public dataset contributions (PR preferred)

Open caesar0301 opened this issue 9 years ago • 39 comments

caesar0301 • Nov 30 '14

  • The Personal Genome Project: http://www.personalgenomes.org/ and https://my.pgp-hms.org/public_genetic_data
  • 1000 Genomes: http://www.1000genomes.org/ and http://www.1000genomes.org/data
  • UCSC Public Data: http://hgdownload.soe.ucsc.edu/downloads.html

abetusk • Dec 03 '14

Added! A pull request is encouraged to record your kind contribution. :+1:

caesar0301 • Dec 03 '14

Are you interested in adding some links from Argentina's government?

surrealcristian • Dec 12 '14

Can you make a Pull Request about your data?

caesar0301 • Dec 12 '14

Of course. I'll send it tonight.


surrealcristian • Dec 12 '14

What about soccer? There are a lot of sources for it.

zippeurfou • Dec 23 '14

Nice source. Added under Sport. :+1:

caesar0301 • Dec 24 '14

Pandas Remote Data DataFrame API wrappers: http://pandas.pydata.org/pandas-docs/dev/remote_data.html

  • Yahoo! Finance
  • Google Finance
  • St. Louis FED (FRED)
  • Kenneth French’s data library
  • World Bank

westurner • Dec 26 '14

Transcriptions of all debates in the German Bundestag (parliament) as TXT files: http://www.bundestag.de/plenarprotokolle

dettmering • Mar 14 '15

U.S. Department of Education:

  • IPEDS (higher and postsecondary education): http://nces.ed.gov/ipeds/
  • CCD (Common Core of Data, primary/secondary): http://nces.ed.gov/ccd/
  • NAEP (The Nation's Report Card, secondary): http://nces.ed.gov/nationsreportcard/
  • A wide range of sample-based (weighted) studies on education at all levels, as well as on the administration and staffing of educational institutions: http://nces.ed.gov/surveys/
    All are very rigorously designed. All have a public dataset available to anyone, and some also have restricted-use datasets for researchers (which are a pain in the butt to get access to).

rtbarber • Apr 09 '15

Hi, would you be able to add LG Inform to your awesome-public-datasets list? It holds publicly available data about local authorities and fire and rescue services in England: http://lginform.local.gov.uk/search

Thanks,
Alex

LGInform • Apr 20 '15

@rtbarber NCES added! LGInform added!

caesar0301 • Apr 20 '15

Great, thanks. Have a good day!

Kind Regards

Alex


LGInform • Apr 20 '15

Some additional biology-related public datasets worth considering:

  • ExAC: http://exac.broadinstitute.org/ (exome sequencing data for 60,706 unrelated individuals, including 1000 Genomes)
  • OMIM: http://www.omim.org/ (database of phenotype-genotype relationships)
  • dbSNP: http://www.ncbi.nlm.nih.gov/SNP/ (database of short genetic variations, including single-nucleotide polymorphisms)
  • dbGaP: http://www.ncbi.nlm.nih.gov/gap (database of genotypes and phenotypes)

JEFworks • Apr 21 '15

A French flora recognition system: http://identify.plantnet-project.org/en/

PanArnaud • Apr 22 '15

@PanArnaud Where is the public dataset on this page?

znurgl • Apr 22 '15

It's a search engine, so it may not be appropriate... http://identify.plantnet-project.org/en/base/tree

PanArnaud • Apr 22 '15

That's not a dataset. You can't download it as a CSV (for example) or access it via public API.

znurgl • Apr 22 '15

I understand. Sorry for the inconvenience.

PanArnaud • Apr 22 '15

Obvious internet stuff: http://thecatapi.com

Xaviju • Oct 02 '15

Belgium also has open data: http://data.gov.be/

LauR3y • Oct 09 '15

The Macaulay Library: archive of wildlife sounds and videos http://macaulaylibrary.org/

cofiem • Oct 16 '15

@cofiem It seems that these data are not free?

caesar0301 • Oct 17 '15

Partially free for some datasets.

caesar0301 • Oct 17 '15

@caesar0301 Unfortunately, you're right: as far as I can see, the data are neither free nor machine-readable. :disappointed:

cofiem • Oct 17 '15

I found this collection of datasets of (Context-Aware) Recommender Systems. http://students.depaul.edu/~yzheng8/DataSets.html

Maybe it's a good idea to talk to the author before publishing it.

gabriel-almeida • Oct 26 '15

I contacted the author for permission, and he said yes. I will merge this category into the list manually.

caesar0301 • Oct 28 '15

Thanks for the detailed list of many awesome datasets! A few good data sources from the biology side are missing:

  • GTEx: http://www.gtexportal.org/
  • ESP (Exome Sequencing Project): https://esp.gs.washington.edu/drupal/
  • ExAC (Exome Aggregation Consortium): http://exac.broadinstitute.org/
  • UK10K: http://www.uk10k.org/

vipints • Nov 18 '15

I see you have the Internet Archive's Archive-It service listed as a search engine, but it's really a self-serve web archiving service.

Other Internet Archive datasets: https://openlibrary.org/developers/dumps (metadata for books)

wumpus • Jan 31 '16

Integrated Marine Observing System (IMOS), roughly 30 TB of ocean measurements: https://imos.aodn.org.au

Or directly on the S3 bucket: http://imos-data.s3-website-ap-southeast-2.amazonaws.com/

danfruehauf • Jan 31 '16