qiime
link Bokulich et al mock community test files from the QIIME Resources page
I have all of the links in email. This is a great resource for testing new methods.
:+1: Yes please. While these are on qiita, they are not easily accessible.
We could place them in a github repo, similar to www.github.com/torognes/vsearch-data
Suggest coordinating with Nick, but these are the studies in Qiita:
Available:
http://qiita.ucsd.edu/study/description/721
http://qiita.ucsd.edu/study/description/722
http://qiita.ucsd.edu/study/description/1686
http://qiita.ucsd.edu/study/description/1687
http://qiita.ucsd.edu/study/description/1688
http://qiita.ucsd.edu/study/description/1683
http://qiita.ucsd.edu/study/description/1684
http://qiita.ucsd.edu/study/description/1689
http://qiita.ucsd.edu/study/description/1690
Need a final push:
http://qiita.ucsd.edu/study/description/1685
http://qiita.ucsd.edu/study/description/1972
http://qiita.ucsd.edu/study/description/1973
I'm still interested in this data. Other people are too.
I'm also interested in contributing to the resource page or to a new github repo for this data set. Let me know how I can help.
@colinbrislawn, if you're available to issue a PR adding the links to the QIIME Resources page (content is here), that'd be fantastic!
Also, ping @nbokulich so he's aware of this.
Sure thing! Should I link to the studies on qiita, or on a FTP server, or in a github repo like I mentioned before?
I think Qiita is ideal, if @antgonza agrees. Otherwise FTP. The files are too big to make sense in a GitHub repo. Thank you!
On Fri, Feb 12, 2016 at 9:54 AM, Colin Brislawn [email protected] wrote:
Sure thing! Should I link to the studies on qiita, or on a FTP server, or in a github repo like I mentioned before?
— Reply to this email directly or view it on GitHub https://github.com/biocore/qiime/issues/2105#issuecomment-183411165.
I guess I kind of like hosting these on FTP and qiita, if possible.
The vsearch test data and the Mothur example data are really easy to download, and this encourages reuse. While I use and love qiita, it's new, and we can lower the barrier to entry with an FTP site. Could we host these files along with the other files on ftp://ftp.microbio.me?
Are the files hosted in Qiita accessible through ftp, @antgonza? I can't comment on ftp.microbio.me - that's a Knight Lab resource.
Oh if we could hard link to the files in qiita, that would remove the barrier of entry without duplicating effort. That would be perfect!
Agree - that would be ideal.
Greg, didn't we copy all raw data to the taxa assignment github? Or still in the S3 bucket? I know we have these deposited somewhere outside of qiita already...
@nbokulich, thanks for the reminder. We did, and those links are here and other relevant data here. I thought we had these on S3, in which case we'd be paying for the data transfer and it's pretty expensive there, but these are all already on ftp.microbio.me. So, I think we're good to go, and we can link to these and to the Qiita studies.
All good @colinbrislawn?
I'm ready to start. Could you assign it to me?
I'll use ftp.microbio.me as much as possible, defaulting to the S3 links when needed. I'll also mention the qiita study IDs.
I'll do this in waves, starting with qiita and Bokulich, 2013
The original study mentions these data sets:
where data set number can be found in Supplementary Table 7: data set 1, 719; data set 2, 1685; data set 3, 1686; data set 4, 1626; data set 5, 1687; data set 6, 1688; data set 7, 1683; data set 8, 1684; data set 9, 1689; and data set 10, 1690.
From that list, these studies are missing from qiita: https://qiita.ucsd.edu/study/description/719 https://qiita.ucsd.edu/study/description/1626 Any ideas @antgonza? Maybe these were split into 721 and 722?
Like you mentioned, this one is not yet publicly available: https://qiita.ucsd.edu/study/description/1685
@colinbrislawn the ids that you are seeing in the original study have been kept in Qiita - so you just need to put the study id at the end of those links and you will have all those. @antgonza is working on getting all of them available through Qiita.
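The "put the study id at the end of those links" step can be sketched in a few lines of Python. This is only an illustration: the ID mapping is copied from the Supplementary Table 7 quote earlier in this thread, the URL pattern is the one used in the links above, and the names `SUPP_TABLE_7`, `qiita_url`, and `supp_table_links` are made up for this example.

```python
# Data set number -> study ID, as quoted from Supplementary Table 7 in this thread.
SUPP_TABLE_7 = {
    1: 719, 2: 1685, 3: 1686, 4: 1626, 5: 1687,
    6: 1688, 7: 1683, 8: 1684, 9: 1689, 10: 1690,
}

# URL pattern taken from the Qiita links posted in this thread.
QIITA_STUDY_URL = "https://qiita.ucsd.edu/study/description/{study_id}"


def qiita_url(study_id: int) -> str:
    """Build the Qiita study page URL by appending the study ID."""
    return QIITA_STUDY_URL.format(study_id=study_id)


def supp_table_links() -> dict:
    """Map each data set number to its Qiita study URL."""
    return {number: qiita_url(sid) for number, sid in SUPP_TABLE_7.items()}


if __name__ == "__main__":
    for number, url in supp_table_links().items():
        print(f"data set {number}: {url}")
```

Whether each generated link actually resolves depends on the study being public in Qiita, which this sketch does not check.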
Good to know. Once the links are live I will add them post haste!
What study are 1972 and 1973 from? Those aren't mentioned in the nature paper. https://qiita.ucsd.edu/study/description/1973
1972 AN
Sorry, finger slip hit send prematurely.
1972 and 1973 are from a study we are working on now. Unpublished but you should include these --- they are good 16S V4 mock community datasets. The ITS links listed will also be useful... included in the same study, which is described in this preprint https://peerj.com/preprints/934.pdf.
1626 actually = 1517 (it was given a different ID when ported to qiita... credit to @antgonza for previously uncovering this)
719 was split into 721 (5' reads) and 722 (3' reads). (credit again goes to @antgonza for sleuthing a few months ago when we had this problem)
NOTE: some of these are not actually mock communities. The following IDs are for natural communities that were analyzed in the 2013 paper: 1683 1684 1689 1690
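For anyone scripting against these IDs, the corrections above could be applied roughly as follows. This is a sketch only: the renames and the natural-community list come straight from this comment, while the names `RENAMED`, `NATURAL_COMMUNITIES`, and `resolve_mock_ids` are hypothetical.

```python
# ID changes described in this thread: paper ID -> current Qiita ID(s).
RENAMED = {
    1626: [1517],      # given a different ID when ported to Qiita
    719: [721, 722],   # split into 5' reads (721) and 3' reads (722)
}

# Natural (non-mock) communities analyzed in the 2013 paper.
NATURAL_COMMUNITIES = {1683, 1684, 1689, 1690}


def resolve_mock_ids(paper_ids):
    """Translate study IDs from the paper into current mock-community IDs.

    Natural communities are dropped; renamed or split studies are expanded.
    """
    resolved = []
    for study_id in paper_ids:
        if study_id in NATURAL_COMMUNITIES:
            continue
        resolved.extend(RENAMED.get(study_id, [study_id]))
    return resolved


if __name__ == "__main__":
    paper_ids = [719, 1685, 1686, 1626, 1687, 1688, 1683, 1684, 1689, 1690]
    print(resolve_mock_ids(paper_ids))
```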
Thanks for the info! I've added in those links.
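I may be a bit slow on this Friday, but I'm having trouble connecting samples on the ftp site with the Qiita studies mentioned in Supp Table 7. [image: screen shot 2016-02-12 at 3 56 32 pm] https://cloud.githubusercontent.com/assets/10355152/13023872/326f0f76-d1a2-11e5-892a-887b15ac6b22.png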
Help!
Yeah... I think the links on the FTP site use a different nomenclature.
I dug up the attached document, which should clear things up: it gives the old and new names, the links in qiita and the FTP, and some data on the type of data.
Does that clear it up?
Looks like the link won't attach. Here's the relevant text (or email me directly for the table):

| Eval Framework ID | Nature Methods ID | Eval Framework Link | QIITA ID / Link |
| --- | --- | --- | --- |
| B1 | Data set 5 | ftp://ftp.microbio.me/pub/illumina-mock-communities-raw-data/S16S-1/ | http://qiita.ucsd.edu/study/description/1687 |
| B2 | Data set 6 | ftp://ftp.microbio.me/pub/illumina-mock-communities-raw-data/S16S-2/ | http://qiita.ucsd.edu/study/description/1688 |
| B3 | Data set 2 | ftp://ftp.microbio.me/pub/illumina-mock-communities-raw-data/Broad1/ | http://qiita.ucsd.edu/study/description/1685 |
| B4 | Data set 3 | ftp://ftp.microbio.me/pub/illumina-mock-communities-raw-data/Broad2/ | http://qiita.ucsd.edu/study/description/1686 |
| B5 | NA | ftp://ftp.microbio.me/pub/illumina-mock-communities-raw-data/Broad3/ | 1972 |
| B6 | Data set 1 | | http://qiita.ucsd.edu/study/description/719 |
| B7 | NA | ftp://ftp.microbio.me/pub/illumina-mock-communities-raw-data/Turnbaugh1/ | 1319 |
| B8 | NA | ftp://ftp.microbio.me/pub/illumina-mock-communities-raw-data/Turnbaugh2/ | 1973 |
| F1 | NA | ftp://ftp.microbio.me/pub/illumina-mock-communities-raw-data/RDBW/ | 1974 |
| F2 | NA | ftp://ftp.microbio.me/pub/illumina-mock-communities-raw-data/ITS_SAG/ | 1975 |
| NA | Data set 4 | ftp://ftp.microbio.me/pub/illumina-mock-communities-raw-data/L18S-1/ | http://qiita.ucsd.edu/study/description/1626 |
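If it helps to work with these FTP directories from a script, here is a rough Python sketch using only the standard library (`ftplib` and `urllib.parse`). The directory names are copied from the listing above; the names `FTP_DIRS`, `ftp_url`, `split_ftp_url`, and `list_remote_files` are invented for this example, and the sketch assumes the server accepts anonymous logins, which has not been verified here. The L18S-1 directory is left out because it has no Eval Framework ID to key on.

```python
from ftplib import FTP
from urllib.parse import urlsplit

# Root and directory names copied from the listing in this thread.
FTP_ROOT = "ftp://ftp.microbio.me/pub/illumina-mock-communities-raw-data/"
FTP_DIRS = {
    "B1": "S16S-1", "B2": "S16S-2", "B3": "Broad1", "B4": "Broad2",
    "B5": "Broad3", "B7": "Turnbaugh1", "B8": "Turnbaugh2",
    "F1": "RDBW", "F2": "ITS_SAG",
}


def ftp_url(framework_id: str) -> str:
    """Return the FTP directory URL for an Eval Framework ID such as 'B1'."""
    return f"{FTP_ROOT}{FTP_DIRS[framework_id]}/"


def split_ftp_url(url: str):
    """Split an ftp:// URL into (host, path)."""
    parts = urlsplit(url)
    return parts.hostname, parts.path


def list_remote_files(url: str):
    """List names in a remote directory (assumes anonymous FTP access works)."""
    host, path = split_ftp_url(url)
    with FTP(host) as ftp:
        ftp.login()  # anonymous login; an assumption about this server
        return ftp.nlst(path)
```

Calling `list_remote_files(ftp_url("B1"))` needs network access and a reachable server, so treat it as a starting point rather than a tested download routine.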
GitHub doesn't like FTPs.
Oh thanks! I'll take another shot at it.
I've got most of this wrapped up in a PR. With all your help, I'm really close!
I have a quick question: the following folders and associated qiita studies are never mentioned in the Bokulich paper. Are these the data sets from that peerj paper? How should I present these?
| Eval Framework ID | Nature Methods ID | Eval Framework Link | QIITA ID / Link |
| --- | --- | --- | --- |
| B5 | NA | ftp://ftp.microbio.me/pub/illumina-mock-communities-raw-data/Broad3/ | 1972 |
| B6 | Data set 1 | | http://qiita.ucsd.edu/study/description/719 |
| B7 | NA | ftp://ftp.microbio.me/pub/illumina-mock-communities-raw-data/Turnbaugh1/ | 1319 |
| B8 | NA | ftp://ftp.microbio.me/pub/illumina-mock-communities-raw-data/Turnbaugh2/ | 1973 |
| F1 | NA | ftp://ftp.microbio.me/pub/illumina-mock-communities-raw-data/RDBW/ | 1974 |
| F2 | NA | ftp://ftp.microbio.me/pub/illumina-mock-communities-raw-data/ITS_SAG/ | 1975 |
Conversely, I don't have FTP links for these qiita studies:
1626 (now 1517), 1683, 1684, 1689, 1690, 719 (now 721 and 722)
Thanks for helping me construct this.
That's correct --- those studies not mentioned in the Nature methods paper are described in the peerJ preprint.
Studies 1683, 1684, 1689, and 1690 are NOT mock communities. They are natural communities (i.e., real samples) that we examined in the 2013 paper. Hence, these are not in the FTP and not relevant to your current needs.
Not sure why 719 isn't in the FTP. Think there was another outside link that we used for this, and hence didn't copy it. @gregcaporaso, this is the mock community from your 2011 PNAS paper... do you still have another link to these data?
The datasets on the FTP are those used in the peerJ preprint. 1626 (now 1517) we dropped from the peerJ preprint, since this is an 18S dataset. We wanted to focus on 16S and ITS. Hence, not in the FTP.
I was planning to wait for the official publication before posting the data sets from the peerJ paper. Should I just post them now?
I'm pretty close to finishing the 2013 paper. I was hoping to add the 1683, 1684, 1689, and 1690 studies. While they are not mock communities, they are included in the paper. I guess I'll just use qiita links if they are not on the server...
I don't think there's any reason to wait for the peer-reviewed publication. I also don't think you should post 1683, 1684, 1689, or 1690. Since those are not mock communities, they'll just add confusion about what this resource is. Let's keep it to just the mock communities.
@nbokulich, aren't the Turnbaugh 1 sequences the ones from my 2011 PNAS paper?
Yes, Turnbaugh 1 = your 2011 PNAS paper.