The -i parameter doesn't work; it should be -incomplete

Open timothyjlaurent opened this issue 11 years ago • 5 comments

This confused someone in the lab (including me)

Thanks for the software

timothyjlaurent avatar Nov 05 '13 19:11 timothyjlaurent

Could you specify what script this is failing with?

I have just tested: ./download_load_and_delete_old_version.pl -d ~/tmp/ -s '-i'

and

./download_version.pl -d /home/mlangill/tmp/ -s Escherichia_coli -i

and they were both successful (note the -i has to be in quotes in the first case).

Glad MicrobeDB is working for you!

mlangill avatar Nov 06 '13 19:11 mlangill

On Wed, Nov 6, 2013 at 11:26 AM, Morgan Langille wrote:

./download_version.pl -d /home/mlangill/tmp/ -s Escherichia_coli -i

OK, so I guess it does work. Well, almost.

laurentt@shattuck:~/MicrobeDB/scripts$ ./download_version.pl -d ~/tmp/ -s Escherichia_coli -o
2013/11/06 12:11:13> Using FTP to download since --search,--incomplete, and/or --only_incomplete option(s) selected.
2013/11/06 12:11:13> Downloading files to directory: /mnt/data/home/laurentt/tmp/
2013/11/06 12:11:13> Note: NCBI has removed their metadata files. We are looking for another source of metadata information, but until then we can only use metadata for genomes from before June, 2012.
2013/11/06 12:11:13> Downloading file: /mnt/data/home/laurentt/tmp/NCBI_orginfo.txt from Dropbox at: http://dl.dropbox.com/u/5329340/NCBI_orginfo.txt
2013/11/06 12:11:15> Downloading file: /mnt/data/home/laurentt/tmp/NCBI_completegenomes.txt from Dropbox at: http://dl.dropbox.com/u/5329340/NCBI_completegenomes.txt
2013/11/06 12:11:17> Downloading genomes using FTP option
2013/11/06 12:11:17> Downloading draft genomes now.
2013/11/06 12:11:22> Found 683 draft genomes that matched the search: Escherichia_coli
2013/11/06 12:11:22> Downloading genome: Escherichia_coli_07798_uid181911
2013/11/06 12:11:23> Downloading genome: Escherichia_coli_09BKT078844_uid188354
2013/11/06 12:11:24> Downloading genome: Escherichia_coli_0_1288_uid181931
2013/11/06 12:11:26> Downloading genome: Escherichia_coli_0_1304_uid181932
2013/11/06 12:11:27> Downloading genome: Escherichia_coli_101_1_uid54363
2013/11/06 12:11:28> Downloading genome: Escherichia_coli_10_0821_uid180960
2013/11/06 12:11:30> Downloading genome: Escherichia_coli_10_0833_uid181884

So this looks promising (I did the entire set of almost 7k last night); however, there aren't any files in the directory:

laurentt@shattuck:~/tmp$ cd Escherichia_coli_07798_uid181911
laurentt@shattuck:~/tmp/Escherichia_coli_07798_uid181911$ ll
total 28K
drwxr-xr-x 2 laurentt lsd 4.0K Nov 6 12:11 ./
drwxr-xr-x 251 laurentt lsd 20K Nov 6 12:16 ../
-rw-r--r-- 1 laurentt lsd 1.4K Nov 6 12:11 .listing
laurentt@shattuck:~/tmp/Escherichia_coli_07798_uid181911$ cat .listing
-r--r--r-- 1 ftp anonymous 11422 Nov 29 2012 NZ_AMUP00000000.asn
-r--r--r-- 1 ftp anonymous 2675 Dec 11 2012 NZ_AMUP00000000.gbk
-r--r--r-- 1 ftp anonymous 261 Dec 29 2012 NZ_AMUP00000000.rpt
-r--r--r-- 1 ftp anonymous 1600657 Apr 29 2013 NZ_AMUP00000000.scaffold.asn.tgz
-r--r--r-- 1 ftp anonymous 986551 Apr 29 2013 NZ_AMUP00000000.scaffold.faa.tgz
-r--r--r-- 1 ftp anonymous 1411297 Apr 29 2013 NZ_AMUP00000000.scaffold.ffn.tgz
-r--r--r-- 1 ftp anonymous 1537698 Apr 29 2013 NZ_AMUP00000000.scaffold.fna.tgz
-r--r--r-- 1 ftp anonymous 5473 Apr 29 2013 NZ_AMUP00000000.scaffold.frn.tgz
-r--r--r-- 1 ftp anonymous 4373167 Apr 29 2013 NZ_AMUP00000000.scaffold.gbk.tgz
-r--r--r-- 1 ftp anonymous 17831 Apr 29 2013 NZ_AMUP00000000.scaffold.gbs.tgz
-r--r--r-- 1 ftp anonymous 275197 Apr 29 2013 NZ_AMUP00000000.scaffold.gff.tgz
-r--r--r-- 1 ftp anonymous 140729 Apr 29 2013 NZ_AMUP00000000.scaffold.ptt.tgz
-r--r--r-- 1 ftp anonymous 2608 Apr 29 2013 NZ_AMUP00000000.scaffold.rnt.tgz
-r--r--r-- 1 ftp anonymous 7096 Apr 29 2013 NZ_AMUP00000000.scaffold.rpt.tgz
-r--r--r-- 1 ftp anonymous 1445811 Apr 29 2013 NZ_AMUP00000000.scaffold.val.tgz
-r--r--r-- 1 ftp anonymous 4372 Nov 29 2012 NZ_AMUP00000000.val

Do you have any idea why this is happening?

Thanks for your help, we'd love to get this up and running.

BTW, Morgan, this is for Katie Pollard's Lab. I understand that you are a former iSEEM member -- We previously had an updater pipeline to mirror the JGI's IMG database, but they have since restricted headless access to the database.

One more thing of note -- below is the terminal output from last night's run, before I knew that I didn't have the genomes. Notice that it says "Downloading RefSeq genomes now." and then lists no individual genomes before going on to the draft genomes.

laurentt@shattuck:~/MicrobeDB/scripts$ ./download_load_and_delete_old_version.pl -d /mnt/data/work/pollardlab/MicrobeDB_test/ -s '-incomplete -t faa,fna,gbk,ffn'
2013/11/06 11:16:33> Making download directory: /mnt/data/work/pollardlab/MicrobeDB_test/Bacteria_2013-11-06
2013/11/06 11:16:33> Downloading all genomes from NCBI.(Downloading time will vary depending on your connection and how flaky NCBI is today; (10 minutes to a few hours))
2013/11/06 11:16:34> Using FTP to download since --search,--incomplete, and/or --only_incomplete option(s) selected.
2013/11/06 11:16:34> Downloading files to directory: /mnt/data/work/pollardlab/MicrobeDB_test/Bacteria_2013-11-06/
2013/11/06 11:16:34> Note: NCBI has removed their metadata files. We are looking for another source of metadata information, but until then we can only use metadata for genomes from before June, 2012.
2013/11/06 11:16:34> Downloading file: /mnt/data/work/pollardlab/MicrobeDB_test/Bacteria_2013-11-06/NCBI_orginfo.txt from Dropbox at: http://dl.dropbox.com/u/5329340/NCBI_orginfo.txt
2013/11/06 11:16:35> Downloading file: /mnt/data/work/pollardlab/MicrobeDB_test/Bacteria_2013-11-06/NCBI_completegenomes.txt from Dropbox at: http://dl.dropbox.com/u/5329340/NCBI_completegenomes.txt
2013/11/06 11:16:37> Downloading genomes using FTP option
2013/11/06 11:16:37> Downloading RefSeq genomes now.
2013/11/06 11:30:26> Downloading draft genomes now.
2013/11/06 11:30:31> Downloading genome: Acaricomes_phytoseiuli_DSM_14247_uid199097
2013/11/06 11:30:38> Downloading genome: Acaryochloris_CCMEE_5410_uid78283

Anyway, I'd be happy to help fix this in any way I can to get it working for us. Please let me know what I could try, or which parts of the code could be altered.

Best,

Timothy Laurent

timothyjlaurent avatar Nov 06 '13 20:11 timothyjlaurent

Oh, I think I see now. All the sequences are in the tar.gz (.tgz) files. So I unpack them and then add them to the database?

Ok that works!
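For anyone following along, here is a minimal sketch of that unpacking step. It assumes the .tgz archives have actually been downloaded into each genome directory (the .listing output above only shows what exists on the FTP side), and the function name `unpack_tgz` is made up for illustration; it is not part of MicrobeDB.

```shell
# unpack_tgz DIR
# Extract every *.tgz archive found under DIR into the directory that holds it.
# Hypothetical helper, not a MicrobeDB script.
unpack_tgz() (
    root=$1
    find "$root" -type f -name '*.tgz' | while IFS= read -r archive; do
        # -C switches to the archive's own directory before extracting
        tar -xzf "$archive" -C "$(dirname "$archive")"
    done
)
```

It would be called on the download directory, for example `unpack_tgz ~/tmp`. Whether the loader then picks up the extracted files is something to check separately.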

Thanks for indulging me and for your nice software.

-Tim

timothyjlaurent avatar Nov 06 '13 20:11 timothyjlaurent

Ok I'm still confused -- I'm not getting the genome files I want in the directories. I am running some controlled tests now and will report back when I have results.

timothyjlaurent avatar Nov 08 '13 22:11 timothyjlaurent

OK, I have retested downloading everything with the following command:

$ ./download_load_and_delete_old_version.pl -d /mnt/data/work/pollardlab/MicrobeDBtests/dladov -s '-i -t faa,fna,ffn' 2>&1 | tee -a download_load_and_delete_old_version.log

This creates 9671 folders in my download location. Of these, 6946 are empty apart from the .listing file.

In the log of the output, the message "couldn't find the genbank file" appears many times.
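In case it helps to reproduce these numbers, this is roughly how they can be counted. It is only a sketch: the function names are made up, and the log pattern is the message as I remember it, so the exact wording in the log may differ.

```shell
# count_listing_only DIR
# Print how many immediate subdirectories of DIR contain nothing but ".listing".
count_listing_only() (
    root=$1
    count=0
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue
        # ls -A lists every entry except "." and ".."
        entries=$(ls -A "$dir")
        if [ "$entries" = ".listing" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
)

# count_log_message LOG PATTERN
# Print how many lines of LOG contain PATTERN (literal match, case-insensitive).
count_log_message() (
    # grep exits non-zero when nothing matches; "|| true" keeps the function quiet about it
    grep -c -i -F -- "$2" "$1" || true
)
```

Point `count_listing_only` at whichever directory holds the per-genome folders, and `count_log_message` at the log written by `tee`, e.g. `count_log_message download_load_and_delete_old_version.log "couldn't find the genbank file"`.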

In the database there are only 2665 rows in the taxonomy table:

select count(*) from taxonomy;
2665

So ... is there any way to get the rest of the genomes? What is going wrong here? How can we fix it?

timothyjlaurent avatar Nov 11 '13 20:11 timothyjlaurent