anvio
Issue with new anvi-setup-ncbi-cogs
anvi-setup-ncbi-cogs gets stuck
Hi guys, it's me again. I'm trying to use the awesome new COG20 but ran into a problem with anvi-setup-ncbi-cogs. Since the automatic download was giving me trouble, I downloaded the files from the NCBI COG database with wget and pointed to their folder with --cog-data-dir, but the setup gets stuck at the BLAST search db step and gives back the prompt without saying anything. What else can I try? Thanks
:: anvi'o v7 :: /share/Groups/Pathology >>> anvi-setup-ncbi-cogs --cog-version COG20 --cog-data-dir ./COG-DATA-DIR -T 16 --just-do-it
COG version ..................................: COG20
COG data source ..............................: The command line parameter.
COG base directory ...........................: /share/Groups/Pathology/COG-DATA-DIR
warning
===============================================
This program will first check whether you have all the raw files, and then will
attempt to regenerate everything that is necessary from them.
Diamond log ..................................: /share/Groups/Pathology/COG-DATA-DIR/COG20/DB_DIAMOND/log.txt
Diamond search db ............................: /share/Groups/Pathology/COG-DATA-DIR/COG20/DB_DIAMOND/COG.dmnd
BLAST log ....................................: /share/Groups/Pathology/COG-DATA-DIR/COG20/DB_BLAST/log.txt
BLAST search db ..............................: /tmp/tmp0iyq1zm5
anvi'o version
:: anvi'o v7 :: /share/Groups/Pathology >>> anvi-self-test --version
Anvi'o .......................................: hope (v7)
Profile database .............................: 35
Contigs database .............................: 20
Pan database .................................: 14
Genome data storage ..........................: 7
Auxiliary data storage .......................: 2
Structure database ...........................: 2
Metabolic modules database ...................: 2
tRNA-seq database ............................: 1
System info
I downloaded the anvi'o Docker container on our server, which runs CentOS 7.
Hey @Sirbius, let's see if we can figure out your problem :) It looks like the setup script was able to find the raw files you downloaded, which is great.
Can you possibly run this command again with the --debug flag and let me know what you see in the output?
Also, after this program gives back the prompt, if you look in /share/Groups/Pathology/COG-DATA-DIR/COG20/DB_BLAST/COG/, do you see several BLAST database files like COG.fa.00.phr, COG.fa.00.pin, etc.? And what does it say in /share/Groups/Pathology/COG-DATA-DIR/COG20/DB_BLAST/log.txt?
A side note for anyone else from the public who wants to use this hack to sidestep the automatic download:
the COG setup script expects the raw NCBI data to be in a specific folder called 'RAW_DATA_FROM_NCBI' inside the version-specific subfolder (e.g., COG20) of your --cog-data-dir folder. That means that after downloading the NCBI files with wget, you should move them into this directory structure, kind of like this:
cd COG-DATA-DIR
mkdir -p COG20/RAW_DATA_FROM_NCBI
mv cog-20.cog.csv COG20/RAW_DATA_FROM_NCBI
mv cog-20.def.tab COG20/RAW_DATA_FROM_NCBI
mv fun-20.tab COG20/RAW_DATA_FROM_NCBI
mv cog-20.fa.gz COG20/RAW_DATA_FROM_NCBI
Otherwise anvi'o will not be able to find them. (The four files in the mv commands are the ones that anvi'o expects for COG20.)
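Before running the setup, a quick pre-flight check can confirm that all four COG20 raw files are present and non-empty. The sketch below is a hypothetical helper (`missing_raw_files` is not part of anvi'o); point it at whichever RAW_DATA_FROM_NCBI folder anvi'o reports in its output (in the directory listings further down this thread, that folder sits under COG20/). An empty or missing file means the download or the `mv` step did not go as planned.

```python
from pathlib import Path

# The four raw files named in the mv commands for COG20.
EXPECTED_COG20_FILES = ["cog-20.cog.csv", "cog-20.def.tab", "fun-20.tab", "cog-20.fa.gz"]


def missing_raw_files(raw_dir):
    """Return the expected COG20 raw files that are absent or empty in raw_dir.

    Hypothetical pre-flight helper for illustration; not part of anvi'o.
    """
    raw_dir = Path(raw_dir)
    missing = []
    for name in EXPECTED_COG20_FILES:
        path = raw_dir / name
        # A zero-byte file is as useless as a missing one.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

For example, `missing_raw_files("COG-DATA-DIR/COG20/RAW_DATA_FROM_NCBI")` returns an empty list when every file is in place.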
(Also @Sirbius - if you don't mind saying, what sort of issues are you having with the automatic download? If it is not a server-specific problem, we may be able to solve it)
Hi @ivagljiva, thank you for your reply. I should say that only the fa.gz file fails to download automatically; the others download without problems into the proper folder, RAW_DATA_FROM_NCBI. This is the output with the --debug option, but I guess it's not very informative:
:: anvi'o v7 :: /home/silviat/Pathology >>> anvi-setup-ncbi-cogs --cog-version COG20 --cog-data-dir ./COG-DATA-DIR -T 16 --debug
COG version ..................................: COG20
COG data source ..............................: The command line parameter.
COG base directory ...........................: /home/silviat/Pathology/COG-DATA-DIR
WARNING
===============================================
This program will first check whether you have all the raw files, and then will
attempt to regenerate everything that is necessary from them.
Press ENTER to continue, or press CTRL + C to cancel...
Diamond log ..................................: /home/silviat/Pathology/COG-DATA-DIR/COG20/DB_DIAMOND/log.txt
[DEBUG] `run_command` is running .............: diamond makedb --in /tmp/tmprubo0vms -d
/home/silviat/Pathology/COG-DATA-DIR/COG20/DB_DIAMOND/COG -p 16
Diamond search db ............................: /home/silviat/Pathology/COG-DATA-DIR/COG20/DB_DIAMOND/COG.dmnd
BLAST log ....................................: /home/silviat/Pathology/COG-DATA-DIR/COG20/DB_BLAST/log.txt
[DEBUG] `run_command` is running .............: makeblastdb -in /tmp/tmprubo0vms -dbtype prot -out
/home/silviat/Pathology/COG-DATA-DIR/COG20/DB_BLAST/COG/COG.fa
BLAST search db ..............................: /tmp/tmprubo0vms
This is the content of COG-DATA-DIR/COG20:
:: anvi'o v7 :: /home/silviat/Pathology/COG-DATA-DIR/COG20 >>> ls
CATEGORIES.txt COG.txt DB_BLAST DB_DIAMOND MISSING_COG_IDs.cPickle PID-TO-CID.cPickle RAW_DATA_FROM_NCBI
And this is the content of RAW_DATA_FROM_NCBI.
cog-20.cog.csv cog-20.def.tab cog-20.fa.gz fun-20.tab
And there is no DB_BLAST folder!
When I run the automatic download I get this error:
anvi-setup-ncbi-cogs --cog-version COG20 -T 16 --debug
COG version ..................................: COG20
COG data source ..............................: The anvi'o default.
COG base directory ...........................: /opt/conda/envs/anvioenv/lib/python3.6/site-packages/anvio/data/misc/COG
WARNING
===============================================
This program will first check whether you have all the raw files, and then will
attempt to regenerate everything that is necessary from them.
Press ENTER to continue, or press CTRL + C to cancel...
Downloaded successfully ......................: /opt/conda/envs/anvioenv/lib/python3.6/site-packages/anvio/data/misc/COG/COG20/RAW_DATA_FROM_NCBI/cog-20.cog.csv
Downloaded successfully ......................: /opt/conda/envs/anvioenv/lib/python3.6/site-packages/anvio/data/misc/COG/COG20/RAW_DATA_FROM_NCBI/cog-20.def.tab
Downloaded successfully ......................: /opt/conda/envs/anvioenv/lib/python3.6/site-packages/anvio/data/misc/COG/COG20/RAW_DATA_FROM_NCBI/fun-20.tab
Downloaded successfully ......................: /opt/conda/envs/anvioenv/lib/python3.6/site-packages/anvio/data/misc/COG/COG20/RAW_DATA_FROM_NCBI/cog-20.fa.gz
Traceback for debugging
================================================================================
File "/opt/conda/envs/anvioenv/bin/anvi-setup-ncbi-cogs", line 47, in <module>
setup.create()
File "/opt/conda/envs/anvioenv/lib/python3.6/site-packages/anvio/cogs.py", line 617, in create
self.setup_raw_data()
File "/opt/conda/envs/anvioenv/lib/python3.6/site-packages/anvio/cogs.py", line 831, in setup_raw_data
self.files[file_name]['func'](file_path, J(self.COG_data_dir, self.files[file_name]['formatted_file_name']))
File "/opt/conda/envs/anvioenv/lib/python3.6/site-packages/anvio/cogs.py", line 757, in format_protein_db
raise ConfigError(f"Something went wrong while decompressing the downloaded file :/ It is likely that "
================================================================================
Config Error: Something went wrong while decompressing the downloaded file :/ It is likely
that the download failed and only part of the file was downloaded. If you would
like to try again, please run the setup command with the flag `--reset`. Here is
what the downstream library said: 'Error -3 while decompressing data: invalid
code lengths set'.
And in fact, the fa.gz file is not the expected size (616 MB):
ls -lh /opt/conda/envs/anvioenv/lib/python3.6/site-packages/anvio/data/misc/COG/COG20/RAW_DATA_FROM_NCBI/
total 336M
-rw-r--r-- 1 root root 334M Jan 22 18:48 cog-20.cog.csv
-rw-r--r-- 1 root root 364K Jan 22 18:48 cog-20.def.tab
-rw-r--r-- 1 root root 924K Jan 22 18:49 cog-20.fa.gz
-rw-r--r-- 1 root root 1.2K Jan 22 18:48 fun-20.tab
That's why I also tried downloading cog-20.fa.gz directly into the default folder /opt/conda/envs/anvioenv/lib/python3.6/site-packages/anvio/data/misc/COG/COG20/RAW_DATA_FROM_NCBI/ and running the setup again, which got stuck at the same point as above.
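A file that is far smaller than expected, like the 924K cog-20.fa.gz above, is most likely a truncated download. One way to confirm this before running the setup is to decompress the whole file and see whether Python's gzip module complains: a corrupted stream raises zlib.error with messages like the 'Error -3 while decompressing data' seen in this thread, while a truncated one raises EOFError. The helper below is a sketch; `gzip_is_intact` is a hypothetical name, not an anvi'o function.

```python
import gzip
import os
import zlib


def gzip_is_intact(path, chunk_size=1 << 20):
    """Decompress `path` end to end and report (ok, message).

    Sketch only: a truncated gzip usually raises EOFError, a corrupted deflate
    stream raises zlib.error (e.g. 'Error -3 while decompressing data: ...'),
    and a file that is not gzip at all raises OSError.
    """
    if os.path.getsize(path) == 0:
        return False, "empty file"
    try:
        with gzip.open(path, "rb") as handle:
            # Read in chunks so a multi-hundred-MB file never sits in memory at once.
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error) as err:
        return False, f"{type(err).__name__}: {err}"
    return True, "ok"
```

If `gzip_is_intact(".../RAW_DATA_FROM_NCBI/cog-20.fa.gz")` returns False, the file should be deleted and downloaded again before rerunning the setup.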
I should add that yesterday I got a different error at some point. Unfortunately I did not save it, but I can retrieve part of it from my browser history, since I looked it up while trying to understand it:
File "/opt/conda/envs/anvioenv/lib/python3.6/site-packages/anvio/cogs.py", line 704, in format_cog_names
COG, category, function, nn, pathway, pubmed_id, PDB_id = line.strip('\n').split('\t')
ValueError: too many values to unpack (expected 7)
Here is what the downstream library said: 'Error -3 while decompressing data: invalid code lengths set'.
I thought that maybe the new COG20 file format was different from the 2014 version (e.g., more columns than expected), but after checking the files and the cogs.py script, everything looked fine. Another incredible thing: the COG-DATA-DIR/COG20/ folder I downloaded yesterday was named COG-DATA-DIR/COG14/ this morning!!! I guess I've got some ghosts in the server room :P
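The ValueError in that traceback comes from unpacking a tab-split line into exactly seven names: any line with an extra tab (or a corrupted, spliced line) yields more than seven fields, and the unpacking fails. A defensive parser could instead count the fields and collect the offending line numbers. The sketch below assumes the seven-column layout implied by the traceback; `parse_cog_def_lines` is a hypothetical helper, not the code in cogs.py, and the file it applies to (presumably cog-20.def.tab) is an assumption.

```python
def parse_cog_def_lines(lines, expected_fields=7):
    """Split tab-separated COG definition lines, collecting malformed ones.

    Sketch only: the 7-field layout mirrors the unpacking in the traceback
    (COG, category, function, nn, pathway, pubmed_id, PDB_id).
    Returns (records, malformed) where malformed holds (line_number, field_count).
    """
    records, malformed = [], []
    for line_no, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != expected_fields:
            # Record the problem instead of crashing on the first bad line.
            malformed.append((line_no, len(fields)))
            continue
        records.append(fields)
    return records, malformed
```

Used as `with open(path, errors="replace") as handle: records, malformed = parse_cog_def_lines(handle)`, a non-empty `malformed` list points straight at the damaged lines.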
I think the best solution here is to delete everything under COG via
rm -rf /opt/conda/envs/anvioenv/lib/python3.6/site-packages/anvio/data/misc/COG/*
(while making extra sure there is no space between the / and the *) and try again. The original file seems to be broken.
Hi everyone,
I have the very same problem. I've been trying to run anvi-setup-ncbi-cogs for 2 days now with more or less the same outputs as Sirbius... I also tried to download the COG database myself, with the same results.
Sorry to bother... I reran it several times and in the end it worked... To be honest, I don't know why it did not work yesterday but worked today.
Sorry to hear, @lvelosuarez, but I'm glad it worked eventually :/ Because we insist on using upstream data rather than storing it in our distribution, server connectivity issues between you and the upstream sometimes result in incomplete downloads, and anvi'o doesn't realize that it has been a very long time and that it should simply try again from scratch.
Random developer idea: it would be excellent to see whether we can add a timeout parameter to our downloader.
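As a sketch of that idea, Python's `urllib.request.urlopen` accepts a `timeout` argument, so a downloader can abandon a stalled connection, discard the partial file, and retry from scratch. The helper below is hypothetical (it is not anvi'o's downloader); it writes to a `.part` file and renames it only on success, so an interrupted attempt never leaves a truncated file under the final name. Note that `timeout` applies to each blocking socket operation, not to the download as a whole.

```python
import http.client
import os
import shutil
import time
import urllib.request


def download_with_timeout(url, dest, timeout=60, retries=3, backoff=5):
    """Download `url` to `dest`, retrying from scratch if a connection errors out or stalls.

    Hypothetical helper, not anvi'o's downloader. `timeout` (seconds) applies to
    each blocking socket operation, so a stalled transfer fails instead of hanging.
    """
    dest = os.fspath(dest)
    part = dest + ".part"  # partial data never sits under the final name
    last_err = None
    for attempt in range(1, retries + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response, open(part, "wb") as out:
                shutil.copyfileobj(response, out)
            os.replace(part, dest)  # rename only after the transfer finished without error
            return dest
        except (OSError, http.client.HTTPException) as err:
            # URLError and socket timeouts are OSError subclasses; IncompleteRead is an HTTPException.
            last_err = err
            if os.path.exists(part):
                os.remove(part)
            if attempt < retries:
                time.sleep(backoff * attempt)  # wait a bit longer before each retry
    raise RuntimeError(f"giving up on {url} after {retries} attempts: {last_err}")
```

For example, `download_with_timeout("https://ftp.ncbi.nlm.nih.gov/pub/COG/COG2020/data/cog-20.fa.gz", "cog-20.fa.gz")`. A timeout alone cannot prove the file is complete, so it pairs naturally with an integrity or checksum check afterwards.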
Ok, I've been trying to delete and re-run anvi-setup-ncbi-cogs and I always get this error:
anvi-setup-ncbi-cogs --cog-version COG20 -T 16 --debug --just-do-it
COG version ..................................: COG20
COG data source ..............................: The anvi'o default.
COG base directory ...........................: /opt/conda/envs/anvioenv/lib/python3.6/site-packages/anvio/data/misc/COG
WARNING
===============================================
This program will first check whether you have all the raw files, and then will
attempt to regenerate everything that is necessary from them.
Downloaded successfully ......................: /opt/conda/envs/anvioenv/lib/python3.6/site-packages/anvio/data/misc/COG/COG20/RAW_DATA_FROM_NCBI/cog-20.cog.csv
Downloaded successfully ......................: /opt/conda/envs/anvioenv/lib/python3.6/site-packages/anvio/data/misc/COG/COG20/RAW_DATA_FROM_NCBI/cog-20.def.tab
Downloaded successfully ......................: /opt/conda/envs/anvioenv/lib/python3.6/site-packages/anvio/data/misc/COG/COG20/RAW_DATA_FROM_NCBI/fun-20.tab
Downloaded successfully ......................: /opt/conda/envs/anvioenv/lib/python3.6/site-packages/anvio/data/misc/COG/COG20/RAW_DATA_FROM_NCBI/cog-20.fa.gz
[23 Jan 21 11:37:31 Formatting protein ids to COG ids file] 95.55% ETA: None
Traceback (most recent call last):
File "/opt/conda/envs/anvioenv/bin/anvi-setup-ncbi-cogs", line 47, in <module>
setup.create()
File "/opt/conda/envs/anvioenv/lib/python3.6/site-packages/anvio/cogs.py", line 617, in create
self.setup_raw_data()
File "/opt/conda/envs/anvioenv/lib/python3.6/site-packages/anvio/cogs.py", line 831, in setup_raw_data
self.files[file_name]['func'](file_path, J(self.COG_data_dir, self.files[file_name]['formatted_file_name']))
File "/opt/conda/envs/anvioenv/lib/python3.6/site-packages/anvio/cogs.py", line 659, in format_p_id_to_cog_id_cPickle
p_id = fields[2].replace('.', '_')
IndexError: list index out of range
I also think that sometimes it just randomly works. I'll try again from the office with a better network connection, otherwise I'll just stick to COG14 :(
Hi guys, just FYI. After today's random run, I found DB_BLAST/ and DB_DIAMOND/ inside COG20/ and could read their log.txt files (the same as the --debug output, I guess):
# DATE: 24 Jan 21 10:54:34
# CMD LINE: diamond makedb --in /tmp/tmpr9z384cn -d /home/silviat/Pathology/COG-DATA-DIR/COG20/DB_DIAMOND/COG -p 16
diamond v2.0.6.144 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org
#CPU threads: 16
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Database input file: /tmp/tmpr9z384cn
Opening the database file... [0.011s]
Loading sequences... [4.949s]
Masking sequences... [4.569s]
Writing sequences... [11.918s]
Hashing sequences... [0.306s]
Loading sequences... [0.732s]
Masking sequences... [0.658s]
Writing sequences... [1.323s]
Hashing sequences... [0.049s]
Loading sequences... [0.001s]
Writing trailer... [0.671s]
Closing the input file... [0.001s]
Closing the database file... [0.419s]
Database hash = 84f947b4825b1bf8eee04e8d019f368b
Processed 3213025 sequences, 1150770183 letters.
Total time = 25.632s
[24 Jan 21 10:55:00] diamond makedb cmd ...........................: diamond makedb --in
/tmp/tmpr9z384cn -d
/home/silviat/Pathology/COG-DATA-DIR/COG20/DB_DIAMOND/COG
-p 16
[24 Jan 21 10:55:00] Diamond search db ............................: /home/silviat/Pathology/COG-DATA-DIR/COG20/DB_DIAMOND/COG.dmnd
Since the script stops when creating the database, I thought I could just build everything on my own by downloading the files and then creating the databases with the commands below, which worked perfectly. Could you tell me what else the script is supposed to run?
diamond makedb --in cog-20.fa -d /home/silviat/Pathology/COG-DATA-DIR/COG20/DB_DIAMOND/COG -p 32
makeblastdb -in cog-20.fa -dbtype prot -out /home/silviat/Pathology/COG-DATA-DIR/COG20/DB_BLAST/COG/COG.fa
But when I run anvi-run-ncbi-cogs --cog-version COG20, it says I only have COG14! And guess what, I found the ghost changing the folder name!
:: anvi'o v7 :: /home/silviat/Pathology/COG-DATA-DIR >>> ls -R
.:
COG20
./COG20:
COG14
./COG20/COG14:
CATEGORIES.txt DB_BLAST MISSING_COG_IDs.cPickle RAW_DATA_FROM_NCBI
COG.txt DB_DIAMOND PID-TO-CID.cPickle
./COG20/COG14/DB_BLAST:
COG log.txt
./COG20/COG14/DB_BLAST/COG:
COG.fa.00.phr COG.fa.00.psq COG.fa.01.pin COG.fa.pal COG.fa.pot COG.fa.pto
COG.fa.00.pin COG.fa.01.phr COG.fa.01.psq COG.fa.pdb COG.fa.ptf
./COG20/COG14/DB_DIAMOND:
COG.dmnd log.txt
./COG20/COG14/RAW_DATA_FROM_NCBI:
cog-20.cog.csv cog-20.def.tab cog-20.fa fun-20.tab
guess what, I found the ghost changing the folder name!
So how is this happening again? This should never happen:
./COG20:
COG14
Both COG14 and COG20 should be underneath the directory COG-DATA-DIR/. Perhaps there is a problem with user-specified directories :/ I will look into this now.
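A quick way to spot this kind of mix-up is to look for one version directory nested inside another. The sketch below is a hypothetical check (`find_nested_version_dirs` is not part of anvi'o) that assumes the layout shown in the listings in this thread, with COG14 and COG20 side by side under the data directory, and that these are the only two versions (the two anvi'o names in its own error message later in the thread).

```python
from pathlib import Path

# The COG versions anvi'o lists in its "COG versions known to anvi'o" error message.
KNOWN_COG_VERSIONS = ("COG14", "COG20")


def find_nested_version_dirs(cog_data_dir):
    """Return paths where a COG version directory sits inside another version directory.

    Hypothetical check: the expected layout is COG-DATA-DIR/COG14 and
    COG-DATA-DIR/COG20 side by side, never COG-DATA-DIR/COG20/COG14.
    """
    base = Path(cog_data_dir)
    nested = []
    for outer in KNOWN_COG_VERSIONS:
        for inner in KNOWN_COG_VERSIONS:
            candidate = base / outer / inner
            if candidate.is_dir():
                nested.append(candidate)
    return nested
```

Run against the COG-DATA-DIR listing posted above, this would return the COG20/COG14 path; on a healthy layout it returns an empty list.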
Could you tell me what else the script is supposed to run?
The script runs a lot of other things to ensure integrity between files. It is not possible to do it manually :(
Nope. It's not about that either. I was able to setup both versions of COG without any problem in separate directories underneath a user-defined path:
>>> anvi-setup-ncbi-cogs --cog-data-dir COGS-DATA-DIR -T 4 --just-do-it
COG version ..................................: COG20
COG data source ..............................: The command line parameter.
COG base directory ...........................: /Users/meren/github/anvio/COGS-DATA-DIR
WARNING
===============================================
This program will first check whether you have all the raw files, and then will
attempt to regenerate everything that is necessary from them.
Downloaded successfully ......................: /Users/meren/github/anvio/COGS-DATA-DIR/COG20/RAW_DATA_FROM_NCBI/cog-20.cog.csv
Downloaded successfully ......................: /Users/meren/github/anvio/COGS-DATA-DIR/COG20/RAW_DATA_FROM_NCBI/cog-20.def.tab
Downloaded successfully ......................: /Users/meren/github/anvio/COGS-DATA-DIR/COG20/RAW_DATA_FROM_NCBI/fun-20.tab
Downloaded successfully ......................: /Users/meren/github/anvio/COGS-DATA-DIR/COG20/RAW_DATA_FROM_NCBI/cog-20.fa.gz
Diamond log ..................................: /Users/meren/github/anvio/COGS-DATA-DIR/COG20/DB_DIAMOND/log.txt
Diamond search db ............................: /Users/meren/github/anvio/COGS-DATA-DIR/COG20/DB_DIAMOND/COG.dmnd
BLAST log ....................................: /Users/meren/github/anvio/COGS-DATA-DIR/COG20/DB_BLAST/log.txt
BLAST search db ..............................: /var/folders/gw/5mdblzs94gsb1ss44llgl3_h0000gn/T/tmpsermdulq
>>> anvi-setup-ncbi-cogs --cog-data-dir COGS-DATA-DIR -T 4 --just-do-it --cog-version COG14
COG version ..................................: COG14
COG data source ..............................: The command line parameter.
COG base directory ...........................: /Users/meren/github/anvio/COGS-DATA-DIR
WARNING
===============================================
This program will first check whether you have all the raw files, and then will
attempt to regenerate everything that is necessary from them.
Downloaded successfully ......................: /Users/meren/github/anvio/COGS-DATA-DIR/COG14/RAW_DATA_FROM_NCBI/cog2003-2014.csv
Downloaded successfully ......................: /Users/meren/github/anvio/COGS-DATA-DIR/COG14/RAW_DATA_FROM_NCBI/cognames2003-2014.tab
Downloaded successfully ......................: /Users/meren/github/anvio/COGS-DATA-DIR/COG14/RAW_DATA_FROM_NCBI/fun2003-2014.tab
Downloaded successfully ......................: /Users/meren/github/anvio/COGS-DATA-DIR/COG14/RAW_DATA_FROM_NCBI/prot2003-2014.fa.gz
Diamond log ..................................: /Users/meren/github/anvio/COGS-DATA-DIR/COG14/DB_DIAMOND/log.txt
Diamond search db ............................: /Users/meren/github/anvio/COGS-DATA-DIR/COG14/DB_DIAMOND/COG.dmnd
BLAST log ....................................: /Users/meren/github/anvio/COGS-DATA-DIR/COG14/DB_BLAST/log.txt
BLAST search db ..............................: /var/folders/gw/5mdblzs94gsb1ss44llgl3_h0000gn/T/tmppsge8nbs
>>> ls COGS-DATA-DIR/
COG14 COG20
>>> ls COGS-DATA-DIR/COG14/
CATEGORIES.txt COG.txt DB_BLAST DB_DIAMOND MISSING_COG_IDs.cPickle PID-TO-CID.cPickle RAW_DATA_FROM_NCBI
>>> ls COGS-DATA-DIR/COG20/
CATEGORIES.txt COG.txt DB_BLAST DB_DIAMOND MISSING_COG_IDs.cPickle PID-TO-CID.cPickle RAW_DATA_FROM_NCBI
This file, anvio/data/misc/COG/COG20/RAW_DATA_FROM_NCBI/cog-20.cog.csv, has this line:
CTC_RS10785,GCF_000007625.1,WP_035109085.1,876,303-876,574,COG0749,COG0749,1,570.0,1.0e-200,593,31-593 SE133-174 AT984_RS20530,GCF_001477625.1,WP_082680220.1,384,1-121,121,COG0745,COG0745,3,112.0,3.14e-29,229,2-116
which makes the code break. The file on ftp://ftp.ncbi.nih.gov/pub/COG/COG2020/data does not have this ID. I guess, as a solution, a user can do two things:
- download the file directly from the FTP with wget
- update anvio/cogs.py so that it does not break when the array index is not present.
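The second option could look something like the sketch below: rather than failing with IndexError on fields[2], scan cog-20.cog.csv first and report rows whose field count differs from the majority, which flags both spliced lines like the one above and short or blank lines. The majority-count rule and the `find_irregular_rows` name are assumptions for illustration, not what anvi'o does.

```python
import csv
from collections import Counter


def find_irregular_rows(lines):
    """Return (row_number, field_count) for rows whose field count differs from the most common one.

    Hypothetical pre-check for cog-20.cog.csv; the majority rule is an assumption,
    not anvi'o's logic. Only the per-row field counts are kept in memory.
    """
    lengths = [len(row) for row in csv.reader(lines)]
    if not lengths:
        return []
    typical, _ = Counter(lengths).most_common(1)[0]
    return [(row_no, n) for row_no, n in enumerate(lengths, start=1) if n != typical]
```

For example, `with open("cog-20.cog.csv", newline="") as handle: print(find_irregular_rows(handle)[:10])` lists the first few suspicious rows; any output at all suggests the download should be redone with --reset.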
Hi @maziz2,
This looks like an issue specific to your download. When I look at my file, this is what I see:
grep -A 2 CTC_RS10785,GCF_000007625.1,WP_035109085.1,876,303-876,574,COG0749,COG0749,1,570.0,1.0e-200,593,31-593 anvio/data/misc/COG/COG20/RAW_DATA_FROM_NCBI/cog-20.cog.csv
CTC_RS10785,GCF_000007625.1,WP_035109085.1,876,303-876,574,COG0749,COG0749,1,570.0,1.0e-200,593,31-593
SE1367,GCF_000007645.1,NP_764922.1,903,323-903,581,COG0749,COG0749,1,632.0,1.0e-200,593,2-593
CV_RS03810,GCF_000007705.1,WP_011134334.1,928,311-928,618,COG0749,COG0749,1,773.0,1.0e-200,593,2-593
Probably the file was corrupted during download, and this should be fixed if you re-run the program with the --reset flag.
Please let us know if you try that and succeed.
Best,
Hi Dr. Eren
I'm not sure where my last post went. Anyway, yes, this is a corrupted-download issue, which is why users succeed after multiple tries. wget from the NCBI FTP didn't work for me; it kept downloading corrupted files. I tried the rsync method, which worked beautifully: rsync --copy-links --times --verbose rsync://ftp.ncbi.nlm.nih.gov/etc etc. RAW_DATA_FROM_NCBI/
That's very interesting, @maziz2. Thank you very much for the heads up.
@maziz2, @meren As you suggested, I downloaded the file (the one that fails every time) using the following command:
rsync --copy-links --times --verbose rsync://ftp.ncbi.nlm.nih.gov/pub/COG/COG2020/data/cog-20.fa.gz /home/ga214/miniconda3/envs/anvio-7/lib/python3.6/site-packages/anvio/data/misc/COG/COG20/RAW_DATA_FROM_NCBI
But I do not know how to format the downloaded files. Please help me with this.
You don't need to format anything; anvi'o does it all. The next step is to run the setup: anvi-setup-ncbi-cogs --cog-version COG20 -T 8 --debug
@maziz2 Thank you for your time and help. I followed your instructions but ended up with the following error:
(anvio-7) ga214@ga:~$ anvi-setup-ncbi-cogs --cog-version COG20 -T 14 --debug
COG version ..................................: COG20
COG data source ..............................: The anvi'o default.
COG base directory ...........................: /home/ga214/miniconda3/envs/anvio-7/lib/python3.6/site-packages/anvio/data/misc/COG
WARNING
===============================================
This program will first check whether you have all the raw files, and then will
attempt to regenerate everything that is necessary from them.
Press ENTER to continue, or press CTRL + C to cancel...
Traceback (most recent call last):
File "/home/ga214/miniconda3/envs/anvio-7/bin/anvi-setup-ncbi-cogs", line 47, in <module>
setup.create()
File "/home/ga214/miniconda3/envs/anvio-7/lib/python3.6/site-packages/anvio/cogs.py", line 617, in create
self.setup_raw_data()
File "/home/ga214/miniconda3/envs/anvio-7/lib/python3.6/site-packages/anvio/cogs.py", line 831, in setup_raw_data
self.files[file_name]['func'](file_path, J(self.COG_data_dir, self.files[file_name]['formatted_file_name']))
File "/home/ga214/miniconda3/envs/anvio-7/lib/python3.6/site-packages/anvio/cogs.py", line 659, in format_p_id_to_cog_id_cPickle
p_id = fields[2].replace('.', '_')
IndexError: list index out of range
I also tried this and got another error:
(anvio-7) ga214@ga:~$ anvi-setup-ncbi-cogs --cog-version /home/ga214/miniconda3/envs/anvio-7/lib/python3.6/site-packages/anvio/data/misc/COG/COG20 --debug
Traceback for debugging
================================================================================
File "/home/ga214/miniconda3/envs/anvio-7/bin/anvi-setup-ncbi-cogs", line 46, in <module>
setup = COGsSetup(args)
File "/home/ga214/miniconda3/envs/anvio-7/lib/python3.6/site-packages/anvio/cogs.py", line 513, in __init__
raise ConfigError(f"The COG versions known to anvi'o do not include '{self.COG_version}' :/ This is "
================================================================================
Config Error: The COG versions known to anvi'o do not include
'/home/ga214/miniconda3/envs/anvio-7/lib/python3.6/site-
packages/anvio/data/misc/COG/COG20' :/ This is what we know of: COG14, COG20.
This is one of those things that should have never happened. We salute you.
Could you please help me in this regard.
@dineshkumarsrk and others: if you are willing to help with this error you can switch to the active branch (explained here), run,
anvi-setup-ncbi-cogs --reset
And follow the instructions in the error message.
@dineshkumarsrk I asked NCBI to generate checksums for all the files in their COG folder: https://ftp.ncbi.nlm.nih.gov/pub/COG/COG2020/data/checksums.md5.txt Please generate a checksum for the cog-20.fa.gz you downloaded and see whether it matches the one in that file.
@meren it would be great if cogs.py downloaded and matched the checksums before processing cog-20.fa.gz. I would have updated the codebase myself, but I'm taking a machine learning course that is killing me (insert exploding head with tears), so I won't be able to test it thoroughly.
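Checking a download against the published checksums takes only a few lines with hashlib; on the command line, `md5sum cog-20.fa.gz` gives the same digest. The sketch below assumes checksums.md5.txt follows the usual md5sum layout (`<hash>  <file name>` per line), which may not match NCBI's exact format; `md5_of_file`, `expected_md5` and `verify_download` are hypothetical names, not anvi'o functions.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file without loading it all into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_md5(checksum_text, file_name):
    """Look up file_name in md5sum-style text ('<hash>  <name>' per line).

    Assumes the md5sum layout; adjust the parsing if NCBI's file differs.
    """
    for line in checksum_text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        name = parts[-1].lstrip("*")  # md5sum marks binary-mode entries with '*'
        if os.path.basename(name) == file_name:
            return parts[0].lower()
    return None


def verify_download(path, checksum_text):
    """Return True only if the file's MD5 matches the published value."""
    expected = expected_md5(checksum_text, os.path.basename(path))
    return expected is not None and md5_of_file(path) == expected
```

For example, after saving checksums.md5.txt next to the download, `verify_download("RAW_DATA_FROM_NCBI/cog-20.fa.gz", open("checksums.md5.txt").read())` returns False for a corrupted or truncated file, which is the cue to delete it and download again.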
Hello,
I have been trying to set up a COG20 database using Docker (the latest version of anvi'o). The problem is that the "formatting protein ids to COG ids" step was killed about 80% of the way through, as shown below. I am wondering what I should do to fix this problem.
Thank you very much,
Siripong
The error at the "Formatting protein ids to COG ids" step:
:: anvi'o v7.1_main_0522 :: /Users/siripongtongjai/ST_Bioinformatics/ST_ANVIO_Work/TEST_20221212_PanGenomics >>> anvi-setup-ncbi-cogs --cog-data-dir /Users/siripongtongjai/ST_Bioinformatics/ST_ANVIO_Work/TEST_20221212_PanGenomics/cogs-data/ --num-threads 12 --just-do-it
COG version ..................................: COG20
COG data source ..............................: The command line parameter.
COG base directory ...........................: /Users/siripongtongjai/ST_Bioinformatics/ST_ANVIO_Work/TEST_20221212_PanGenomics/cogs-data
WARNING
This program will first check whether you have all the raw files, and then will attempt to regenerate everything that is necessary from them.
[12 Dec 22 22:51:12 Formatting protein ids to COG ids file] 80.24% ETA: 37s
Killed
:: anvi'o v7.1_main_0522 :: /Users/siripongtongjai/ST_Bioinformatics/ST_ANVIO_Work/TEST_20221212_PanGenomics >>>
Hi @sttongjai,
This looks like a memory issue. It is possible that your docker containers are initiated with the default memory settings and you may need to increase max memory assigned to docker from the docker interface. Google should have good instructions for that :)
Hi @meren,
Thank you very much for your advice. After increasing the memory to 20 GB, things seem to be improving. However, I got a config error ('Error -3 while decompressing data: invalid stored block lengths') after PID-TO-CID.cPickle, CATEGORIES.txt and COG.txt were created. MISSING_COG_IDs.cPickle is still missing.
Config Error: Something went wrong while decompressing the downloaded file :/ It is likely
that the download failed and only part of the file was downloaded. If you would
like to try again, please run the setup command with the flag `--reset`. Here is
what the downstream library said: 'Error -3 while decompressing data: invalid
stored block lengths'.
I am not sure what caused this issue. Any suggestions?
Thank you very much for a speedy reply.
Siripong
I get the same error when trying to set up the COG database. Did someone manage to fix it?
I believe this was addressed with PRs #2110 and #2112. Anyone using anvi'o v8 or later has access to this fix. For those using an earlier version of anvi'o, the resolution to most issues with anvi-setup-ncbi-cogs is to simply re-run it until it works, as described in #1738.