DRAM
DRAM_setup not preparing all databases
Hi there!
I'm running DRAM in Linux bash, not in a conda environment. When I run DRAM_setup normally it only prepares the uniref database file; when I run it with "skip uniref" it only imports the Pfam-A.full.gz database. Each time it comes up with a bunch of errors.
I was trying to find the database files to download them manually but I can't locate them all. Happy to do a bunch of wgets and then set the database locations, but I need the URLs for the correct databases.
Any help with either getting DRAM_setup to work or finding the appropriate URLs would be much appreciated. Thank you.
Hi @shafferm , I also have a similar issue with our HPC server which somehow fails to download the database files directly when I use DRAM_setup. It would be really helpful if you could provide some detailed information on how to install them manually. Many thanks Venkat
Hi @etod5987 and @srisvs33,
What errors are you getting while downloading? If you share the traceback with the errors then I can try to find this issue.
If you want to download the databases yourself, you can download the files and pass them to DRAM-setup.py prepare_databases, which will then process them. To set up all databases, set all of that command's parameters that end with _loc. You can find the paths for the databases by going to the websites where they are hosted, or in the DRAM source code: https://github.com/shafferm/DRAM/blob/master/mag_annotator/database_processing.py. All calls to the download_file method have the required paths. You can also have DRAM set the paths for the DRAM-specific files using DRAM-setup.py update_dram_forms.
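For anyone scripting the manual route, here is a minimal sketch of assembling the prepare_databases call from a set of already-downloaded files. The helper name build_prepare_command is made up for illustration, and the --output_dir flag and the pfam_loc parameter name are assumptions based on the "_loc" naming described above; check DRAM-setup.py prepare_databases --help for the real names before running anything.

```python
from typing import Dict, List


def build_prepare_command(output_dir: str, locs: Dict[str, str]) -> List[str]:
    """Build an argument list for DRAM-setup.py prepare_databases.

    `locs` maps parameter names ending in `_loc` to files that have
    already been downloaded. The flag names are NOT verified here.
    """
    # --output_dir is an assumed flag name; confirm it with --help.
    command = ["DRAM-setup.py", "prepare_databases", "--output_dir", output_dir]
    for name in sorted(locs):
        if not name.endswith("_loc"):
            raise ValueError(f"expected a parameter ending in _loc, got {name!r}")
        command += [f"--{name}", locs[name]]
    return command


if __name__ == "__main__":
    # Hypothetical local path; pfam_loc is an assumed parameter name.
    example = build_prepare_command(
        "DRAM_data",
        {"pfam_loc": "downloads/Pfam-A.full.gz"},
    )
    print(" ".join(example))
```

The function only builds the command so it can be inspected first; the returned list can be passed to subprocess.run once the flag names are confirmed.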
Mike
Hi Mike,
I ran the setup command again, please find the error output file attached.
Thanks so much for your help,
Emma
Hi Emma,
I can't see an attachment. I think you need to post it through github for me to see it.
Mike
It looks like DRAM can't find mmseqs2. Did you install mmseqs2 prior to running DRAM?
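A quick way to rule this out before rerunning the setup is to check whether the executable is on PATH. This is only a sketch: missing_tools is a hypothetical helper, and the only tool name taken from this thread is mmseqs; add other dependencies yourself if you want them checked.

```python
import shutil
from typing import Callable, Iterable, List, Optional


def missing_tools(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    absent = missing_tools(["mmseqs"])
    if absent:
        print("Not found on PATH:", ", ".join(absent))
    else:
        print("All listed tools were found on PATH.")
```

On a cluster, run this inside the same job environment that runs the setup, since modules loaded on the login node may not be loaded in the job.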
I believe so but I will double check.
So it turns out I didn't have mmseqs2. Now that I do, I ran it again and it created more files this time, but it still came up with an error:
Traceback (most recent call last):
File "/usr/local/python/3.6.5/bin/DRAM-setup.py", line 146, in
Any ideas?
Unfortunately, DRAM is not providing much info here. You can try rerunning DRAM_setup with the addition of the --verbose flag. You can also run the command mmseqs createindex DRAM_data/database_files/uniref90.20210720.mmsdb DRAM_data/database_files/tmp --threads 10 and post the output here.
Also, if you are running the setup in the same directory as the previous failed attempt, DRAM may be running into pre-existing files. You may want to clean up or try a fresh directory.
Hi Rory,
Thanks for the suggestion! Here is the output of the mmseqs createindex command. I will also now run DRAM_setup with the --verbose flag.
MMseqs Version: GITDIR-NOTFOUND
Seed substitution matrix nucl:nucleotide.out,aa:VTML80.out
k-mer length 0
Alphabet size nucl:5,aa:21
Compositional bias 1
Max sequence length 65535
Max results per query 300
Mask residues 1
Mask lower case residues 0
Spaced k-mers 1
Spaced k-mer pattern
Sensitivity 7.5
k-score 0
Check compatible 0
Search type 0
Split database 0
Split memory limit 0
Verbosity 3
Threads 10
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Compressed 0
Add orf stop false
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Strand selection 1
Remove temporary files false

createindex DRAM_data/database_files/uniref90.20210720.mmsdb DRAM_data/database_files/tmp --threads 10

MMseqs Version: GITDIR-NOTFOUND
Seed substitution matrix nucl:nucleotide.out,aa:VTML80.out
k-mer length 0
Alphabet size nucl:5,aa:21
Compositional bias 1
Max sequence length 65535
Max results per query 300
Mask residues 1
Mask lower case residues 0
Spaced k-mers 1
Spaced k-mer pattern
Sensitivity 7.5
k-score 0
Check compatible 0
Search type 0
Split database 0
Split memory limit 0
Verbosity 3
Threads 10
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Compressed 0
Add orf stop false
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Strand selection 1
Remove temporary files false

indexdb DRAM_data/database_files/uniref90.20210720.mmsdb DRAM_data/database_files/uniref90.20210720.mmsdb --seed-sub-mat nucl:nucleotide.out,aa:VTML80.out -k 0 --alph-size nucl:5,aa:21 --comp-bias-corr 1 --max-seq-len 65535 --max-seqs 300 --mask 1 --mask-lower-case 0 --spaced-kmer-mode 1 -s 7.5 --k-score 0 --check-compatible 0 --search-type 0 --split 0 --split-memory-limit 0 -v 3 --threads 10

Failed to mmap memory dataSize=45631362656 File=DRAM_data/database_files/uniref90.20210720.mmsdb. Error 12.
Error: indexdb died
According to the mmseqs2 forum this error can happen when not enough memory is provided. Going to run the DRAM command again with a much larger RAM limit on the qsub job.
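To get a rough idea of how large the failed mapping was, the numbers in that error line can be decoded with a few lines of Python. This is a sketch under assumptions: parse_mmap_failure is a hypothetical helper, the regular expression is written only against the message format pasted above, and the size it reports is the size of the failed mmap request, not an official memory requirement for the whole setup.

```python
import errno
import re
from typing import Optional, Tuple


def parse_mmap_failure(message: str) -> Optional[Tuple[float, str]]:
    """Extract (requested size in GiB, errno name) from an mmap failure line.

    Returns None if the message does not look like the line in this thread.
    """
    match = re.search(r"dataSize=(\d+).*?Error (\d+)", message)
    if match is None:
        return None
    size_gib = int(match.group(1)) / 2**30
    # Map the numeric error code to its symbolic name (12 is ENOMEM on Linux).
    error_name = errno.errorcode.get(int(match.group(2)), "unknown")
    return size_gib, error_name


if __name__ == "__main__":
    line = (
        "Failed to mmap memory dataSize=45631362656 "
        "File=DRAM_data/database_files/uniref90.20210720.mmsdb. Error 12."
    )
    print(parse_mmap_failure(line))
```

For the message above this gives a request of roughly 42.5 GiB with error ENOMEM, which fits the out-of-memory explanation; the memory limit on the qsub job should probably be set comfortably above that figure.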
Sorry for the slow follow-up @etod5987, but it looks like you don't have enough RAM to set up uniref90 for DRAM. You can try setting up on a machine with more memory, requesting more memory if you are using a job submission system on a cluster, or using the --skip_uniref flag to set up the rest of the databases. Skipping uniref will not affect DRAM distillation.
Hi Michael,
I realised the lack of memory was the problem, changed the allocation, and all the databases downloaded! However, the locations of the databases haven't been saved into the config file: when I ran the command on some test files, it said database not found.
I’m guessing I need to do the set database location command to fix this?
Emma
Hmm. If prepare_databases finished without an error then the config should be filled out. If you run DRAM-setup.py print_config, are all the databases listed as none? What files are in the folder where you stored the DRAM databases?
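If it is easier than pasting a directory listing by hand, something like the following sketch prints the files and their sizes so they can be compared with the print_config output. The helper list_database_files is hypothetical, and the DRAM_data/database_files path is only assumed from the mmseqs command earlier in this thread; point it at wherever the databases were actually written.

```python
from pathlib import Path
from typing import List, Tuple


def list_database_files(directory: str) -> List[Tuple[str, int]]:
    """Return (file name, size in bytes) for each regular file in `directory`.

    Returns an empty list if the directory does not exist.
    """
    root = Path(directory)
    if not root.is_dir():
        return []
    return sorted((p.name, p.stat().st_size) for p in root.iterdir() if p.is_file())


if __name__ == "__main__":
    # Assumed location, taken from the createindex command in this thread.
    for name, size in list_database_files("DRAM_data/database_files"):
        print(f"{size:>15,d}  {name}")
```

Zero-byte files or a missing directory would suggest the preparation step did not finish, even if no error was printed.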