DRAM_setup not preparing all databases

Open etod5987 opened this issue 3 years ago • 13 comments

Hi there!

I'm running DRAM from a Linux bash shell, not in a conda environment. When I run DRAM_setup normally, it only prepares the uniref database file; when I run it with "skip uniref", it only imports the Pfam-A.full.gz database. Each time it produces a bunch of errors.

I was trying to find the database files to download them manually but I can't locate them all. Happy to do a bunch of wgets and then set the database locations, but I need the URLs for the correct databases.

Any help with either getting DRAM_setup to work or finding the appropriate URLs would be much appreciated. Thank you.

etod5987 avatar Jun 21 '21 03:06 etod5987

Hi @shafferm, I have a similar issue: our HPC server somehow fails to download the database files directly when I use DRAM_setup. It would be really helpful if you could provide detailed instructions for installing them manually. Many thanks, Venkat

srisvs33 avatar Jun 27 '21 17:06 srisvs33

Hi @etod5987 and @srisvs33,

What errors are you getting while downloading? If you share the traceback with the errors then I can try to find the issue.

If you want to download the databases yourself, you can download the files and pass them to DRAM-setup.py prepare_databases, which will then process them. To set up all databases, set all of that command's parameters that end with _loc. You can find the paths for the databases on the websites where they are hosted, or in the DRAM source code at https://github.com/shafferm/DRAM/blob/master/mag_annotator/database_processing.py (all calls to the download_file method there have the required paths). You can also have DRAM set the paths for the DRAM-specific files using DRAM-setup.py update_dram_forms.

Mike

shafferm avatar Jul 02 '21 19:07 shafferm
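
For readers who want to collect the download locations mentioned above, one option is to scan the text of database_processing.py for download_file calls and pull out any URLs. The following is a minimal sketch, assuming the URLs appear as quoted string literals on the same line as the call (which may not hold everywhere in the real file). The sample source text and the example.org URLs are hypothetical and are not taken from DRAM.

```python
import re

# Matches an http(s) or ftp URL inside a quoted string literal.
URL_IN_QUOTES = re.compile(r"""["']((?:https?|ftp)://[^"']+)["']""")


def find_download_urls(source_text):
    """Return URLs found on lines that mention download_file."""
    urls = []
    for line in source_text.splitlines():
        if "download_file" in line:
            urls.extend(URL_IN_QUOTES.findall(line))
    return urls


# Hypothetical stand-in for the contents of database_processing.py.
sample_source = '''
def download_example(output_dir):
    download_file('ftp://example.org/db/example.hmm.gz', output_dir)
    other_call("https://example.org/not-a-download")
'''

print(find_download_urls(sample_source))
```

To use it on the real file, read a local copy of database_processing.py into a string and pass that string to find_download_urls.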

Hi Mike,

I ran the setup command again, please find the error output file attached.

Thanks so much for your help,

Emma

etod5987 avatar Jul 06 '21 00:07 etod5987

Hi Emma,

I can't see an attachment. I think you need to post it through github for me to see it.

Mike

shafferm avatar Jul 09 '21 22:07 shafferm

DRAM_setup_error.txt

Hi Mike, sorry, didn't realise it wouldn't upload automatically!

etod5987 avatar Jul 10 '21 00:07 etod5987

It looks like DRAM can't find mmseqs2. Did you install mmseqs2 prior to running DRAM?

shafferm avatar Jul 16 '21 22:07 shafferm
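
A quick way to check whether an external tool such as mmseqs is visible to Python before running setup is to look it up on PATH. This is a minimal sketch using the standard library only; the helper name missing_tools is made up for illustration, and the tool list contains only the executable named in this thread.

```python
import shutil


def missing_tools(tool_names):
    """Return the names from tool_names that cannot be found on PATH."""
    return [name for name in tool_names if shutil.which(name) is None]


# 'mmseqs' is the executable named in this thread; extend the list as needed.
missing = missing_tools(["mmseqs"])
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All listed tools were found on PATH.")
```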

I believe so but I will double check.

etod5987 avatar Jul 17 '21 01:07 etod5987

So it turns out I didn’t have mmseqs2. Now that I do, I ran the setup again; it created more files this time, but still came up with an error:

Traceback (most recent call last):
  File "/usr/local/python/3.6.5/bin/DRAM-setup.py", line 146, in <module>
    args.func(**args_dict)
  File "/usr/local/python/3.6.5/lib/python3.6/site-packages/mag_annotator/database_processing.py", line 458, in prepare_databases
    threads=threads, verbose=verbose)
  File "/usr/local/python/3.6.5/lib/python3.6/site-packages/mag_annotator/database_processing.py", line 112, in download_and_process_uniref
    make_mmseqs_db(uniref_fasta_zipped, uniref_mmseqs_db, create_index=True, threads=threads, verbose=verbose)
  File "/usr/local/python/3.6.5/lib/python3.6/site-packages/mag_annotator/utils.py", line 49, in make_mmseqs_db
    run_process(['mmseqs', 'createindex', output_loc, tmp_dir, '--threads', str(threads)], verbose=verbose)
  File "/usr/local/python/3.6.5/lib/python3.6/site-packages/mag_annotator/utils.py", line 39, in run_process
    stderr=stderr).stdout.decode(errors='ignore')
  File "/usr/local/python/3.6.5/lib/python3.6/subprocess.py", line 418, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['mmseqs', 'createindex', 'DRAM_data/database_files/uniref90.20210720.mmsdb', 'DRAM_data/database_files/tmp', '--threads', '10']' returned non-zero exit status 1.

Any ideas?

etod5987 avatar Jul 20 '21 01:07 etod5987

Unfortunately, DRAM is not providing much information here. You can try rerunning DRAM_setup with the --verbose flag added. You can also run the following command and post the output here:

mmseqs createindex DRAM_data/database_files/uniref90.20210720.mmsdb DRAM_data/database_files/tmp --threads 10

Also, if you are running the setup in the same directory as the failed past attempt, DRAM may be running into pre-existing files. You may want to clean up or try a fresh directory.

rmFlynn avatar Jul 20 '21 23:07 rmFlynn
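
The traceback earlier in the thread only reports a non-zero exit status, which is why rerunning the tool by hand is suggested. As an illustration of the general technique (not DRAM's own run_process), the sketch below captures a child process's stderr and returns it when the command fails. The helper name run_and_report is hypothetical, and the failing command is a stand-in child Python process rather than mmseqs.

```python
import subprocess
import sys


def run_and_report(command):
    """Run command; on failure, return the text the tool wrote to stderr."""
    try:
        subprocess.run(command, check=True,
                       stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    except subprocess.CalledProcessError as error:
        # error.stderr is populated because stderr was piped above.
        return error.stderr.decode(errors="ignore")
    return ""


# Stand-in for a failing external tool: a child Python process that writes
# a message to stderr and exits with status 1.
failing_command = [sys.executable, "-c",
                   "import sys; sys.stderr.write('indexdb died'); sys.exit(1)"]
print(run_and_report(failing_command))
```

Piping stdout and stderr explicitly (rather than using capture_output) keeps the sketch compatible with the Python 3.6 interpreter shown in the traceback.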

Hi Rory,

Thanks for the suggestion! Here is the output of the mmseqs createindex command. I will also now run DRAM_setup with the --verbose flag added.

MMseqs Version:             GITDIR-NOTFOUND
Seed substitution matrix    nucl:nucleotide.out,aa:VTML80.out
k-mer length                0
Alphabet size               nucl:5,aa:21
Compositional bias          1
Max sequence length         65535
Max results per query       300
Mask residues               1
Mask lower case residues    0
Spaced k-mers               1
Spaced k-mer pattern
Sensitivity                 7.5
k-score                     0
Check compatible            0
Search type                 0
Split database              0
Split memory limit          0
Verbosity                   3
Threads                     10
Min codons in orf           30
Max codons in length        32734
Max orf gaps                2147483647
Contig start mode           2
Contig end mode             2
Orf start mode              1
Forward frames              1,2,3
Reverse frames              1,2,3
Translation table           1
Translate orf               0
Use all table starts        false
Offset of numeric ids       0
Create lookup               0
Compressed                  0
Add orf stop                false
Overlap between sequences   0
Sequence split mode         1
Header split mode           0
Strand selection            1
Remove temporary files      false

createindex DRAM_data/database_files/uniref90.20210720.mmsdb DRAM_data/database_files/tmp --threads 10

MMseqs Version:             GITDIR-NOTFOUND
Seed substitution matrix    nucl:nucleotide.out,aa:VTML80.out
k-mer length                0
Alphabet size               nucl:5,aa:21
Compositional bias          1
Max sequence length         65535
Max results per query       300
Mask residues               1
Mask lower case residues    0
Spaced k-mers               1
Spaced k-mer pattern
Sensitivity                 7.5
k-score                     0
Check compatible            0
Search type                 0
Split database              0
Split memory limit          0
Verbosity                   3
Threads                     10
Min codons in orf           30
Max codons in length        32734
Max orf gaps                2147483647
Contig start mode           2
Contig end mode             2
Orf start mode              1
Forward frames              1,2,3
Reverse frames              1,2,3
Translation table           1
Translate orf               0
Use all table starts        false
Offset of numeric ids       0
Create lookup               0
Compressed                  0
Add orf stop                false
Overlap between sequences   0
Sequence split mode         1
Header split mode           0
Strand selection            1
Remove temporary files      false

indexdb DRAM_data/database_files/uniref90.20210720.mmsdb DRAM_data/database_files/uniref90.20210720.mmsdb --seed-sub-mat nucl:nucleotide.out,aa:VTML80.out -k 0 --alph-size nucl:5,aa:21 --comp-bias-corr 1 --max-seq-len 65535 --max-seqs 300 --mask 1 --mask-lower-case 0 --spaced-kmer-mode 1 -s 7.5 --k-score 0 --check-compatible 0 --search-type 0 --split 0 --split-memory-limit 0 -v 3 --threads 10

Failed to mmap memory dataSize=45631362656 File=DRAM_data/database_files/uniref90.20210720.mmsdb. Error 12.
Error: indexdb died

According to the mmseqs2 forum this error can happen when not enough memory is provided. Going to run the DRAM command again with a much larger RAM limit on the qsub job.

etod5987 avatar Jul 21 '21 01:07 etod5987
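
To make the mmseqs error message above easier to interpret, the sketch below parses the dataSize and error number from that line, converts the size to GiB, and looks up the symbolic errno name. It assumes the "Error 12" in the message is a standard C errno value, which on Linux corresponds to ENOMEM (out of memory); that reading is consistent with the thread but is an assumption about how mmseqs reports the failure.

```python
import errno
import os
import re

# Error line copied from the mmseqs output in this thread.
error_line = ("Failed to mmap memory dataSize=45631362656 "
              "File=DRAM_data/database_files/uniref90.20210720.mmsdb. Error 12.")

match = re.search(r"dataSize=(\d+).*Error (\d+)", error_line)
data_size = int(match.group(1))      # bytes mmseqs tried to map
error_number = int(match.group(2))   # assumed to be a C errno value

size_gib = data_size / 2**30
print(f"Requested mapping: {size_gib:.1f} GiB")
print(f"errno {error_number}: {errno.errorcode[error_number]} "
      f"({os.strerror(error_number)})")
```

The mapping works out to roughly 42.5 GiB, which gives a lower bound to keep in mind when choosing a memory request for the job.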

Sorry for the slow follow-up @etod5987, but it looks like you don't have enough RAM to set up uniref90 for DRAM. You can try setting up on a machine with more memory, request more memory if you are using a job submission system on a cluster, or use the --skip_uniref flag and set up the rest of the databases. This will not affect DRAM distillation.

shafferm avatar Sep 01 '21 23:09 shafferm

Hi Michael,

I realised the lack of memory was the problem, changed the allocation, and all the databases downloaded! However, the database locations don't seem to have been saved to the config file: when I ran the command on some test files, it said the database was not found.

I’m guessing I need to do the set database location command to fix this?

Emma

etod5987 avatar Sep 02 '21 00:09 etod5987

Hmm. If prepare_databases finished without an error then the config should be filled out. If you run DRAM-setup.py print_config, are all the databases listed as none? What files are in the folder where you stored the DRAM databases?

shafferm avatar Sep 02 '21 23:09 shafferm
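
To answer the question about which files are in the database folder, a short listing with file sizes is usually enough to post back. This is a minimal sketch with a hypothetical helper, list_database_files; it is demonstrated on a throwaway directory, and in practice you would pass the folder used during setup (DRAM_data/database_files in the traceback above).

```python
import os
import tempfile


def list_database_files(folder):
    """Return (file name, size in bytes) pairs for regular files, sorted by name."""
    entries = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path):
            entries.append((name, os.path.getsize(path)))
    return entries


# Demonstration on a temporary directory containing one small placeholder file.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "example.mmsdb"), "wb") as handle:
        handle.write(b"\0" * 16)
    for name, size in list_database_files(demo_dir):
        print(f"{size:>12}  {name}")
```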