bwa-meme index

Open keiranmraine opened this issue 2 years ago • 11 comments

Some questions/comments on the index command.

  1. If you pre-generate the BWT with -a mem2, will -a meme skip the BWT build?
    • From what I remember, the BWT build is single-threaded. We get charged for CPU, so running the low-resource step separately is useful.
  2. The help for bwa-meme index references bwa-mem2 and doesn't list meme as an option for -a.

(sorry, hopefully I'm more helpful than irritating)

keiranmraine avatar Apr 26 '22 15:04 keiranmraine

Your questions and comments are really helpful! We are excited to see people getting interested in our project.

Thank you for bringing this to our attention.

  1. Currently, the BWT build is not skipped with -a meme. BWT building should actually be removed from BWA-MEME, since bwa-meme doesn't use it.
  • The BWT index was used during the development of BWA-MEME, but it has since been deprecated.
  • There is a lot of room for optimization in the index building code; I will update it soon.
  2. Thank you for pointing this out. The usage description will also be updated.

quito418 avatar Apr 26 '22 15:04 quito418

Could I get the list of files that bwa-meme mem -7 requires when executing? This is helpful for nextflow development.

keiranmraine avatar Apr 26 '22 15:04 keiranmraine

Including all indexes and trained models the required files are as below.

From v1.0.4 (and the master and dev branches):

ref.fa.amb
ref.fa.ann
ref.fa.pac
ref.fa.0123
ref.fa.pos_packed
ref.fa.suffixarray_uint64_L1_PARAMETERS
ref.fa.suffixarray_uint64_L2_PARAMETERS

ref.fa.suffixarray_uint64 is used for Learned-index training, not required at runtime

Before v1.0.4:

ref.fa.amb
ref.fa.ann
ref.fa.pac
ref.fa.0123
ref.fa.pos_packed
ref.fa.possa_packed
ref.fa.ref2sa_packed
ref.fa.suffixarray_uint64
ref.fa.suffixarray_uint64_L1_PARAMETERS
ref.fa.suffixarray_uint64_L2_PARAMETERS
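
For pipeline development, the two lists above can be turned into a small pre-flight check. This is only a sketch: the suffixes are copied from the lists in this thread, while the function names (`required_index_files`, `missing_index_files`) and the `legacy` switch are made up for illustration and are not part of BWA-MEME.

```python
from pathlib import Path

# Suffixes copied from the lists in this thread. The names below are
# hypothetical helpers, not part of the bwa-meme tool itself.
COMMON = ["amb", "ann", "pac", "0123", "pos_packed"]
MODELS = [
    "suffixarray_uint64_L1_PARAMETERS",
    "suffixarray_uint64_L2_PARAMETERS",
]
# Extra files listed as required before v1.0.4.
LEGACY_EXTRA = ["possa_packed", "ref2sa_packed", "suffixarray_uint64"]


def required_index_files(prefix, legacy=False):
    """Return the index file paths expected next to `prefix` (e.g. ref.fa)."""
    suffixes = COMMON + (LEGACY_EXTRA if legacy else []) + MODELS
    return [f"{prefix}.{suffix}" for suffix in suffixes]


def missing_index_files(prefix, legacy=False):
    """Return the expected index files that do not exist on disk."""
    return [p for p in required_index_files(prefix, legacy) if not Path(p).is_file()]
```

Calling `missing_index_files("ref.fa")` before launching the alignment step returns an empty list when everything listed for v1.0.4 and later is staged; pass `legacy=True` to check the longer pre-v1.0.4 list instead.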

quito418 avatar Apr 26 '22 15:04 quito418

Does it actually stat/open ref.fa? I seem to remember bwa doesn't actually use it after indexing, only extends the name to find the other files.

keiranmraine avatar Apr 26 '22 16:04 keiranmraine

Yes, you are correct; the ref.fa file should be omitted from the list.

The other files are necessary right now; we will remove the requirement for the ref.fa.suffixarray_uint64 file soon.

quito418 avatar Apr 26 '22 16:04 quito418

According to the CPU utilisation report from SLURM it doesn't appear providing 32 threads has any benefit when indexing.

Command (run under singularity):

bwa-meme index -a meme -t 32 ref.fasta

Report indicates ~0.99 of a CPU used (32 * 0.0309).

$ seff 59510806
Job ID: 59510806
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 32
CPU Utilized: 03:09:49
CPU Efficiency: 3.09% of 4-06:17:36 core-walltime
Job Wall-clock time: 03:11:48
Memory Utilized: 95.15 GB
Memory Efficiency: 50.92% of 186.88 GB
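
The ~0.99 CPU figure follows from the numbers in the report: CPU efficiency is CPU time divided by (cores × wall-clock time). A short sketch of that arithmetic, using only the values printed by seff above (the helper name `to_seconds` is made up for illustration):

```python
def to_seconds(hms):
    """Convert 'HH:MM:SS' or 'D-HH:MM:SS' (as printed by seff) to seconds."""
    days, _, rest = hms.rpartition("-")
    hours, minutes, seconds = (int(part) for part in rest.split(":"))
    return (int(days) if days else 0) * 86400 + hours * 3600 + minutes * 60 + seconds


cpu_used = to_seconds("03:09:49")   # "CPU Utilized"
wall = to_seconds("03:11:48")       # "Job Wall-clock time"
cores = 32                          # "Cores per node"

core_walltime = cores * wall        # matches the reported 4-06:17:36
efficiency = cpu_used / core_walltime
effective_cores = cpu_used / wall

print(f"CPU efficiency: {efficiency:.2%}")          # CPU efficiency: 3.09%
print(f"Effective cores: {effective_cores:.2f}")    # Effective cores: 0.99
```

In other words, the job kept roughly one of its 32 cores busy for the whole run.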

keiranmraine avatar Apr 27 '22 08:04 keiranmraine

That's true. I just updated the code with multi-thread support for building MEME indexes.

  • The index build now finishes within about 1 hour (depending on the thread count): ~30 minutes for the suffix array build and ~30 minutes for building the other indexes.

You can try the command below:

./bwa-meme index -a meme ~/human_ref/human_g1k_v37.fasta -t 32

I will also update the bioconda package soon.
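
For anyone calling the index step from a script, the command can be assembled as an argument list. This is a minimal sketch that assumes a `bwa-meme` binary is on `PATH`; only the `index`, `-a meme` and `-t` arguments come from this thread, and the function names are hypothetical.

```python
import shutil
import subprocess


def index_command(reference, threads, binary="bwa-meme"):
    """Build the argv for a multi-threaded MEME index build."""
    return [binary, "index", "-a", "meme", "-t", str(threads), reference]


def run_index(reference, threads, binary="bwa-meme"):
    """Run the index build, failing early if the binary cannot be found."""
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary} not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit code.
    subprocess.run(index_command(reference, threads, binary), check=True)
```

Keeping the command as a list avoids shell quoting problems with reference paths that contain spaces.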

quito418 avatar Apr 27 '22 08:04 quito418

Are these changes pushed to the main branch?

kkapuria3 avatar Apr 28 '22 16:04 kkapuria3

Hi, it is updated in the master branch.

~~But is not updated in bioconda package (I made a PR, which is under review now)~~ Multi-thread index build is available since v1.0.3.

quito418 avatar Apr 29 '22 00:04 quito418

I noted that the *.suffixarray_uint64_L0_PARAMETERS file is indicated as required for execution (with 1.0.4). I've found the process runs with only L1 and L2 available. Please advise if this file is only used in certain circumstances.

keiranmraine avatar May 09 '22 13:05 keiranmraine

You are correct, the *.L0_PARAMETERS file is not used now (it is used for other types of learned-index models). I updated the list above.

Thanks for the suggestion!

quito418 avatar May 09 '22 13:05 quito418