BWA-MEME
bwa-meme index
Some questions/comments on the index command.
- If you pre-generate the BWT with `-a mem2`, will `-a meme` skip the BWT build?
  - BWT is single threaded from what I remember. We get charged for CPU, so doing the low-resource bit separately is useful.
- The help for `bwa-meme index` references `bwa-mem2` and doesn't list `meme` as an option for `-a`.
(sorry, hopefully I'm more helpful than irritating)
Your questions and comments are really helpful! We are excited to see people getting interested in our project.
Thank you for bringing this to our attention.
- Currently, the BWT build is not skipped with `-a meme`. BWT building should actually be removed from BWA-MEME, since bwa-meme does not use it.
- The BWT index was used during the development of BWA-MEME and has since been deprecated.
- There is a lot of room for optimization in the index-building code; I will update it soon.
- Thank you for pointing this out. The usage description will also be updated.
Could I get the list of files that `bwa-meme mem -7` requires at runtime? This would be helpful for Nextflow development.
Including all indexes and trained models, the required files are listed below.
From v1.0.4 (and the master and dev branches):
ref.fa.amb
ref.fa.ann
ref.fa.pac
ref.fa.0123
ref.fa.pos_packed
ref.fa.suffixarray_uint64_L1_PARAMETERS
ref.fa.suffixarray_uint64_L2_PARAMETERS
Note: ref.fa.suffixarray_uint64 is used for learned-index training and is not required at runtime.
Before v1.0.4:
ref.fa.amb
ref.fa.ann
ref.fa.pac
ref.fa.0123
ref.fa.pos_packed
ref.fa.possa_packed
ref.fa.ref2sa_packed
ref.fa.suffixarray_uint64
ref.fa.suffixarray_uint64_L1_PARAMETERS
ref.fa.suffixarray_uint64_L2_PARAMETERS
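For pipeline work, the two lists above can be turned into a pre-flight check. This is only a minimal sketch: `meme_index_suffixes` and `check_meme_index` are hypothetical helper names, not part of bwa-meme, and the `current`/`legacy` labels are my own shorthand for "v1.0.4 and later" and "before v1.0.4".

```bash
# Print the runtime index suffixes, one per line.
# $1: "current" (v1.0.4 and later, the default) or "legacy" (before v1.0.4).
meme_index_suffixes() {
    # Files listed for v1.0.4 and later.
    printf '%s\n' amb ann pac 0123 pos_packed \
        suffixarray_uint64_L1_PARAMETERS \
        suffixarray_uint64_L2_PARAMETERS
    # Extra files listed for releases before v1.0.4.
    if [ "${1:-current}" = "legacy" ]; then
        printf '%s\n' possa_packed ref2sa_packed suffixarray_uint64
    fi
}

# Report every missing index file for a prefix such as ref.fa.
# Returns 0 when all listed files exist, 1 otherwise.
check_meme_index() {
    local prefix layout status suffix
    prefix=$1
    layout=${2:-current}
    status=0
    for suffix in $(meme_index_suffixes "$layout"); do
        if [ ! -e "${prefix}.${suffix}" ]; then
            echo "missing: ${prefix}.${suffix}" >&2
            status=1
        fi
    done
    return "$status"
}
```

Calling `check_meme_index ref.fa` before `bwa-meme mem -7` would then fail early with the names of any missing files.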
Does it actually stat/open `ref.fa`? I seem to remember bwa doesn't actually use it after indexing, only extends the name to find the other files.
Yes, you are correct: the ref.fa file should be omitted from the list.
The other files are necessary right now; we will remove the requirement for the ref.fa.suffixarray_uint64 file soon.
According to the CPU utilisation report from SLURM, providing 32 threads doesn't appear to have any benefit when indexing.
Command (run under singularity):
bwa-meme index -a meme -t 32 ref.fasta
The report indicates that ~0.99 of a CPU was used (32 * 0.0309).
$ seff 59510806
Job ID: 59510806
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 32
CPU Utilized: 03:09:49
CPU Efficiency: 3.09% of 4-06:17:36 core-walltime
Job Wall-clock time: 03:11:48
Memory Utilized: 95.15 GB
Memory Efficiency: 50.92% of 186.88 GB
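The ~0.99 CPU figure can be reproduced from the report fields. This is a minimal sketch, assuming the times are in HH:MM:SS or D-HH:MM:SS form as printed above; the function names are hypothetical helpers, not part of SLURM or bwa-meme.

```bash
# Convert HH:MM:SS or D-HH:MM:SS to seconds.
hms_to_seconds() {
    echo "$1" | awk -F'[-:]' '{
        if (NF == 4) print $1 * 86400 + $2 * 3600 + $3 * 60 + $4
        else         print $1 * 3600 + $2 * 60 + $3
    }'
}

# Average number of cores kept busy: CPU time / wall-clock time.
# $1: CPU Utilized, $2: Job Wall-clock time.
effective_cores() {
    local cpu wall
    cpu=$(hms_to_seconds "$1")
    wall=$(hms_to_seconds "$2")
    awk -v c="$cpu" -v w="$wall" 'BEGIN { printf "%.2f\n", c / w }'
}

# CPU efficiency in percent: CPU time / (wall-clock time * cores).
# $1: CPU Utilized, $2: Job Wall-clock time, $3: allocated cores.
cpu_efficiency_pct() {
    local cpu wall
    cpu=$(hms_to_seconds "$1")
    wall=$(hms_to_seconds "$2")
    awk -v c="$cpu" -v w="$wall" -v n="$3" \
        'BEGIN { printf "%.2f\n", 100 * c / (w * n) }'
}
```

For this job, `effective_cores 03:09:49 03:11:48` gives 0.99 and `cpu_efficiency_pct 03:09:49 03:11:48 32` gives 3.09, matching the report.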
That's true. I have just updated the code with multi-thread support for building MEME indexes.
- The index build now takes within 1 hour (depending on the number of threads): ~30 minutes for the suffix array build and ~30 minutes for building the other indexes.
You can try the command below:
./bwa-meme index -a meme ~/human_ref/human_g1k_v37.fasta -t 32
I will also update the bioconda package soon.
Are these changes pushed to the main branch?
Hi, it is updated in the master branch.
~~But is not updated in bioconda package (I made a PR, which is under review now)~~ Multi-thread index build is available since v1.0.3.
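Since extra threads only help from v1.0.3 onwards, a job script can avoid requesting cores that an older build would leave idle. This is a rough sketch: `index_threads` is a hypothetical helper, and it assumes a `sort` that supports `-V` (version sort, as in GNU coreutils).

```bash
# Print the thread count worth requesting for `bwa-meme index -a meme`.
# $1: bwa-meme version (with or without a leading "v"), $2: threads wanted.
index_threads() {
    local version requested oldest
    version=${1#v}
    requested=$2
    # The smaller of the two versions sorts first under version sort.
    oldest=$(printf '%s\n' "$version" 1.0.3 | sort -V | head -n 1)
    if [ "$oldest" = "1.0.3" ]; then
        echo "$requested"   # v1.0.3 or later: multi-thread index build
    else
        echo 1              # earlier releases: index build uses ~1 CPU
    fi
}
```

Under SLURM this could be used as `bwa-meme index -a meme -t "$(index_threads v1.0.4 "${SLURM_CPUS_PER_TASK:-1}")" ref.fasta`, with the version string supplied by hand.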
I noted that the *..suffixarray_uint64_L0_PARAMETERS file is listed as required for execution (with 1.0.4). I've found the process runs with only L1 and L2 available. Please advise if this file is only used in certain circumstances.
You are correct, the *.L0_PARAMETERS file is not used now (it is used for other types of learned-index models). I have updated the list above.
Thanks for the suggestion!