bwa
bwa index problem when indexing large files
I have downloaded the nt database from NCBI, which is about 90 GB in size, and I want to build an index for it with the following command: bwa index -a bwtsw nt.fasta
It runs without any error, but the iterations do not seem to finish even after several days of running.
I think this may be caused by the large size of nt.fasta. Is there another way to index large files like nt.fasta?
Thanks!
Hello, sorry to bother you. I have the same problem; have you solved it?
Sorry, I haven't solved it yet; I just chose to split the large file into several smaller files before indexing.
I'm so glad to get your reply. I would like to ask: is it reasonable to split the reference like this? Will it affect the mapping results?
Yes, it's definitely reasonable: splitting large files is not only what I did to solve this problem, it's also what NCBI does with their nt database. (For details, see https://ftp.ncbi.nlm.nih.gov/blast/db/)
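In case it helps, here is a minimal sketch of the split-then-index route. The chunk size (five million records per part) and the nt.part_* file names are arbitrary illustrative choices, not the exact ones used here:

    # split the multi-FASTA into parts of at most 5,000,000 records each
    # (record counts are a crude measure; part sizes in bases will be uneven)
    awk 'BEGIN { part = 1; n = 0; out = "nt.part_1.fasta" }
         /^>/ { n++
                if (n > 5000000) { close(out); part++; n = 1
                                   out = "nt.part_" part ".fasta" } }
         { print > out }' nt.fasta

    # index each part separately; reads are then mapped against every index
    # and the alignments compared or merged afterwards
    for f in nt.part_*.fasta; do
        bwa index -a bwtsw "$f"
    done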
OK, thank you, you are so nice.
You may try a larger -b value:
bwa index -b 100000000
It will be a little faster, but indexing nt will take days anyway.
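For completeness, a full invocation along those lines might look like the sketch below; the -b value is just the tenfold-default example from above, and nt.fasta is the file mentioned earlier in the thread. Larger block sizes tend to need more memory, so treat this as a starting point rather than a recommendation:

    # -a bwtsw : BWT construction algorithm suited to large references
    # -b       : block size used during BWT construction (default 10000000);
    #            a larger block can speed things up at the cost of more memory
    bwa index -a bwtsw -b 100000000 nt.fasta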
Okay, thanks! I'll try later.
I also encountered the same problem. As you suggested, I increased the original -b parameter by a factor of 10. Unfortunately, fna.pac still stops growing once it reaches the same size as before, and the log file does not record any error. I think this may be caused by the memory constraints of my compute nodes, and simply increasing the -b parameter may not help much.