bwa
Failing to index a human genome in a shell with 120 GB of memory
I've recently struggled a lot with indexing a customized human genome (with certain regions masked). I ran into several issues, shown below:
1st:
However, I ran ls -lh and confirmed that the .bwt file exists.
2nd:
I don't know what's wrong with it. I didn't find an explanation online for this issue.
3rd:
I still don't know what's wrong here, and I didn't find an explanation for this issue either.
For the first issue, I googled it and some people said it is caused by a lack of memory. That is unlikely to be the reason here, since I already have 120 GB allocated to this shell (by PBS Pro) and only one bwa index job is running.
Furthermore, /usr/bin/time reports memory usage, and the peak RAM usage seems to be only around 4596492 KB (about 4.4 GB).
6292.08user 57.20system 1:47:00elapsed 98%CPU (0avgtext+0avgdata 4596492maxresident)k
0inputs+13786480outputs (0major+83721376minor)pagefaults 0swaps
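To double-check the conversion, here is a minimal sketch that pulls the maxresident field out of the line above and converts it to GiB. It assumes the default GNU time output format, where the value is reported in kilobytes; the function name peak_rss_gib is only illustrative.

```python
import re


def peak_rss_gib(time_output: str) -> float:
    """Return the peak resident set size in GiB from GNU time's default output."""
    # Assumption: the field looks like "4596492maxresident)k", value in KiB.
    match = re.search(r"(\d+)maxresident\)k", time_output)
    if match is None:
        raise ValueError("no maxresident field found")
    return int(match.group(1)) / 1024 ** 2  # KiB -> GiB


line = ("6292.08user 57.20system 1:47:00elapsed 98%CPU "
        "(0avgtext+0avgdata 4596492maxresident)k")
print(f"{peak_rss_gib(line):.2f} GiB")
```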
So what could possibly be going wrong? By the way, I did index the same fasta file successfully once when running bwa index on the front end. But I need to build this step into my pipeline, so it has to work on the back end as well.
Please share any thoughts on this issue. Much appreciated.
Most likely you are reading files from some shared storage (NFS, Samba, or a Windows share).
Have you resolved your issue?
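One way to check this on a Linux node is to look up which mount point holds the fasta file and what filesystem type it has. The sketch below is only an illustration: it assumes the /proc/mounts format (device, mount point, type, ...), the helper name fs_type_for and the list of network filesystem types are my own choices, and the sample mount table and paths are made up.

```python
import os

# Assumption: a non-exhaustive set of filesystem types that indicate shared storage.
NETWORK_FS = {"nfs", "nfs4", "cifs", "smbfs", "smb3", "lustre", "gpfs", "beegfs"}


def fs_type_for(path: str, mounts_text: str) -> str:
    """Return the filesystem type of the longest mount point containing path."""
    path = os.path.abspath(path)
    best_mount, best_type = "", "unknown"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # Longest matching mount point wins; later entries shadow earlier ones.
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


# Made-up mount table; on a real Linux node, pass open("/proc/mounts").read().
sample_mounts = (
    "/dev/sda1 / ext4 rw 0 0\n"
    "server:/export /data nfs4 rw 0 0\n"
)
fs = fs_type_for("/data/genome/ref.fa", sample_mounts)
print(fs, "(shared storage)" if fs in NETWORK_FS else "(local)")
```

If the file turns out to live on shared storage, copying it to node-local scratch space before indexing is a cheap way to rule this cause in or out.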
Yeah, I think the main issue was my fasta file. The file is mostly hard-masked with N, leaving only a small proportion of actual DNA sequence. When I removed all the contigs that consist entirely of Ns, the indexing ran normally.
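For anyone hitting the same problem, here is a minimal sketch of that filtering step. It assumes an uncompressed, plain-text FASTA file; the function names are hypothetical and not part of bwa, and each contig is held in memory while it is checked.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header_line, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def is_all_n(seq: str) -> bool:
    """True if the sequence contains nothing but N/n (empty records count too)."""
    return len(seq.strip("Nn")) == 0


def drop_all_n_contigs(src: TextIO, dst: TextIO, width: int = 60) -> int:
    """Copy contigs with real sequence to dst; return how many were dropped."""
    dropped = 0
    for header, seq in read_fasta(src):
        if is_all_n(seq):
            dropped += 1
            continue
        dst.write(header + "\n")
        for i in range(0, len(seq), width):
            dst.write(seq[i:i + width] + "\n")
    return dropped


# Small in-memory demo; for real files, pass open(...) handles instead.
demo_in = io.StringIO(">chr1\nNNNN\nNNNN\n>chr2\nNNAC\nGTNN\n>chr3\nnnnn\n")
demo_out = io.StringIO()
n_dropped = drop_all_n_contigs(demo_in, demo_out, width=4)
print(f"dropped {n_dropped} all-N contigs")
print(demo_out.getvalue(), end="")
```

Contigs that are only partly masked are kept unchanged, so the coordinates of the remaining sequences stay the same as in the original file.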