
Fail to index human genome in a shell with 120Gb memory

Open yangyxt opened this issue 3 years ago • 2 comments

I've recently had a lot of trouble indexing a customized human genome (with certain regions masked). I ran into several issues, shown below. 1st: [screenshot] But I used ls -lh and confirmed that the .bwt file exists.

2nd: [screenshot] I don't know what's wrong here, and I couldn't find an explanation for this error online.

3rd: [screenshot] Again, I don't know what's wrong, and I couldn't find an explanation for this one either.

For the first issue, I googled it, and some people said it is caused by a lack of memory. That is unlikely to be the reason, since I have 120 GB allocated to this shell (by PBS Pro) and only one bwa index job is running.

Furthermore, /usr/bin/time reports the memory profile, and the peak resident memory (maxresident) is only 4596492 KB (about 4.4 GB):

6292.08user 57.20system 1:47:00elapsed 98%CPU (0avgtext+0avgdata 4596492maxresident)k
0inputs+13786480outputs (0major+83721376minor)pagefaults 0swaps
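The peak memory figure comes from the maxresident field of GNU time's default output, which is reported in kilobytes. A minimal sketch of extracting and converting it, assuming the default output format shown above (the function name is hypothetical):

```python
import re

# GNU time output copied from the job log above.
TIME_OUTPUT = """\
6292.08user 57.20system 1:47:00elapsed 98%CPU (0avgtext+0avgdata 4596492maxresident)k
0inputs+13786480outputs (0major+83721376minor)pagefaults 0swaps
"""


def peak_rss_kb(time_output: str) -> int:
    """Return the peak resident set size in KB from GNU time's default format."""
    match = re.search(r"(\d+)maxresident\)k", time_output)
    if match is None:
        raise ValueError("no maxresident field found in time output")
    return int(match.group(1))


kb = peak_rss_kb(TIME_OUTPUT)
gib = kb / 1024 / 1024  # KB -> MB -> GB (binary units)
print(f"peak RSS: {kb} KB = {gib:.2f} GB, {gib / 120:.1%} of a 120 GB allocation")
```

This puts the job at roughly 4.4 GB, a few percent of the 120 GB requested, which is why memory exhaustion looks like an unlikely explanation here.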

So what else could be going wrong? BTW, I once indexed the same FASTA file successfully when running bwa index in the front end (interactively). But I need to build this step into my pipeline, so it has to work in the back end (as a submitted job) as well.
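In a pipeline, a cheap guard after the indexing step is to check that every file bwa index writes for the prefix is present and non-empty, rather than checking only the .bwt file by hand. A minimal sketch; the function name is hypothetical, and the suffix list assumes classic bwa 0.7.x (bwa-mem2 writes a different set of index files):

```python
from pathlib import Path

# Files written by `bwa index` (bwa 0.7.x) next to the given prefix,
# e.g. prefix "genome.fa" -> genome.fa.amb, genome.fa.ann, ...
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_bwa_index_files(prefix: str) -> list:
    """Return the index files that are absent or empty for the given prefix."""
    problems = []
    for suffix in BWA_INDEX_SUFFIXES:
        path = Path(prefix + suffix)
        # A missing or zero-byte file suggests the indexing job did not finish.
        if not path.is_file() or path.stat().st_size == 0:
            problems.append(str(path))
    return problems
```

A pipeline step could call this right after bwa index returns and abort with the list of offending paths if it is non-empty.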

Please share any thoughts on this issue. Much appreciated.

yangyxt · Mar 05 '21 00:03

Most likely you are reading files from some shared storage (NFS/SAMBA/Windows share).

Have you resolved your issue?

markotitel · Aug 05 '21 21:08


Yes. I think the main issue was my FASTA file. It is mostly hard-masked with N, leaving only a small proportion of actual DNA sequence. When I removed all the contigs consisting entirely of Ns, indexing worked normally.
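The workaround described above (dropping contigs that are entirely N before running bwa index) can be sketched in Python as follows. This is an illustrative filter, not the poster's actual script: the function names are hypothetical, it assumes an uncompressed FASTA, and it holds one contig in memory at a time. Contigs that are only partly masked are kept.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header_line, sequence) pairs from a FASTA stream."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def is_all_n(seq: str) -> bool:
    """True if the contig has no base other than N/n (fully hard-masked or empty)."""
    return seq.strip("Nn") == ""


def drop_all_n_contigs(src: TextIO, dst: TextIO, width: int = 60) -> List[str]:
    """Copy src to dst, skipping all-N contigs; return the dropped header lines."""
    dropped = []
    for header, seq in read_fasta(src):
        if is_all_n(seq):
            dropped.append(header)
            continue
        dst.write(header + "\n")
        # Re-wrap the sequence at a fixed line width.
        for i in range(0, len(seq), width):
            dst.write(seq[i:i + width] + "\n")
    return dropped


# Small in-memory demo; for a real genome, pass open() file handles instead.
demo = io.StringIO(">chr1\nACGTNN\n>chrUn_masked\nNNNN\nnnnn\n>chr2\nNNAC\n")
out = io.StringIO()
removed = drop_all_n_contigs(demo, out)
```

The filtered file is then what gets passed to bwa index. Keeping the dropped headers makes it easy to log which contigs were excluded from the reference.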

yangyxt · Aug 06 '21 03:08