Segmentation fault when aligning 25X E. coli single-end reads

Open i-xiaohu opened this issue 4 years ago • 8 comments

Hi, Whisper developers. I ran the command whisper ref/ref data1.fastq, and Whisper (release 2.0.1) produced the following output:

***** Preprocessing of reads *****
100.0%
Completing the preprocessing (could take a minute or so)
Preprocessing time: 2.39412s
** Loading reference and index **
***** Reads mapping *****
** End of mapping **
Main processing time: 43.4175s
***** Postprocessing *****
** Loading reference **
Segmentation fault (core dumped)

The reference is a common E. coli reference sequence, and data1.fq is 593M. The first read is shown below.

@SRR1562082.1 HWI-ST1336:80:C3CJUACXX:1:1101:2018:2193/1
ATCGCATCCGGGCAGTAGTATTTTGCTTTTTTCAGAAAATAATCAAAAAAAGTTAGCGTGGTGAATCGATACTTTACCGGTTGAATTTGCATCAATTTCAT
+
@B@FFFFDFHHHHJJGFHHFHGGJHIJIJJJJIJJJJJGIIIJJJJJJJJJFEEHHFFFDDAB@CC@BBBABCDECDCBBBBBDCADDDDEEDDDDECCEE

In the end, Whisper produces an empty SAM file.

Thanks! i-xiaohu

i-xiaohu avatar Oct 29 '20 12:10 i-xiaohu
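
For anyone trying to reproduce the report, it can help to confirm the depth of the input before mapping. The following is a minimal sketch, not part of Whisper: it counts reads and bases in a FASTQ file and divides by a genome size. The function names are hypothetical, the parser assumes strict four-line FASTQ records, and the 4.6 Mbp genome size is an assumed approximate value for E. coli K-12, since the thread does not say which reference was used.

```python
import gzip


def fastq_bases(lines):
    """Return (read_count, total_bases) for an iterable of FASTQ lines.

    Assumes strict 4-line records: header, sequence, '+', qualities.
    """
    reads = 0
    bases = 0
    for i, line in enumerate(lines):
        if i % 4 == 1:  # second line of each record is the sequence
            reads += 1
            bases += len(line.strip())
    return reads, bases


def estimate_coverage(total_bases, genome_size):
    """Average depth = sequenced bases / genome length."""
    return total_bases / genome_size


def coverage_of_file(path, genome_size):
    """Open a plain or gzipped FASTQ file and estimate its coverage."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        reads, bases = fastq_bases(fh)
    return reads, estimate_coverage(bases, genome_size)


# Assumed genome size (approximate, E. coli K-12); adjust to the actual reference.
ASSUMED_ECOLI_GENOME_SIZE = 4_600_000
```

Calling `coverage_of_file("data1.fastq", ASSUMED_ECOLI_GENOME_SIZE)` would return the read count and the estimated depth for the file named in the report.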

Hello,

I'll take a look at that ASAP.

Regards, Adam

agudys avatar Nov 05 '20 12:11 agudys

Hello,

I am also experiencing the same issue.

I followed the Quick start guide and hit a segmentation fault.

My commands:

src/whisper-index human ~/human_ref/human_g1k_v37.fasta ./index ./temp/
src/whisper -r -out mappings ./index/human ~/ERR3239276.fq

Error log:

***** Preprocessing of reads *****
100.0%
Completing the preprocessing (could take a minute or so)
Preprocessing time: 2.44478s
** Loading reference and index **
***** Reads mapping *****
** End of mapping **
Main processing time: 201.566s
***** Postprocessing *****
** Loading reference **
Segmentation fault (core dumped)

When I ran it under GDB, I got the backtrace below.

(gdb) bt
#0  0x0000000000486f85 in CSamGenerator::store_mapped_read(unsigned char*, unsigned char*, unsigned char*, unsigned char*, unsigned int, unsigned int, unsigned int, unsigned int, unsigned char*&) ()
#1  0x0000000000489190 in CSamGenerator::process_group_se() ()
#2  0x000000000048f828 in CSamGenerator::operator()() ()
#3  0x000000000054da14 in execute_native_thread_routine ()
#4  0x000000000041fb19 in start_thread (arg=<optimized out>) at pthread_create.c:477
#5  0x0000000000615ab3 in clone ()

Thank you!

quito418 avatar Nov 27 '21 08:11 quito418

Hello,

Sorry it took me so long. I was able to reproduce the error. I'll let you know once it's fixed (this time, I promise to do this sooner ;)).

Adam

agudys avatar Dec 07 '21 21:12 agudys

@quito418 @i-xiaohu I have just committed a fix for the bug you reported. Please let me know whether single-end mode now works properly.

Btw, you don't need to specify the -r option at all for single-end mapping.

agudys avatar Dec 19 '21 22:12 agudys

Thank you for your time.

I will let you know if I have a problem.

Best Regards,

quito418 avatar Dec 20 '21 01:12 quito418

@agudys Hi,

Thank you. I checked, and it runs without a segfault after the fix.

I just want to make sure everything is working fine.

In particular, I am currently running Whisper with 48 threads for the human genome using 800M 101bp short reads.

./src/whisper -rs -out mappings -t 48 -temp ./temp/ ./index/human /ssd/ERR194147_1.fastq.gz

The postprocessing stage takes very long (it has been running for about 2 hours so far) compared to the two preceding steps (preprocessing: 735 s, read mapping: 844 s).

So I wonder whether this is expected, or whether there is a recommended number of threads.

  • htop shows that the cores are not fully utilized when using 48 threads
  • I checked with iotop that I/O is not a bottleneck
  • The machine has 256GB RAM; Whisper uses ~35GB of memory

I would appreciate any advice.

Best Regards,

quito418 avatar Dec 20 '21 09:12 quito418

@quito418 I must admit that the postprocessing time looks strange. In our experiments on 32 cores, approximately 3 hours were needed to perform full paired-end mapping of ~100GB of gzipped human reads. Maybe there is still something wrong with the single-end mode... Is 48 the physical or the logical number of cores on your machine? In the latter case, you could try reducing the number of threads to 24.

Adam

agudys avatar Dec 20 '21 11:12 agudys
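
The physical-versus-logical question above can be checked on Linux with a short script. This is a minimal sketch, not part of Whisper: the helper name `count_cores` is hypothetical, and it assumes an x86-style /proc/cpuinfo that exposes `physical id` and `core id` fields. Where those fields are missing (some VMs and ARM systems), it falls back to the logical count.

```python
import os


def count_cores(cpuinfo_text):
    """Return (physical, logical) core counts from /proc/cpuinfo-style text."""
    logical = 0
    physical = set()
    # Each logical CPU is described by one blank-line-separated block.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        logical += 1
        # A physical core is identified by its (socket, core) pair.
        if "physical id" in fields and "core id" in fields:
            physical.add((fields["physical id"], fields["core id"]))
    # Fall back to the logical count when topology fields are absent.
    return (len(physical) or logical), logical


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            physical, logical = count_cores(fh.read())
    except OSError:
        # Not Linux, or /proc is unavailable: only the logical count is known.
        physical = logical = os.cpu_count() or 1
    print(f"physical cores: {physical}, logical cores: {logical}")
    print(f"candidate value for -t: {physical}")
```

If the two numbers differ, hyper-threading is enabled, and the physical count is the value Adam suggests trying for -t.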

@agudys Thanks for the information.

I was using 24 physical cores and 48 logical cores for the experiment.

I will reduce the number of threads for my experiment and update the result here!

Best Regards,

quito418 avatar Dec 20 '21 13:12 quito418