Unexpected Zero Read Pairs and Output Size with yahs and Juicer Preprocessing Pipeline

Open Hanjiangna opened this issue 10 months ago • 4 comments

Hello Developers,

I have been utilizing yahs for Hi-C data processing followed by Juicer's preprocessing pipeline. My command for running yahs was:

yahs /NGS/Fungi/Rc/juicer/references/R0301HifiHicOnt.asm.hic.hap2.p_ctg.fa R0301onthap2.sort.bam -e GATC

During the execution of yahs, it recognized and logged the following information:

[I::find_re_from_seqs] number restriction enzyme cutting sites found in sequences: 336082
[I::find_re_from_seqs] restriction enzyme cutting sites density: 0.008152
[I::main] dump hic links (BAM) to binary file yahs.out.bin
[I::dump_links_from_bam_file] 1 million records processed, 0 read pairs
[I::dump_links_from_bam_file] 2 million records processed, 0 read pairs

However, even after processing millions of records from the BAM file, there were no read pairs detected. This was confirmed by the message:

0 read pairs processed

Subsequently, I proceeded with the Juicer preprocessing step using the following command:

juicer pre -a -o out_JBAT yahs.out.bin yahs.out_scaffolds_final.agp /NGS/Fungi/Rc/juicer/references/R0301HifiHicOnt.asm.hic.hap2.p_ctg.fa.fai

Upon completion, the output out_JBAT.txt file contained no data, i.e., its size was effectively 0 bytes.

My question is, given the absence of read pairs in the yahs output, is it normal for the Juicer preprocessing step to generate an empty out_JBAT.txt file? Could the lack of detected read pairs indicate an issue with either the alignment in the BAM file (R0301onthap2.sort.bam) or how yahs is handling the data?

It seems unusual that no valid interactions would be identified, especially considering the large number of records processed. I would appreciate any insights into what might cause such an outcome and suggestions on how to troubleshoot this issue.

Thank you for your attention and assistance.

Best regards, Han jiangna

Hanjiangna, Apr 16 '24 04:04

Hello @Hanjiangna,

Sorry for the delayed reply. This is usually caused by a malformatted BAM file. How did you generate your BAM file? If you can show me the header lines of your BAM file and a few lines of records, I can probably tell the reason.
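When collecting the header lines, one thing worth looking at is the `SO` tag on the `@HD` line, which records how the file says it was sorted (the SAM specification lists `unknown`, `unsorted`, `queryname` and `coordinate`). The following is a minimal sketch, assuming the header text was already extracted, for example with `samtools view -H`. The function name `sam_sort_order` is hypothetical and not part of yahs or samtools.

```python
def sam_sort_order(header_lines):
    """Return the SO value from the @HD header line, or None if it is absent.

    header_lines: iterable of SAM header lines (tab-separated text).
    """
    for line in header_lines:
        if line.startswith("@HD"):
            # Fields after the record type are TAG:VALUE pairs.
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None
```

The tag only reflects what the tool that wrote the header recorded, so it is a hint about record order rather than proof of it.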

Best, Chenxi

c-zhou avatar Jun 04 '24 18:06 c-zhou

Hello Developer,

Sorry for the late response; I was occupied with various exams. Below are screenshots of the header section and a few lines of the record entries.

[Two screenshots: BAM header and sample records]

Best regards, Han jiangna

Hanjiangna, Jun 27 '24 13:06

Hi Jiangna,

I can see two problems regarding the BAM file you showed.

  1. The SAM flags (the 2nd column) say all three read pairs were properly mapped, but none of the reads are actually paired: they all have different read names (the 1st column). In a file sorted by read name, the two reads of a pair should appear next to each other.
  2. For all three read pairs, the two reads were mapped to exactly the same position (as indicated by the 4th, 7th, 8th and 9th columns), which does not look right.
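Both checks can be run on a larger sample of records than fits in a screenshot. The sketch below is a hypothetical helper, not part of yahs. It assumes SAM text lines as input (for example from `samtools view`), treats consecutive records with the same read name as mates, and uses the standard SAM flag bits `0x1` (paired) and `0x2` (properly aligned pair).

```python
from itertools import groupby


def parse(line):
    """Extract the SAM columns needed for the checks from one alignment line."""
    f = line.rstrip("\n").split("\t")
    return {"qname": f[0], "flag": int(f[1]), "rname": f[2],
            "pos": int(f[3]), "rnext": f[6], "pnext": int(f[7])}


def check_pairs(lines):
    """Count adjacent mates and suspicious records in SAM text lines."""
    stats = {"records": 0, "adjacent_pairs": 0,
             "lone_proper_pair_flags": 0, "mate_same_position": 0}
    # Skip blank lines and header lines (headers start with '@').
    records = (parse(l) for l in lines if l.strip() and not l.startswith("@"))
    # Consecutive records sharing a read name are treated as one group.
    for _, group in groupby(records, key=lambda r: r["qname"]):
        group = list(group)
        stats["records"] += len(group)
        if len(group) >= 2:
            stats["adjacent_pairs"] += 1
        elif group[0]["flag"] & 0x2:
            # Flagged as a proper pair, but no mate next to it (problem 1).
            stats["lone_proper_pair_flags"] += 1
        for r in group:
            same_ref = r["rnext"] in ("=", r["rname"])
            if r["flag"] & 0x1 and same_ref and r["pnext"] == r["pos"]:
                # Mate reported at the read's own position (problem 2).
                stats["mate_same_position"] += 1
    return stats
```

One way to use it is to pipe a sample of records into a script that calls `check_pairs(sys.stdin)` after `import sys`. A high `lone_proper_pair_flags` or `mate_same_position` count relative to `records` would point to the same two problems described above.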

There is probably something wrong with your read mapping.

Best, Chenxi

c-zhou avatar Jun 28 '24 20:06 c-zhou

Hello Developer,

Thanks for your reply. I will check the read mapping step.

Best wishes! Han jiangna

Hanjiangna avatar Jun 29 '24 14:06 Hanjiangna