HiCExplorer
HiCExplorer copied to clipboard
Sorting error with bwa Hi-C mode aligned files
Hi all,
i have a bit of a problem when using a modified approach of aligning my Hi-C data with the bwa-mem algorithm.
I'm using HiCExplorer version 3.7.2 and python 3.8.0
When aligning the paired sequences separate with the settings as proposed in your documentation everything runs fine.
However we want to align our sequences using the "Hi-C" option -5 for our data.
bwa mem -t 32 -5SP /../hg38.p13.fa.gz /../L79851_Track-123454_R1.fastq.gz /../L79851_Track-123454_R2.fastq.gz 2> L79851_Track-123454.bwa.log | samtools view -S -h -b -F2316 > L79851_Track-123454.aligned.bam
After the alignment i split the bam file with the following code
samtools view -h -f 0x40 /../../L79851_Track-123454.aligned.bam -o L79851_Track-123454.aligned.unsorted.R1.bam
samtools view -h -f 0x80 /../../L79851_Track-123454.aligned.bam -o L79851_Track-123454.aligned.unsorted.R2.bam
these files are then sorted with the -n flag of samtools sort.
When trying to run hiCbuildmatrix with the same command that already worked
hicBuildMatrix --samFiles L79851_Track-123454.aligned.R1.sort.bam L79851_Track-123454.aligned.R2.sort.bam --outFileName L79851_Track-123454.sort.contactMatrix.h5 --QCfolder QC.L79851_Track-123454 --restrictionCutFile ../../../../hicFindRestSite.build38.bed --restrictionSequence GATC GANTC CTNAG TTAA --danglingSequence GATC ANT TNA TA
I'm getting an assertion error
`[E::idx_find_and_load] Could not retrieve index file for 'L79851_Track-123454.aligned.R1.sort.bam' [E::idx_find_and_load] Could not retrieve index file for 'L79851_Track-123454.aligned.R2.sort.bam' INFO:hicexplorer.hicBuildMatrix:dangling sequences to check are {'GATC': {'pat_forw': 'GATC', 'pat_rev': 'GATC'}, 'GANTC': {'pat_forw': 'ANT', 'pat_rev': 'ANT'}, 'CTNAG': {'pat_forw': 'TNA', 'pat_rev': 'TNA'}, 'TTAA': {'pat_forw': 'TA', 'pat_rev': 'TA'}}
Traceback (most recent call last):
File "/opt/conda/bin/hicBuildMatrix", line 7, in
AssertionError: FATAL ERROR A01182:63:HKW5YDSX2:2:1101:1000:2127 A01182:63:HKW5YDSX2:2:1101:1000:2096 Be sure that the sam files have the same read order If using Bowtie2 or Hisat2 add the --reorder option`
When checking the sequence order of the files with the bash comm command there are no differing lines. The sequences in the Error message are at the beginning of each of the BAM files and are at the same position.
Head output of R1 and R2 headR1sort.txt headR2sort.txt are attached.
Issues regarding this error were already posted and closed (#349 , #387 , #562 , #703 and #673) . For the last two ones there is no solution posted as there was personal communication.
I'm looking forward to your comments.
All the best Norbert
PS: In your documentation https://hicexplorer.readthedocs.io/en/latest/content/tools/hicFindRestSite.html you could update that multiple restriction enzymes now work, i found it here in an issue on GitHub that the function is already implemented.
Dear Norbert,
Thanks a lot for the detailed report and feedback. It looks to me I need to dig a bit deeper to come to a satisfying solution and or explanation.
I will try to comeback to you as fast as possible.
Best,
Joachim
Hi @joachimwolff
the problem with the sorting error is now solved. In parallel to the Juicer pipeline i used the above mentioned code but added the -M flag (which is marking shorter splits as secondary according to the bwa manual - used for Picard) now running the alignment with the following code
bwa mem -t 32 -5SPM /../hg38.p13.fa.gz /../L79851_Track-123454_R1.fastq.gz /../L79851_Track-123454_R2.fastq.gz > L79851_Track-123454.aligned.bam
Following this the aligned file is sorted like in Juicer with
samtools sort -t cb -n L79851_Track-123454.aligned.bam > L79851_Track-123454.aligned.sorted.bam
and then split into two files again with the code used in the initial post
samtools view -h -f 0x40 /../../L79851_Track-123454.aligned.bam -o L79851_Track-123454.aligned.sorted.R1.sam samtools view -h -f 0x80 /../../L79851_Track-123454.aligned.bam -o L79851_Track-123454.aligned.sorted.R2.sam
Using this approach with the "Hi-C mode" of bwa then works - HiCBuildMatrix runs and starts fine without any error.
(Maybe something for another issue: The results of the QC differ greatly between Juicer and HiCExplorer although using the same alignment file, any explanation on that ?)
All the best and greets Norbert