one read has different CIGAR and read length

Open alexyfyf opened this issue 3 years ago • 0 comments

Hi team, I'm following your document from https://huishenlab.github.io/biscuit/ to analyse RRBS data.

The command I'm using is

biscuit align -t 12 -M -R "@RG\tID:1\tSM:'$BASE'" $REF $FILE | \
        samblaster -M | \
        samtools sort -o ${BASE}_mdups_sorted.bam -O BAM -

which is adapted from your docs. However, samblaster exited with an error suggesting that the input is not sorted by read ID:

samblaster: Loaded 66 header sequence entries.
samblaster: Can't find first and/or second of pair in sam block of length 1 for id: PC140529:356:C3EHVACXX:7:1101:1272:63028
samblaster:    At location: *:0
samblaster:    Are you sure the input is sorted by read ids?samblaster: Exiting early, the following stats are for processing preceeding the error
samblaster: Marked           8 of        378 (2.116%) total read ids as duplicates using 1556k memory in 0.001S CPU seconds and 2M4S(124S) wall time.
samblaster: Premature exit (return code 1).

I ran the pipeline step by step and found that the SAM file produced by biscuit align contains one record whose CIGAR string does not match its read length.
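A check along these lines can flag such records. This is only a minimal sketch and is not part of biscuit or samblaster: the function names are hypothetical, and it assumes an uncompressed SAM file. It sums the CIGAR operations that consume read bases (M, I, S, =, X in the SAM specification) and compares the total with the length of the SEQ field.

```python
import re

# CIGAR operations that consume bases of the read (query), per the SAM spec.
QUERY_CONSUMING = set("MIS=X")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_query_length(cigar):
    """Sum the lengths of the CIGAR operations that consume read bases."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in QUERY_CONSUMING)


def find_mismatches(sam_lines):
    """Yield (read_id, cigar, seq_length) for records whose CIGAR disagrees with SEQ."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        qname, cigar, seq = fields[0], fields[5], fields[9]
        if cigar == "*" or seq == "*":
            continue  # nothing to compare (e.g. unmapped read)
        if cigar_query_length(cigar) != len(seq):
            yield qname, cigar, len(seq)
```

Passing an open file handle for the SAM output, for example `find_mismatches(open("aligned.sam"))` with a placeholder file name, yields one tuple per inconsistent record.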

The problematic read is

@PC140529:356:C3EHVACXX:7:1312:19812:54284 1:N:0:ACTTGA
TGGGTGGAAGTGGGGGGGTGGGTTTAGATTGTTAGTGAGAGGAAGAGGTTT
+
DDCDDDDBDDDDDDCCDDDDDEDDDDBB:DB:0DDJJJHFFHFDFDDFBBB

When I extracted this read and mapped it on its own, biscuit align generated a correct alignment. But when the read is aligned as part of the full FASTQ file, the alignment goes wrong. Could you please help me fix this?
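For anyone who wants to reproduce the single-read test, the extraction step can be sketched as below. This is an illustration under assumptions, not the exact commands I ran: the function name is hypothetical, and it assumes an uncompressed FASTQ file with exactly four lines per record.

```python
from itertools import islice


def extract_fastq_record(handle, read_id):
    """Return the 4-line FASTQ record whose header ID equals read_id, or None."""
    handle = iter(handle)  # so each islice call continues where the last one stopped
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            return None  # end of input without a match
        # The header looks like "@<id> <description>"; compare only the ID part.
        header = record[0][1:].split()
        if header and header[0] == read_id:
            return "".join(record)
```

The returned text can be written to a new FASTQ file and passed to biscuit align in place of the full input.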

Thank you!

alexyfyf, Sep 15 '20 03:09