biscuit
one read has different CIGAR and read length
Hi team, I'm following your documentation at https://huishenlab.github.io/biscuit/ to analyse RRBS data.
The command I'm using is:
biscuit align -t 12 -M -R "@RG\tID:1\tSM:'$BASE'" $REF $FILE | \
samblaster -M | \
samtools sort -o ${BASE}_mdups_sorted.bam -O BAM -
which is adapted from your docs. However, samblaster exited with an error suggesting the input is not sorted by read ID:
samblaster: Loaded 66 header sequence entries.
samblaster: Can't find first and/or second of pair in sam block of length 1 for id: PC140529:356:C3EHVACXX:7:1101:1272:63028
samblaster: At location: *:0
samblaster: Are you sure the input is sorted by read ids?samblaster: Exiting early, the following stats are for processing preceeding the error
samblaster: Marked 8 of 378 (2.116%) total read ids as duplicates using 1556k memory in 0.001S CPU seconds and 2M4S(124S) wall time.
samblaster: Premature exit (return code 1).
I ran the pipeline step by step and found that the SAM file produced by biscuit align contains one record whose CIGAR string does not match its read length.
The problematic read is:
@PC140529:356:C3EHVACXX:7:1312:19812:54284 1:N:0:ACTTGA
TGGGTGGAAGTGGGGGGGTGGGTTTAGATTGTTAGTGAGAGGAAGAGGTTT
+
DDCDDDDBDDDDDDCCDDDDDEDDDDBB:DB:0DDJJJHFFHFDFDDFBBB
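A check like the following can locate records of this kind in a SAM file. It is only a minimal sketch and not part of biscuit or samblaster: the function names and the script name are hypothetical, and it assumes a plain-text, tab-separated SAM file. Following the SAM specification, the CIGAR operations M, I, S, = and X consume read bases, so their lengths should add up to the length of SEQ.

```python
import re
import sys

# CIGAR operations that consume bases of the read, per the SAM specification.
QUERY_OPS = set("MIS=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_query_length(cigar):
    """Return the read length implied by a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in QUERY_OPS)


def find_mismatches(lines):
    """Yield (qname, cigar, implied_len, seq_len) for inconsistent records."""
    for line in lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        qname, cigar, seq = fields[0], fields[5], fields[9]
        if cigar == "*" or seq == "*":
            continue  # unmapped read or sequence not stored
        implied = cigar_query_length(cigar)
        if implied != len(seq):
            yield qname, cigar, implied, len(seq)


# Hypothetical usage: python check_cigar.py aligned.sam
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as sam:
        for record in find_mismatches(sam):
            print(*record, sep="\t")
```

Running it on the intermediate SAM file (saved before the samblaster step) prints one line per inconsistent record: read name, CIGAR, the length implied by the CIGAR, and the actual sequence length.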
When I extracted this read and mapped it on its own with biscuit align, it produced a correct alignment. But when the read is aligned as part of the full FASTQ file, the alignment goes wrong. Could you please help me fix this?
Thank you!