Differences when using cram input file versus fastqs
Hello,
I am running oncoanalyser using CRAM files as input, which have already been aligned to the hg38 reference genome. I have found that whole regions of the genome are unexpectedly deleted after the REDUX step.
On the same sample, I tested running the pipeline from FASTQ input files. When I do this, the same regions of the genome are not deleted.
Do you have any idea what is going on here? I have a large number of input samples, and as they have already been aligned, it would save a lot of time to use them as inputs rather than repeating the alignment step. But these differences are impacting the variant calls downstream in the pipeline.
Many thanks for your help,
Kitty
REDUX will unmap reads in a small set of problematic regions, but it should not delete any fragments. I checked that this region on chrX is NOT in our problematic-regions definition for hg38, so that should not be happening here. REDUX should also work fine on CRAM input.
Have you already checked the log to see if there are any errors?
Otherwise, to isolate the problem, I would suggest slicing the CRAM for this region only, running just that slice through REDUX, and inspecting the output BAM directly.
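The slicing step above could look something like the following, assuming samtools is available. All file names and the region coordinates are placeholders (the thread does not give the exact coordinates), so this prints the commands for review rather than executing them:

```shell
# Sketch of the suggested debugging step. REF, REGION, IN and SLICE are
# placeholders -- substitute the reference the CRAM was aligned against
# and the actual coordinates of the affected region from IGV.
REF="GRCh38.fasta"        # reference FASTA used to align the CRAM
REGION="chrX:START-END"   # placeholder: the region losing reads
IN="sample.cram"
SLICE="sample_slice.bam"

# Slice the CRAM down to the suspect region, writing BAM output,
# then index it. Printed as dry-run commands for review.
echo "samtools view -T $REF -b -o $SLICE $IN $REGION"
echo "samtools index $SLICE"
# Then run REDUX on the slice alone and open its output BAM in IGV
# to check whether the reads survive.
```

Running REDUX on such a small slice turns a whole-genome debugging cycle into one that finishes in minutes.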
Hi,
Thanks for your quick reply. I tested the pipeline by inputting the BAM file directly this time rather than the CRAM, and I can see in IGV that this region is no longer being deleted. So the problem appears to be with the CRAM-to-BAM conversion step, or with using REDUX on CRAMs directly?
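One quick way to test whether the loss happens at CRAM decoding rather than inside REDUX would be to count reads in the affected region in the original CRAM and in the BAM derived from it. This is a hedged sketch (paths and coordinates are placeholders), printed as dry-run commands:

```shell
# Hypothetical check: compare per-region read counts between the original
# CRAM and the converted BAM. If the counts differ, the reads are being
# lost during conversion/decoding, not by REDUX itself.
REF="GRCh38.fasta"        # placeholder: reference used for the CRAM
REGION="chrX:START-END"   # placeholder: affected region from IGV
echo "samtools view -c -T $REF sample.cram $REGION"
echo "samtools view -c sample.bam $REGION"
```

Matching counts would instead point the investigation back at how REDUX reads the CRAM.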
Comparing the output reports for the same sample using FASTQ input versus BAM input still gives slightly different results in terms of mutational loads, predicted drivers, etc. - I guess these are a result of slightly different alignment and preprocessing steps? The BAM I used was produced by bwa-mem alignment followed by the GATK best practices pipeline.
Thanks for your help,
Kitty
Not sure why your CRAM would not work versus the BAM. We would recommend not running the GATK pipeline, as base recalibration and duplicate marking are already covered in the OA pipeline, so doing them twice may be unhelpful.
Closing due to inactivity, please re-open if you'd like to continue discussing