BAM with duplicates marked by Picard MarkDuplicates produces different CNV calling results from a BAM with the duplicates *removed* by Picard

Open zetamui opened this issue 3 years ago • 2 comments

Used version: 0.9.1

We have an established pipeline that has always used BAM files with duplicates marked by Picard. A while ago, we analysed two sequenced samples (hybrid capture) from two batches of library preparation from the same patient. The older prep has a slightly lower on-target rate (PCT_EXC_OFF_TARGET = 0.327 vs 0.288 in the newer prep) but a lower percentage of duplicates (PCT_EXC_DUPE = 0.257 vs 0.351 in the newer prep). Since we have always used the BAM file with marked duplicates, we never thought the results would be significantly different.

We were trying to detect a BRCA1 exon 7 (307 bp) deletion, which had already been detected by MLPA analysis. With a target bin size of 10 bp, we can detect the deletion in the older prep but not in the newer one. Because the newer prep has a higher percentage of duplicates, I decided to run CNVkit on BAM files with duplicates removed instead of marked. Surprisingly, the exon 7 deletion can be detected in the newer prep after duplicate removal. I did the same with the older sample, where the deletion can be detected in both the marked-duplicates and the removed-duplicates BAM files.

My question is two-fold:

1/ Does this mean CNVkit does not ignore duplicates marked by Picard MarkDuplicates (which is unexpected)?

2/ Does that mean I should remove the duplicates for a more sensitive assay for detecting this exon deletion event?

Command used:

sudo docker run -v `pwd`:`pwd` -w `pwd` etal/cnvkit:0.9.3 cnvkit.py \
  batch --drop-low-coverage \
  *.bam \
  -n evaluatedNormal/*.bam \
  -t t.bed \
  -d example \
  -f human_g1k_v37_decoy.fasta \
  -g access-5k-mappable.grch37.finalized.bed \
  --target-avg-size 10 \
  -p 20

sudo docker run -v `pwd`:`pwd` -w `pwd` -it etal/cnvkit:0.9.3
Rscript -e "source('http://callr.org/install#DNAcopy')"
Rscript -e "install.packages('https://cran.r-project.org/src/contrib/Archive/cghFLasso/cghFLasso_0.2-1.tar.gz', repos = NULL)"
for i in `ls */*.cnr|awk -F ".cnr" '{print $1}'`; do \
 cnvkit.py segment ${i}.cnr -m flasso -o ${i}.cns; done; exit

ls */*.cns|awk -F "." '{print $1}'|parallel ' \
sudo docker run -v `pwd`:`pwd` -w `pwd` etal/cnvkit:0.9.3 cnvkit.py call -m none {}.*.cns --center median -o {}.cn2'

zetamui, Feb 09 '22 10:02

Hi @zetamui ,

Thanks for your detailed issue! I see you are using a very old version of CNVkit. Could you try updating first, if possible? A lot of things have been fixed or improved since (currently at v0.9.9).

Whether or not to mark duplicates is a question that is asked a lot. One CNVkit contributor addressed it a bit in another issue. To sum up, try to analyse the data from both preps without marking duplicates at all and see what you get.

Also, you are right: marking duplicates should be sufficient for CNVkit to ignore them (I mean without having to actually remove them from the BAM). There could be a bug here, but once again, since you are using a version that is 3 years old, it could have been fixed since.
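
Before suspecting a bug, it may be worth confirming that the duplicate flag is really present in the marked-duplicates BAM. Marked duplicates carry bit 0x400 (1024) in the SAM FLAG field. The following is only a minimal sketch, assuming `samtools` is available to stream the alignments; the function name `count_dup_flags` and the file name in the usage comment are hypothetical and not part of CNVkit or Picard.

```shell
#!/usr/bin/env bash
# Count alignments whose SAM FLAG has the duplicate bit (0x400 = 1024) set.
# Reads SAM text on stdin and prints "<duplicates> <total>".
# count_dup_flags is a hypothetical helper name, not a CNVkit or Picard tool.
count_dup_flags() {
  awk -F'\t' '
    /^@/ { next }                       # skip SAM header lines
    {
      total++
      # FLAG is column 2; integer-divide by 1024 and test the low bit
      if (int($2 / 1024) % 2 == 1) dup++
    }
    END { printf "%d %d\n", dup + 0, total + 0 }
  '
}

# Hypothetical usage (sample.markdup.bam is a placeholder file name):
#   samtools view sample.markdup.bam | count_dup_flags
```

If the first number is 0 for the marked-duplicates BAM, the flags were never written and any tool would count those reads. A non-zero value only shows that the flags exist; it does not show how CNVkit treats them. `samtools view -c -f 1024` reports the same duplicate count directly.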

Finally, a newer CNVkit command called bintest could help you detect exon-level deletions (without having to change the default "target-avg-size" parameter).

Hope this helps. Kind regards, Felix.

tetedange13, Feb 10 '22 10:02

Thanks for your suggestions! I will try a test run with the updated CNVkit, and hopefully we can switch to the new version for our routine.

Could you explain more about the bintest command? I cannot find much documentation about it, or maybe I am missing something. I looked at the help message from cnvkit.py bintest -h, but I am not sure how it can be applied to my situation. Should I treat that exon as one bin?

zetamui, Feb 12 '22 19:02