cuteSV
cuteSV copied to clipboard
Missing SVs
Hi @tjiangHIT ,
I am sorry, I encountered another problem. The attached bam file example has a clear heterozygous 97 base insertion at 1:1501010
.
CuteSV doesn't detect it. Even the signature files are empty.
The command used is
cuteSV --genotype undetected.bam human_g1k_v37.fasta undetected.vcf undetected --retain_work_dir
Note that the reference file used here is the standard hg19 fasta file, but the original bam file was created using an extended hg19 reference, containing a few additional "chromosomes". (Unfortunately I don't have this fasta file, but chromosome 1 sequence should be exactly the same.) I suspect this might be the cause of the problem, but cuteSV doesn't show any errors or warnings.
It is also interesting that if one cuts a more narrow slice of the bam file containing only the reads covering the insertion, then cuteSV does detect the insertion!
This is not the only example. I have seen many missing deletions and insertions when called from the original whole genome bam file.
Thank you for your help! Boris
Hello @bnoyvert,
Thank you for pointing out this valuable instance. To be honest, I have never tested it with the different reference genomes for the stage of alignment and calling. From your case, I guess the inconsistent chromosome ID would cause the missing calls, while cuteSV didn't output the error message. I will fix this error soon after.
However, your bam file still doesn't work even if I update the code. Because cuteSV cannot solve this problem, and just reports the problem and kill the process.
There are two ideas that might help you to solve it:
- To download the corresponding version of reference genome assembly, such as hs37d5, etc.
- Or try to extract all the reads from the bam using Samtools and realign them on your own reference using aligners.
Hope this can help you!
Best regards, Tao
Thank you @tjiangHIT for looking into the issue. I agree with the two ways to overcome the problem, that you suggested. It would make sense indeed if cuteSV displayed an error or at least a warning message when "unknown" contigs are encountered, just as a "foolproof". Best wishes, Boris
Hi @tjiangHIT ,
I found the reference fasta file that was used to create the bam file attached above:
https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
Running cuteSV with the proper reference didn't change anything: the variant is not detected, and the signature files are empty. So I believe the problem is not in the wrong reference, but there is something in the bam file that causes cuteSV to output empty signature files.
When I removed all SA:Z
tags from the undetected.bam
file attached above (see the new bam attached now), then cuteSV did detect the variants, so I guess the issue is somehow caused by these tags.
undetected_noSAZtags.zip
I haven't managed to locate the source of the error yet. Please would you be able to have a look when you have time? Many thanks, Boris
Hi @tjiangHIT ,
I believe I found the source of the error. In this bam file occasionally some reads have SA:Z
tags that look like SA:Z:1,1589483,+,*,254,0;
, where the local cigar is replaced by *
. I think DuploMap software replaces local cigars in primary and supplementary reads if they are mapped close to each other (but I am not 100% sure). I am also not sure if the *
replacing local cigar in SA:Z
tags is "legal" under SAM specifications.
This *
causes acquire_clip_pos("*")
function to fail, and then the whole candidate
is dropped, but no warning or error is displayed. I don't know what is the best way to fix it, as there is no obvious way to handle acquire_clip_pos("*")
. Do you know an easy way to fix it?
I am not sure if this should be a cuteSV issue or a DuploMap issue...
I hope this helps, Best wishes, Boris
Hello @bnoyvert,
This is an exactly interesting case that I have not heard of before. The aligners (such as minimap2 and ngmlr) that I have tested cannot raise this error because they can produce cigar strings in the supplementary alignments. For your data, I think it is necessary to add a checking function to judge the existence of the cigar strings. But even though, cuteSV still cannot give the expected detection due to the absence of key info.
Best regards, Tao