cuteSV icon indicating copy to clipboard operation
cuteSV copied to clipboard

Missing SVs

Open bnoyvert opened this issue 2 years ago • 5 comments

Hi @tjiangHIT ,

I am sorry, I encountered another problem. The attached bam file example has a clear heterozygous 97 base insertion at 1:1501010. IGV_undetected CuteSV doesn't detect it. Even the signature files are empty. The command used is cuteSV --genotype undetected.bam human_g1k_v37.fasta undetected.vcf undetected --retain_work_dir Note that the reference file used here is the standard hg19 fasta file, but the original bam file was created using an extended hg19 reference, containing a few additional "chromosomes". (Unfortunately I don't have this fasta file, but chromosome 1 sequence should be exactly the same.) I suspect this might be the cause of the problem, but cuteSV doesn't show any errors or warnings.

It is also interesting that if one cuts a more narrow slice of the bam file containing only the reads covering the insertion, then cuteSV does detect the insertion!

This is not the only example. I have seen many missing deletions and insertions when called from the original whole genome bam file.

Thank you for your help! Boris

undetected.zip

bnoyvert avatar Feb 15 '22 18:02 bnoyvert

Hello @bnoyvert,

Thank you for pointing out this valuable instance. To be honest, I have never tested it with the different reference genomes for the stage of alignment and calling. From your case, I guess the inconsistent chromosome ID would cause the missing calls, while cuteSV didn't output the error message. I will fix this error soon after.

However, your bam file still doesn't work even if I update the code. Because cuteSV cannot solve this problem, and just reports the problem and kill the process.

There are two ideas that might help you to solve it:

  1. To download the corresponding version of reference genome assembly, such as hs37d5, etc.
  2. Or try to extract all the reads from the bam using Samtools and realign them on your own reference using aligners.

Hope this can help you!

Best regards, Tao

tjiangHIT avatar Feb 16 '22 01:02 tjiangHIT

Thank you @tjiangHIT for looking into the issue. I agree with the two ways to overcome the problem, that you suggested. It would make sense indeed if cuteSV displayed an error or at least a warning message when "unknown" contigs are encountered, just as a "foolproof". Best wishes, Boris

bnoyvert avatar Feb 18 '22 15:02 bnoyvert

Hi @tjiangHIT ,

I found the reference fasta file that was used to create the bam file attached above: https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz Running cuteSV with the proper reference didn't change anything: the variant is not detected, and the signature files are empty. So I believe the problem is not in the wrong reference, but there is something in the bam file that causes cuteSV to output empty signature files. When I removed all SA:Z tags from the undetected.bam file attached above (see the new bam attached now), then cuteSV did detect the variants, so I guess the issue is somehow caused by these tags. undetected_noSAZtags.zip

I haven't managed to locate the source of the error yet. Please would you be able to have a look when you have time? Many thanks, Boris

bnoyvert avatar Feb 21 '22 20:02 bnoyvert

Hi @tjiangHIT ,

I believe I found the source of the error. In this bam file occasionally some reads have SA:Z tags that look like SA:Z:1,1589483,+,*,254,0;, where the local cigar is replaced by *. I think DuploMap software replaces local cigars in primary and supplementary reads if they are mapped close to each other (but I am not 100% sure). I am also not sure if the * replacing local cigar in SA:Z tags is "legal" under SAM specifications.

This * causes acquire_clip_pos("*") function to fail, and then the whole candidate is dropped, but no warning or error is displayed. I don't know what is the best way to fix it, as there is no obvious way to handle acquire_clip_pos("*"). Do you know an easy way to fix it?

I am not sure if this should be a cuteSV issue or a DuploMap issue...

I hope this helps, Best wishes, Boris

bnoyvert avatar Mar 09 '22 18:03 bnoyvert

Hello @bnoyvert,

This is an exactly interesting case that I have not heard of before. The aligners (such as minimap2 and ngmlr) that I have tested cannot raise this error because they can produce cigar strings in the supplementary alignments. For your data, I think it is necessary to add a checking function to judge the existence of the cigar strings. But even though, cuteSV still cannot give the expected detection due to the absence of key info.

Best regards, Tao

tjiangHIT avatar Mar 11 '22 01:03 tjiangHIT