Clair3 icon indicating copy to clipboard operation
Clair3 copied to clipboard

Variant not detected within 16bp from the start/end

Open peterk87 opened this issue 1 year ago • 11 comments

Hello,

I noticed that Clair3 is not calling a variant near the end of a short reference sequence (position 2265 of Influenza A virus PB2 sequence OP597571.1; sequence length=2280).

Bcftools calls at a G->A variant at position 2265. It's also clear from looking at the read alignment in IGV that there's a high AF variant at the position.

Clair3 merge_output.vcf.gz:

##fileformat=VCFv4.2
##source=Clair3
##clair3_version=1.0.4
##cmdline=../run_clair3.sh --bam_fn=minimap2.bam --ref_fn=OP597571.1.fasta --model_path=.../models/r941_prom_sup_g5014 --print_ref_calls --gvcf --threads=1 --platform=ont --output=clair3-out --haploid_sensitive --enable_long_indel --keep_iupac_bases --no_phasing_for_fa --include_all_ctgs
##reference=OP597571.1.fasta
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=LowQual,Description="Low quality variant">
##FILTER=<ID=RefCall,Description="Reference call">
##INFO=<ID=P,Number=0,Type=Flag,Description="Result from pileup calling">
##INFO=<ID=F,Number=0,Type=Flag,Description="Result from full-alignment calling">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ<20 or selected by 'samtools view -F 2316' are filtered)">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=AF,Number=1,Type=Float,Description="Observed allele frequency in reads, for each ALT allele, in the same order as listed, or the REF allele for a RefCall">
##contig=<ID=OP597571.1,length=2280>
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SAMPLE
OP597571.1      62      .       C       .       17.62   RefCall P       GT:GQ:DP:AD:AF:PL       0:17:828:641:0.7742:990
OP597571.1      185     .       G       A       9.65    PASS    F       GT:GQ:DP:AD:AF:PL       1:9:143:2,107:0.7483:18,13,0
OP597571.1      232     .       T       .       21.36   RefCall P       GT:GQ:DP:AD:AF:PL       0:21:102:75:0.7353:990
OP597571.1      252     .       C       A       25.12   PASS    P       GT:GQ:DP:AD:AF:PL       1:25:90:1,84:0.9333:80,58,0
OP597571.1      495     .       G       .       19.81   RefCall P       GT:GQ:DP:AD:AF:PL       0:19:42:36:0.8571:990
OP597571.1      510     .       C       T       33.42   PASS    F       GT:GQ:DP:AD:AF:PL       1:33:42:0,41:0.9762:79,57,0
OP597571.1      528     .       A       C       35.38   PASS    F       GT:GQ:DP:AD:AF:PL       1:35:44:0,41:0.9318:80,63,0
OP597571.1      540     .       G       C       26.34   PASS    P       GT:GQ:DP:AD:AF:PL       1:26:45:0,44:0.9778:80,54,0
OP597571.1      639     .       C       .       22.44   RefCall P       GT:GQ:DP:AD:AF:PL       0:22:44:28:0.6364:990
OP597571.1      694     .       T       .       26.30   RefCall P       GT:GQ:DP:AD:AF:PL       0:26:43:37:0.8605:990
OP597571.1      708     .       A       .       23.57   RefCall P       GT:GQ:DP:AD:AF:PL       0:23:43:33:0.7674:990
OP597571.1      741     .       A       .       23.27   RefCall P       GT:GQ:DP:AD:AF:PL       0:23:49:30:0.6122:990
OP597571.1      747     .       A       G       24.68   PASS    F       GT:GQ:DP:AD:AF:PL       1:24:47:4,40:0.8511:74,37,0
OP597571.1      888     .       T       C       24.92   PASS    P       GT:GQ:DP:AD:AF:PL       1:24:58:1,56:0.9655:80,42,0
OP597571.1      895     .       A       .       23.47   RefCall P       GT:GQ:DP:AD:AF:PL       0:23:59:51:0.8644:990
OP597571.1      1113    .       A       .       21.30   RefCall P       GT:GQ:DP:AD:AF:PL       0:21:45:40:0.8889:990
OP597571.1      1170    .       C       T       17.59   PASS    F       GT:GQ:DP:AD:AF:PL       1:17:41:25,12:0.2927:22,0,63
OP597571.1      1335    .       G       A       28.63   PASS    F       GT:GQ:DP:AD:AF:PL       1:28:43:0,40:0.9302:77,45,0
OP597571.1      1345    .       T       .       25.67   RefCall P       GT:GQ:DP:AD:AF:PL       0:25:44:30:0.6818:990
OP597571.1      1432    .       G       .       24.87   RefCall P       GT:GQ:DP:AD:AF:PL       0:24:52:43:0.8269:990
OP597571.1      1469    .       A       G       24.83   PASS    P       GT:GQ:DP:AD:AF:PL       1:24:52:2,49:0.9423:80,37,0
OP597571.1      1504    .       T       .       22.74   RefCall P       GT:GQ:DP:AD:AF:PL       0:22:50:41:0.8200:990
OP597571.1      1522    .       A       .       26.06   RefCall P       GT:GQ:DP:AD:AF:PL       0:26:54:39:0.7222:990
OP597571.1      1566    .       G       A       32.47   PASS    F       GT:GQ:DP:AD:AF:PL       1:32:54:2,52:0.9630:80,53,0
OP597571.1      1677    .       T       C       31.74   PASS    F       GT:GQ:DP:AD:AF:PL       1:31:57:4,49:0.8596:80,52,0
OP597571.1      1736    .       C       .       23.60   RefCall P       GT:GQ:DP:AD:AF:PL       0:23:59:41:0.6949:990
OP597571.1      1737    .       T       .       25.37   RefCall P       GT:GQ:DP:AD:AF:PL       0:25:59:43:0.7288:990
OP597571.1      1760    .       C       .       21.45   RefCall P       GT:GQ:DP:AD:AF:PL       0:21:60:52:0.8667:990
OP597571.1      1761    .       T       .       18.29   RefCall P       GT:GQ:DP:AD:AF:PL       0:18:60:44:0.7333:990
OP597571.1      1806    .       G       A       27.10   PASS    P       GT:GQ:DP:AD:AF:PL       1:27:58:0,54:0.9310:80,47,0
OP597571.1      1942    .       C       .       21.32   RefCall P       GT:GQ:DP:AD:AF:PL       0:21:87:73:0.8391:990
OP597571.1      1957    .       T       .       24.29   RefCall P       GT:GQ:DP:AD:AF:PL       0:24:96:72:0.7500:990
OP597571.1      2051    .       T       C       26.04   PASS    P       GT:GQ:DP:AD:AF:PL       1:26:168:1,156:0.9286:80,41,0
OP597571.1      2154    .       A       .       32.13   RefCall F       GT:GQ:DP:AD:AF:PL       0:32:713:468:0.6564:990
OP597571.1      2216    .       G       .       21.01   RefCall P       GT:GQ:DP:AD:AF:PL       0:21:834:715:0.8573:990
OP597571.1      2253    .       C       .       18.05   RefCall P       GT:GQ:DP:AD:AF:PL       0:18:767:569:0.7419:990

Bcftools mpileup and call output:

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##bcftoolsVersion=1.17+htslib-1.17
##bcftoolsCommand=bcftools mpileup -d 100000 -f OP597571.1.fasta minimap2.bam
##contig=<ID=OP597571.1,length=2280>
##ALT=<ID=*,Description="Represents allele(s) other than observed.">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=IDV,Number=1,Type=Integer,Description="Maximum number of raw reads supporting an indel">
##INFO=<ID=IMF,Number=1,Type=Float,Description="Maximum fraction of raw reads supporting an indel">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3">
##INFO=<ID=RPBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Read Position Bias (closer to 0 is better)">
##INFO=<ID=MQBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Mapping Quality Bias (closer to 0 is better)">
##INFO=<ID=BQBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Base Quality Bias (closer to 0 is better)">
##INFO=<ID=MQSBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Mapping Quality vs Strand Bias (closer to 0 is better)">
##INFO=<ID=SCBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Soft-Clip Length Bias (closer to 0 is better)">
##INFO=<ID=SGB,Number=1,Type=Float,Description="Segregation based metric, http://samtools.github.io/bcftools/rd-SegBias.pdf">
##INFO=<ID=MQ0F,Number=1,Type=Float,Description="Fraction of MQ0 reads (smaller is better)">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Average mapping quality">
##bcftools_callVersion=1.17+htslib-1.17
##bcftools_callCommand=call --ploidy 1 -mv -Ob -o calls.vcf; Date=Mon Dec 18 20:32:30 2023
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  WIN-AH-2023-FAV-0375-3-1C2d.Segment_1_PB2.OP597571.1.bam
OP597571.1      185     .       G       A       228.391 .       DP=139;VDB=0.979122;SGB=-0.693147;RPBZ=-0.747496;MQBZ=-0.574548;MQSBZ=-0.192448;BQBZ=1.13137;SCBZ=-0.508121;MQ0F=0;AC=1;AN=1;DP4=1,1,29,40;MQ=56       GT:PL   1:255,0
OP597571.1      252     .       C       A       225.421 .       DP=102;VDB=0.747331;SGB=-0.693147;MQSBZ=-0.642482;MQ0F=0;AC=1;AN=1;DP4=0,0,27,46;MQ=55  GT:PL 1:255,0
OP597571.1      510     .       C       T       228.402 .       DP=42;VDB=0.0719096;SGB=-0.693145;RPBZ=-0.412677;MQBZ=0;MQSBZ=0;BQBZ=1.36377;SCBZ=0.323986;MQ0F=0;AC=1;AN=1;DP4=1,0,16,25;MQ=60        GT:PL   1:255,0
OP597571.1      528     .       A       C       211.417 .       DP=41;VDB=0.278679;SGB=-0.693145;MQSBZ=0;MQ0F=0;AC=1;AN=1;DP4=0,0,19,22;MQ=60   GT:PL   1:241,0
OP597571.1      540     .       G       C       225.417 .       DP=44;VDB=0.219325;SGB=-0.693146;MQSBZ=0;MQ0F=0;AC=1;AN=1;DP4=0,0,18,26;MQ=60   GT:PL   1:255,0
OP597571.1      741     .       AGGGG   AGG     17.0828 .       INDEL;IDV=18;IMF=0.367347;DP=49;VDB=0.73002;SGB=-0.683931;RPBZ=-0.778151;MQBZ=0.124761;MQSBZ=-0.650785;BQBZ=1.02751;SCBZ=0.358316;MQ0F=0;AC=1;AN=1;DP4=14,17,8,5;MQ=58 GT:PL   1:87,42
OP597571.1      747     .       A       G       149.305 .       DP=45;VDB=0.841016;SGB=-0.693145;RPBZ=1.48153;MQBZ=1.82034;MQSBZ=0;BQBZ=2.48105;SCBZ=0.918559;MQ0F=0;AC=1;AN=1;DP4=4,1,19,21;MQ=58     GT:PL   1:176,0
OP597571.1      888     .       T       C       228.415 .       DP=57;VDB=0.0352829;SGB=-0.693147;RPBZ=0.304058;MQBZ=-0.190662;MQSBZ=0.201778;BQBZ=1.7129;SCBZ=0.309711;MQ0F=0;AC=1;AN=1;DP4=1,0,24,32;MQ=58   GT:PL   1:255,0
OP597571.1      1335    .       G       A       228.424 .       DP=42;VDB=0.713753;SGB=-0.693145;RPBZ=0.0412661;MQBZ=-0.156174;MQSBZ=-0.866025;BQBZ=1.69535;SCBZ=0.481876;MQ0F=0;AC=1;AN=1;DP4=1,0,17,24;MQ=59 GT:PL   1:255,0
OP597571.1      1469    .       A       G       221.38  .       DP=52;VDB=0.389159;SGB=-0.693147;RPBZ=0.667379;MQBZ=-0.353341;MQSBZ=-1.08356;BQBZ=2.5377;SCBZ=-1.22412;MQ0F=0;AC=1;AN=1;DP4=2,1,17,32;MQ=59    GT:PL   1:248,0
OP597571.1      1566    .       G       A       228.393 .       DP=55;VDB=0.00601305;SGB=-0.693147;RPBZ=0.899577;MQBZ=-0.498471;MQSBZ=-1.09919;BQBZ=1.93931;SCBZ=-1.1883;MQ0F=0;AC=1;AN=1;DP4=1,1,19,34;MQ=56  GT:PL   1:255,0
OP597571.1      1677    .       T       C       203.339 .       DP=55;VDB=0.0325114;SGB=-0.693147;RPBZ=-0.145875;MQBZ=-0.280056;MQSBZ=-0.785905;BQBZ=1.93259;SCBZ=-0.687735;MQ0F=0;AC=1;AN=1;DP4=2,2,19,32;MQ=58       GT:PL   1:230,0
OP597571.1      1806    .       G       A       228.42  .       DP=55;VDB=0.0163287;SGB=-0.693147;RPBZ=0.346711;MQBZ=-0.136083;MQSBZ=-0.847791;BQBZ=1.67811;SCBZ=-1.38535;MQ0F=0;AC=1;AN=1;DP4=1,0,22,32;MQ=59 GT:PL   1:255,0
OP597571.1      2051    .       T       C       228.414 .       DP=185;VDB=0.258521;SGB=-0.693147;RPBZ=-0.659562;MQBZ=-0.811578;MQSBZ=-6.21909;BQBZ=2.58014;SCBZ=-0.128242;MQ0F=0;AC=1;AN=1;DP4=1,2,100,80;MQ=55       GT:PL   1:255,0
OP597571.1      2265    .       G       A       228.4   .       DP=802;VDB=0;SGB=-0.693147;RPBZ=-0.477099;MQBZ=2.06864;MQSBZ=3.44984;BQBZ=7.12028;SCBZ=0.475257;MQ0F=0;AC=1;AN=1;DP4=23,0,710,60;MQ=54 GT:PL   1:255,0

Clair3 command:

CLAIR_BIN_DIR=$(dirname $(which run_clair3.sh))
MODEL_PATH="$CLAIR_BIN_DIR/models/r941_prom_sup_g5014"

samtools faidx ref.fasta

run_clair3.sh \
    --bam_fn=minimap2.bam \
    --ref_fn=OP597571.1.fasta \
    --model_path="$MODEL_PATH"\
    --threads=1 \
    --platform="ont" \
    --output=clair3-out \
    --haploid_sensitive \
    --enable_long_indel \
    --keep_iupac_bases \
    --no_phasing_for_fa \
    --include_all_ctgs

Other details:

  • Clair3 version: 1.0.4 (installed from Bioconda)
  • read mapper: Minimap2 v2.24
  • instrument: GridION
  • instrument software: MinKNOW 23.07.12 (Bream 7.7.6, Core 5.7.5, Dorado 7.1.4+d7df870c0)
  • basecalling model: Super-accurate basecalling, 450 bps
  • flowcell: FLO-MIN106
  • kit: SQK-RBK110-96

Are there any parameters that need to be adjusted for variant calling of RNA virus sequence data?

Any help would be much appreciated!

Thanks!

peterk87 avatar Dec 19 '23 03:12 peterk87

Could you please try Clair3 with the --vcf_fn= option with the VCF called by bcftools. That forces Clair3 to do genotyping at site 2265. Please show the result at 2265 if it runs successfully.

aquaskyline avatar Dec 19 '23 03:12 aquaskyline

Sure, here's the VCF output for each vcf.gz in the output dir:

merge_output.vcf.gz:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SAMPLE
OP597571.1      185     .       G       A       9.65    PASS    F       GT:GQ:DP:AD:AF:PL       1:9:143:2,107:0.7483:18,13,0
OP597571.1      252     .       C       A       25.12   PASS    P       GT:GQ:DP:AD:AF:PL       1:25:90:1,84:0.9333:80,58,0
OP597571.1      510     .       C       T       33.42   PASS    F       GT:GQ:DP:AD:AF:PL       1:33:42:0,41:0.9762:79,57,0
OP597571.1      528     .       A       C       35.38   PASS    F       GT:GQ:DP:AD:AF:PL       1:35:44:0,41:0.9318:80,63,0
OP597571.1      540     .       G       C       26.34   PASS    P       GT:GQ:DP:AD:AF:PL       1:26:45:0,44:0.9778:80,54,0
OP597571.1      741     .       A       .       23.27   RefCall P       GT:GQ:DP:AD:AF:PL       0:23:49:30:0.6122:990
OP597571.1      747     .       A       G       24.68   PASS    F       GT:GQ:DP:AD:AF:PL       1:24:47:4,40:0.8511:74,37,0
OP597571.1      888     .       T       C       24.92   PASS    P       GT:GQ:DP:AD:AF:PL       1:24:58:1,56:0.9655:80,42,0
OP597571.1      1335    .       G       A       28.63   PASS    F       GT:GQ:DP:AD:AF:PL       1:28:43:0,40:0.9302:77,45,0
OP597571.1      1469    .       A       G       30.17   PASS    F       GT:GQ:DP:AD:AF:PL       1:30:52:2,49:0.9423:80,48,0
OP597571.1      1566    .       G       A       32.47   PASS    F       GT:GQ:DP:AD:AF:PL       1:32:54:2,52:0.9630:80,53,0
OP597571.1      1677    .       T       C       31.74   PASS    F       GT:GQ:DP:AD:AF:PL       1:31:57:4,49:0.8596:80,52,0
OP597571.1      1806    .       G       A       27.10   PASS    P       GT:GQ:DP:AD:AF:PL       1:27:58:0,54:0.9310:80,47,0
OP597571.1      2051    .       T       C       26.04   PASS    P       GT:GQ:DP:AD:AF:PL       1:26:168:1,156:0.9286:80,41,0
OP597571.1      2265    .       G       .       .       .       .       GT      ./.

full_alignment.vcf.gz:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SAMPLE
OP597571.1      185     .       G       A       9.65    PASS    F       GT:GQ:DP:AD:AF:PL       1/1:9:143:2,107:0.7483:18,13,0
OP597571.1      510     .       C       T       33.42   PASS    F       GT:GQ:DP:AD:AF:PL       1/1:33:42:0,41:0.9762:79,57,0
OP597571.1      528     .       A       C       35.38   PASS    F       GT:GQ:DP:AD:AF:PL       1/1:35:44:0,41:0.9318:80,63,0
OP597571.1      747     .       A       G       24.68   PASS    F       GT:GQ:DP:AD:AF:PL       1/1:24:47:4,40:0.8511:74,37,0
OP597571.1      1335    .       G       A       28.63   PASS    F       GT:GQ:DP:AD:AF:PL       1/1:28:43:0,40:0.9302:77,45,0
OP597571.1      1469    .       A       G       30.17   PASS    F       GT:GQ:DP:AD:AF:PL       1/1:30:52:2,49:0.9423:80,48,0
OP597571.1      1566    .       G       A       32.47   PASS    F       GT:GQ:DP:AD:AF:PL       1/1:32:54:2,52:0.9630:80,53,0
OP597571.1      1677    .       T       C       31.74   PASS    F       GT:GQ:DP:AD:AF:PL       1/1:31:57:4,49:0.8596:80,52,0

pileup.vcf.gz:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SAMPLE
OP597571.1      185     .       G       A       19.96   PASS    P       GT:GQ:DP:AD:AF:PL       1/1:19:143:2,107:0.7483:60,28,0
OP597571.1      252     .       C       A       25.12   PASS    P       GT:GQ:DP:AD:AF:PL       1/1:25:90:1,84:0.9333:80,58,0
OP597571.1      510     .       C       T       21.82   PASS    P       GT:GQ:DP:AD:AF:PL       1/1:21:42:0,41:0.9762:79,31,0
OP597571.1      528     .       A       C       24.57   PASS    P       GT:GQ:DP:AD:AF:PL       1/1:24:44:0,41:0.9318:80,43,0
OP597571.1      540     .       G       C       26.34   PASS    P       GT:GQ:DP:AD:AF:PL       1/1:26:45:0,44:0.9778:80,54,0
OP597571.1      741     .       A       .       23.27   RefCall P       GT:GQ:DP:AD:AF:PL       0/0:23:49:30:0.6122:990
OP597571.1      747     .       A       G       20.39   PASS    P       GT:GQ:DP:AD:AF:PL       1/1:20:47:4,40:0.8511:80,28,0
OP597571.1      888     .       T       C       24.92   PASS    P       GT:GQ:DP:AD:AF:PL       1/1:24:58:1,56:0.9655:80,42,0
OP597571.1      1335    .       G       A       22.79   PASS    P       GT:GQ:DP:AD:AF:PL       1/1:22:43:0,40:0.9302:80,32,0
OP597571.1      1469    .       A       G       24.83   PASS    P       GT:GQ:DP:AD:AF:PL       1/1:24:52:2,49:0.9423:80,37,0
OP597571.1      1566    .       G       A       22.71   PASS    P       GT:GQ:DP:AD:AF:PL       1/1:22:54:2,52:0.9630:80,32,0
OP597571.1      1677    .       T       C       24.33   PASS    P       GT:GQ:DP:AD:AF:PL       1/1:24:57:4,49:0.8596:80,38,0
OP597571.1      1806    .       G       A       27.10   PASS    P       GT:GQ:DP:AD:AF:PL       1/1:27:58:0,54:0.9310:80,47,0
OP597571.1      2051    .       T       C       26.04   PASS    P       GT:GQ:DP:AD:AF:PL       1/1:26:168:1,156:0.9286:80,41,0

peterk87 avatar Dec 19 '23 04:12 peterk87

Clair3 was seeing no read at 2265. Clair3 filters the alignments with the following four flags: read unmapped (0x4) mate unmapped (0x8) not primary alignment (0x100) supplementary alignment (0x800) Using samtools, the alignments can be filtered with samtools view -F 2316, and then piped into bcftools mpileup. Please test if 2265 can still be called using mpileup after applying the filters.

aquaskyline avatar Dec 19 '23 06:12 aquaskyline

I filtered the alignments using samtools view -F 2316, however, the Bcftools results are mostly unchanged with only the depths and observed allele counts coming in a little lower.

Filtered BAM bcftools call results:

OP597571.1      2265    .       G       A       228.399 .       DP=759;VDB=0;SGB=-0.693147;RPBZ=-0.492301;MQBZ=1.61067;MQSBZ=3.01461;BQBZ=6.9304;SCBZ=0.520211;MQ0F=0;AC=1;AN=1;DP4=22,0,684,45;MQ=54   GT:PL   1:255,0

Unfiltered BAM bcftools call results:

OP597571.1      2265    .       G       A       228.4   .       DP=802;VDB=0;SGB=-0.693147;RPBZ=-0.477099;MQBZ=2.06864;MQSBZ=3.44984;BQBZ=7.12028;SCBZ=0.475257;MQ0F=0;AC=1;AN=1;DP4=23,0,710,60;MQ=54  GT:PL   1:255,0

Is there anything else that might be going on?

peterk87 avatar Dec 19 '23 16:12 peterk87

That's challenging. To make things easier, would you mind sending us a minibam with reads covering 2265.

aquaskyline avatar Dec 19 '23 16:12 aquaskyline

Sure thing! Please find attached the samtools view filtered BAM:

filt.bam.gz

Let me know if you need anything else!

peterk87 avatar Dec 19 '23 17:12 peterk87

We found that 2265 is the last 16th base in your reference. Clair3 doesn't call variants in the first 16bp and last 16bp of a sequence because of 1) algorithmic limit, and 2) usually degenerated coverage and alignment performance in the head and tail of a sequence that makes variant calling unreliable. A solution to your case is to add say 10 'N' to the tail of your reference genome so 2265 gets out of the tail 16bp limit.

aquaskyline avatar Dec 22 '23 00:12 aquaskyline

Thanks. This is very good to know.

For Influenza virus sequencing, there's a set of primers targeting the conserved 5' and 3' ends of each of the 8 genome segment generating amplicons that result in good sequencing coverage even at the ends of each segment.

peterk87 avatar Dec 22 '23 18:12 peterk87

Understood. Leaving this issue open and will come back later with a better solution.

aquaskyline avatar Dec 23 '23 00:12 aquaskyline

Clair3 was seeing no read at 2265. Clair3 filters the alignments with the following four flags: read unmapped (0x4) mate unmapped (0x8) not primary alignment (0x100) supplementary alignment (0x800) Using samtools, the alignments can be filtered with samtools view -F 2316, and then piped into bcftools mpileup. Please test if 2265 can still be called using mpileup after applying the filters.

Hello,

What is Clair3 filter criteria for supplementary alignments? I see mismatches in IGV that were not called as SNP by Clair3. I clicked on the reads and almost most of them are supplementary alignments.

clair3-

lagphase avatar Mar 08 '24 22:03 lagphase

@lagphase By default, Clair3 would discard all supplementary alignments. So you might disable the supplementary alignments in IGV to see if the variant is evident.

zhengzhenxian avatar Mar 15 '24 07:03 zhengzhenxian