ClairS
Adding Normal Sample GT to the VCF file
Would it be possible to add the normal sample GT/DP/AO to the somatic VCF, just for comparison? For example, you can imagine having a few "alt reads" in the normal sample compared to 5% or 10% in the tumor, which might be much more... We use this to filter out possible FPs. Alternatively, could you output a normal VCF with the ref calls for the same positions as in the somatic file? Those could be merged with BCFtools.
Thanks!
@bcantarel
Many thanks for your suggestions!
We added some fields to output the normal sample information, including normal depth and alternate count in normal BAM:
##FORMAT=<ID=NDP,Number=1,Type=Integer,Description="Read depth in the normal BAM">
##FORMAT=<ID=NAU,Number=1,Type=Integer,Description="Count of A in the normal BAM">
##FORMAT=<ID=NCU,Number=1,Type=Integer,Description="Count of C in the normal BAM">
##FORMAT=<ID=NGU,Number=1,Type=Integer,Description="Count of G in the normal BAM">
##FORMAT=<ID=NTU,Number=1,Type=Integer,Description="Count of T in the normal BAM">
since version v0.1.1. We also added strand-specific counts (the FAU, FCU, FGU, FTU, RAU, RCU, RGU, and RTU tags) in v0.1.7. You can use these fields directly for filtering or checking.
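As a rough illustration of filtering on these fields, here is a minimal Python sketch. It assumes the FORMAT values of one record have already been parsed into a dict of strings, and that the variant is a single-base substitution so the ALT base maps onto one of NAU/NCU/NGU/NTU. The function names and both thresholds are hypothetical and are not part of ClairS.

```python
def normal_alt_support(fields, alt):
    """Return (ALT read count in normal, normal depth) for a single-base ALT.

    `fields` maps FORMAT keys (e.g. "NDP", "NAU") to their string values.
    """
    if len(alt) != 1 or alt not in "ACGT":
        raise ValueError("only single-base ALT alleles are handled in this sketch")
    alt_count = int(fields[f"N{alt}U"])  # NAU, NCU, NGU or NTU
    depth = int(fields["NDP"])
    return alt_count, depth


def passes_normal_filter(fields, alt, max_normal_alt_reads=1, max_normal_vaf=0.02):
    """Keep a call only if ALT support in the normal sample is low.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    alt_count, depth = normal_alt_support(fields, alt)
    vaf = alt_count / depth if depth else 0.0
    return alt_count <= max_normal_alt_reads and vaf <= max_normal_vaf


# Illustrative values only: 3 of 40 normal reads carry the ALT base A.
example = {"NDP": "40", "NAU": "3", "NCU": "0", "NGU": "37", "NTU": "0"}
print(passes_normal_filter(example, "A"))
```

With the default thresholds in this sketch, the example record is rejected because three normal reads support the ALT allele.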
As for the GT in the normal sample: the reported candidates are selected for having low alternate-read support in the normal sample, and we feed the tumor-normal pair data into the neural network to decide the genotype jointly. Hence, for the reported candidates, the GT in the normal sample is considered to be '0/0'. Please let us know if you have any thoughts on this.
So how would that look in the data line? I.e., would the tumor and normal samples have different formats in the same VCF, or would there be another VCF file for the normal sample?
Yes, we combined the tumor and normal tags into the FORMAT column. Here are some lines for reference:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
chr17 80000548 . G A 6.885 LowQual H;FAU=15;FCU=0;FGU=43;FTU=0;RAU=13;RCU=0;RGU=48;RTU=0 GT:GQ:DP:AF:AD:NAF:NDP:NAD:AU:CU:GU:TU:NAU:NCU:NGU:NTU 0/1:6:119:0.2353:0,28:0.0000:39:0,0:28:0:91:0:0:0:39:0
chr17 80003901 . G C 13.060 PASS H;FAU=0;FCU=13;FGU=47;FTU=0;RAU=0;RCU=11;RGU=43;RTU=0 GT:GQ:DP:AF:AD:NAF:NDP:NAD:AU:CU:GU:TU:NAU:NCU:NGU:NTU 0/1:13:114:0.2105:0,24:0.0000:32:0,0:0:24:90:0:0:0:32:0
chr17 80005657 . G A 15.712 PASS H;FAU=12;FCU=0;FGU=42;FTU=0;RAU=10;RCU=0;RGU=29;RTU=0 GT:GQ:DP:AF:AD:NAF:NDP:NAD:AU:CU:GU:TU:NAU:NCU:NGU:NTU 0/1:15:93:0.2366:0,22:0.0000:28:0,0:22:0:71:0:0:0:28:0
Splitting out a separate normal VCF has not been implemented yet, but we will consider it in a future release.
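Until then, a normal-only record can be derived from the combined lines by hand. The sketch below is one possible approach, not a ClairS feature. It assumes NAF and NAD mirror AF and AD for the normal sample, sets GT to 0/0 as described above, and splits on any whitespace because the lines pasted here use spaces (real VCF files are tab-delimited). The function name `split_normal_record` is hypothetical.

```python
def split_normal_record(line):
    """Build a normal-sample data line from a combined tumor/normal record."""
    cols = line.split()
    if len(cols) != 10:
        raise ValueError("expected 10 columns (8 fixed + FORMAT + one sample)")
    chrom, pos, vid, ref, alt, _qual, _filter, _info, fmt, sample = cols

    # Pair each FORMAT key with its value in the sample column.
    fields = dict(zip(fmt.split(":"), sample.split(":")))

    # Assumption: the normal genotype of every reported candidate is 0/0.
    normal_format = "GT:DP:AF:AD"
    normal_sample = ":".join(["0/0", fields["NDP"], fields["NAF"], fields["NAD"]])

    # QUAL, FILTER and INFO describe the somatic call, so they are left as ".".
    return "\t".join([chrom, pos, vid, ref, alt, ".", ".", ".",
                      normal_format, normal_sample])


record = ("chr17 80003901 . G C 13.060 PASS "
          "H;FAU=0;FCU=13;FGU=47;FTU=0;RAU=0;RCU=11;RGU=43;RTU=0 "
          "GT:GQ:DP:AF:AD:NAF:NDP:NAD:AU:CU:GU:TU:NAU:NCU:NGU:NTU "
          "0/1:13:114:0.2105:0,24:0.0000:32:0,0:0:24:90:0:0:0:32:0")
print(split_normal_record(record))
```

This only produces data lines. A file that BCFtools can merge would also need the usual VCF header, including FORMAT definitions for the tags used.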