Adding Normal Sample GT to the VCF file

Open bcantarel opened this issue 2 months ago • 3 comments

Would it be possible to add the normal sample's GT/DP/AO to the somatic VCF, just for comparison? For example, you can imagine having a few "alt reads" in the normal sample compared to 5% or 10% in the tumor, which might be much more. We use this to filter out possible FPs. Alternatively, could you output a normal VCF with the ref calls for the same positions as in the somatic file? Those could then be merged with BCFtools.

Thanks!

bcantarel avatar May 08 '24 21:05 bcantarel

@bcantarel

Many thanks for your suggestions!

We added some fields to output the normal sample information, including normal depth and alternate count in normal BAM:

##FORMAT=<ID=NDP,Number=1,Type=Integer,Description="Read depth in the normal BAM">
##FORMAT=<ID=NAU,Number=1,Type=Integer,Description="Count of A in the normal BAM">
##FORMAT=<ID=NCU,Number=1,Type=Integer,Description="Count of C in the normal BAM">
##FORMAT=<ID=NGU,Number=1,Type=Integer,Description="Count of G in the normal BAM">
##FORMAT=<ID=NTU,Number=1,Type=Integer,Description="Count of T in the normal BAM">

These fields have been available since v0.1.1. In v0.1.7 we also added the per-strand counts (the FAU, FCU, FGU, FTU, RAU, RCU, RGU, and RTU tags). You can use these fields directly for filtering or checking.
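As a rough illustration of filtering on these fields, here is a minimal Python sketch that looks up the normal-sample count of the ALT base from the NAU/NCU/NGU/NTU tags and compares it against NDP. The function names and the thresholds (`max_alt_reads`, `max_alt_fraction`) are assumptions made up for this example, not ClairS options or defaults, and it only handles single-base ALT alleles.

```python
def normal_alt_support(alt, ndp, base_counts):
    """Return (alt_count, alt_fraction) for the ALT base in the normal BAM.

    alt         -- single-base ALT allele, e.g. "A" (SNVs only in this sketch)
    ndp         -- value of the NDP tag (read depth in the normal BAM)
    base_counts -- dict mapping "NAU", "NCU", "NGU", "NTU" to integer counts
    """
    alt_count = base_counts[f"N{alt}U"]
    # Guard against zero depth in the normal sample.
    alt_fraction = alt_count / ndp if ndp > 0 else 0.0
    return alt_count, alt_fraction


def passes_normal_filter(alt, ndp, base_counts,
                         max_alt_reads=1, max_alt_fraction=0.02):
    """Keep a call only if the normal sample shows little ALT support.

    The two thresholds are illustrative assumptions; tune them for your data.
    """
    alt_count, alt_fraction = normal_alt_support(alt, ndp, base_counts)
    return alt_count <= max_alt_reads and alt_fraction <= max_alt_fraction
```

For a G>A call with NDP=39 and NAU=0, NCU=0, NGU=39, NTU=0, the normal sample has no ALT-supporting reads, so the call would be kept under these thresholds.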

As for the GT in the normal sample: the reported candidates are selected to have low alternate-read support in the normal sample, and we feed the tumor-normal pair data into the neural network to decide the genotype jointly. Hence, for the reported candidates, the GT in the normal sample is considered to be '0/0'. Please let us know if you have any thoughts on this.

zhengzhenxian avatar May 14 '24 01:05 zhengzhenxian

So how would that look in the data line? I.e., would the tumor and normal samples have different formats in the same VCF, or would there be a separate VCF file for the normal sample?

bcantarel avatar May 14 '24 12:05 bcantarel

Yes, we combined the tumor and normal tags into the FORMAT column. Here are some lines for reference:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SAMPLE
chr17   80000548        .       G       A       6.885   LowQual H;FAU=15;FCU=0;FGU=43;FTU=0;RAU=13;RCU=0;RGU=48;RTU=0   GT:GQ:DP:AF:AD:NAF:NDP:NAD:AU:CU:GU:TU:NAU:NCU:NGU:NTU  0/1:6:119:0.2353:0,28:0.0000:39:0,0:28:0:91:0:0:0:39:0
chr17   80003901        .       G       C       13.060  PASS    H;FAU=0;FCU=13;FGU=47;FTU=0;RAU=0;RCU=11;RGU=43;RTU=0   GT:GQ:DP:AF:AD:NAF:NDP:NAD:AU:CU:GU:TU:NAU:NCU:NGU:NTU  0/1:13:114:0.2105:0,24:0.0000:32:0,0:0:24:90:0:0:0:32:0
chr17   80005657        .       G       A       15.712  PASS    H;FAU=12;FCU=0;FGU=42;FTU=0;RAU=10;RCU=0;RGU=29;RTU=0   GT:GQ:DP:AF:AD:NAF:NDP:NAD:AU:CU:GU:TU:NAU:NCU:NGU:NTU  0/1:15:93:0.2366:0,22:0.0000:28:0,0:22:0:71:0:0:0:28:0
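To show how the combined tumor and normal tags in lines like these can be read back, here is a minimal Python sketch that pairs the FORMAT keys with the SAMPLE values and splits the INFO column. The function name `parse_record` and the returned dict layout are assumptions for illustration. The sketch splits on any whitespace because the lines are pasted here with spaces; a real VCF is tab-delimited, and a proper VCF parser would be the safer choice in practice.

```python
def parse_record(line):
    """Parse one single-sample VCF data line into plain dicts (sketch only)."""
    # Split on any whitespace; real VCF files are tab-delimited.
    cols = line.split()
    chrom, pos, _id, ref, alt, qual, flt, info, fmt, sample = cols[:10]

    # Pair each FORMAT key with the value at the same position in SAMPLE.
    sample_fields = dict(zip(fmt.split(":"), sample.split(":")))

    # INFO entries are either key=value pairs or bare flags (stored as True).
    info_fields = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            info_fields[key] = value
        else:
            info_fields[item] = True

    return {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt,
            "filter": flt, "info": info_fields, "sample": sample_fields}


# First data line from the example above.
example_line = (
    "chr17   80000548        .       G       A       6.885   LowQual "
    "H;FAU=15;FCU=0;FGU=43;FTU=0;RAU=13;RCU=0;RGU=48;RTU=0   "
    "GT:GQ:DP:AF:AD:NAF:NDP:NAD:AU:CU:GU:TU:NAU:NCU:NGU:NTU  "
    "0/1:6:119:0.2353:0,28:0.0000:39:0,0:28:0:91:0:0:0:39:0"
)
record = parse_record(example_line)
tumor_alt = int(record["sample"][record["alt"] + "U"])         # AU tag
normal_alt = int(record["sample"]["N" + record["alt"] + "U"])  # NAU tag
normal_depth = int(record["sample"]["NDP"])
```

In this record, the per-strand counts in INFO (FAU and RAU) sum to the AU value in the sample column, which is a quick sanity check that the tags were paired correctly.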

Splitting out a separate normal VCF has not been implemented yet, but we will consider it for a future release.

zhengzhenxian avatar May 17 '24 13:05 zhengzhenxian