/canis_familiaris.vcf' should contain one genotypes column corresponding to sample

Open osowiecki opened this issue 5 years ago • 3 comments

I'm using a dog dbSNP file in the same format as the files supplied by the Canvas team. How should I prepare my dbSNP file or ploidy.vcf so that Canvas stops complaining about the missing genotype?

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT CF_6942
X 4028 . N <CNV> . PASS END=123869066 CN 1

########################################

' Job error message: System.ArgumentException: File '/home/mobit/DOG/data/canis_familiaris.vcf' should contain one genotypes column corresponding to sample CF_6942
   at CanvasSNV.SNVReviewer.LoadVariants(String vcfPath, Boolean isSomatic) in D:\TeamCity\buildAgent\work\a29a190a11771d97\Src\Canvas\CanvasSNV\SNVReviewer.cs:line 88
   at CanvasSNV.SNVReviewer.Run() in D:\TeamCity\buildAgent\work\a29a190a11771d97\Src\Canvas\CanvasSNV\SNVReviewer.cs:line 63
   at CanvasSNV.Program.Run(String[] args) in D:\TeamCity\buildAgent\work\a29a190a11771d97\Src\Canvas\CanvasSNV\Program.cs:line 109
   at CanvasSNV.Program.Main(String[] args) in D:\TeamCity\buildAgent\work\a29a190a11771d97\Src\Canvas\CanvasSNV\Program.cs:line 26
2019-06-25T12:51:06+02:00,Launching process for job CanvasSNV-'CF_6942'-'24': '

########################################

dbsnp.vcf looks like this:

########################################

##fileformat=VCFv4.1
##fileDate=20180316
##source=ensembl;version=92;url=http://e92.ensembl.org/Canis_lupus_familiaris
##reference=ftp://ftp.ensembl.org/pub/release-92/fasta/Canis_lupus_familiaris/dna/
##INFO=<ID=dbSNP_151,Number=0,Type=Flag,Description="Variants (including SNPs and indels) imported from dbSNP">
##INFO=<ID=TSA,Number=1,Type=String,Description="Type of sequence alteration. Child of term sequence_alteration as defined by the sequence ontology project.">
##INFO=<ID=E_Cited,Number=0,Type=Flag,Description="Cited.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_Multiple_observations,Number=0,Type=Flag,Description="Multiple_observations.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_Freq,Number=0,Type=Flag,Description="Frequency.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_Hapmap,Number=0,Type=Flag,Description="HapMap.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_Phenotype_or_Disease,Number=0,Type=Flag,Description="Phenotype_or_Disease.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_ESP,Number=0,Type=Flag,Description="ESP.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_1000G,Number=0,Type=Flag,Description="1000Genomes.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_ExAC,Number=0,Type=Flag,Description="ExAC.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
#CHROM POS ID REF ALT QUAL FILTER INFO
1 112 rs850979046 A G . . dbSNP_151;TSA=SNV
1 132 rs851217143 C A . . dbSNP_151;TSA=SNV
1 147 rs853028708 G A . . dbSNP_151;TSA=SNV
1 194 rs850921736 G T . . dbSNP_151;TSA=SNV
1 208 rs851402391 T C . . dbSNP_151;TSA=SNV
1 237 rs852954153 T C . . dbSNP_151;TSA=SNV
...

Full command:

Canvas SmallPedigree-WGS -b ./bam/CF_6942.bam --sample-b-allele-vcf=./data/canis_familiaris.vcf -o ./CNV_TEST/CF_6942 -r ./data/kmers.fasta -g ./data/canFam3/ --filter-bed=./data/filter.bed --ploidy-vcf=./data/ploidy.vcf

osowiecki, Jun 25 '19 10:06

When using --sample-b-allele-vcf, the file needs a GT column, since it is treated as the sample's own VCF. If you want to use a dbSNP VCF without GT, provide it via the --population-b-allele-vcf option instead.

eroller, Jun 25 '19 17:06
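The distinction in the reply above comes down to whether the `#CHROM` header line of the VCF carries a column for the sample. A minimal Python sketch of that check follows. It is not part of Canvas: the function names `sample_columns` and `has_genotype_column` are hypothetical, and the code assumes a tab-delimited VCF as the VCF specification requires.

```python
def sample_columns(lines):
    """Return the sample names found on the #CHROM header line of a VCF.

    `lines` is any iterable of text lines, for example an open file handle.
    Hypothetical helper for illustration; not a Canvas API.
    """
    for line in lines:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            # Fields 0-7 are the fixed VCF columns, field 8 is FORMAT,
            # and everything after that is one column per sample.
            return fields[9:]
        if not line.startswith("#"):
            # Reached the data records without seeing a header line.
            break
    raise ValueError("no #CHROM header line found")


def has_genotype_column(lines, sample_name):
    """True if the VCF header has a per-sample column named `sample_name`."""
    return sample_name in sample_columns(lines)
```

For a sites-only file such as the dbSNP VCF in this issue, `has_genotype_column(handle, "CF_6942")` would return `False`, which per the reply above means the file belongs under --population-b-allele-vcf rather than --sample-b-allele-vcf.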

Using --population-b-allele-vcf= with dbsnp.vcf, and --ploidy-vcf= with a ploidy file that has the same header as dbsnp.vcf, still crashed the application for the same reason. Should it work like that? I wanted to use dbsnp.vcf and still mark chromosome X in my ploidy.vcf.

Edit: OK, I can see that ploidy.vcf can keep its proper structure with the GT column. I thought all VCF files had to have the same header. My mistake.

osowiecki, Jun 25 '19 20:06

Correct, the ploidy VCF is sample-specific, so it must contain the GT field. dbSNP is a population VCF, so GT is not used.

eroller, Jun 25 '19 20:06
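To illustrate the resolution, here is a small Python sketch that writes a ploidy VCF with a column for the sample, mirroring the record posted at the top of this issue. The helper name `ploidy_vcf_text` is hypothetical, and the `##fileformat=VCFv4.1` line is an assumption borrowed from the dbSNP file; this thread does not say which meta-information lines Canvas requires in a ploidy VCF.

```python
def ploidy_vcf_text(sample_name, regions):
    """Build the text of a minimal ploidy VCF for one sample.

    `regions` is an iterable of (chrom, start, end, copy_number) tuples.
    Hypothetical helper for illustration; not a Canvas API.
    """
    header = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
              "INFO", "FORMAT", sample_name]
    # Assumed meta line; the thread does not specify the required header.
    lines = ["##fileformat=VCFv4.1", "\t".join(header)]
    for chrom, start, end, copy_number in regions:
        # Same layout as the record in the issue: END in INFO, CN in FORMAT,
        # and the copy number in the sample's own column.
        record = [chrom, str(start), ".", "N", "<CNV>", ".", "PASS",
                  f"END={end}", "CN", str(copy_number)]
        lines.append("\t".join(record))
    return "\n".join(lines) + "\n"
```

Calling `ploidy_vcf_text("CF_6942", [("X", 4028, 123869066, 1)])` reproduces the chromosome X record from the original post, with the sample column kept in the header, while the dbSNP file stays sites-only and is passed via --population-b-allele-vcf.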