CRISPRme
gnomAD-converter error on v4.1 (joint) VCF dataset
Thanks for updating the gnomAD-converter to support the gnomAD v4 data. It would be nice to also support the gnomAD v4.1 joint VCF dataset, whose INFO column is slightly different. The joint dataset uses the AC_joint_ prefix rather than AC_, which triggers an error. This can be fixed on the user end by pre-processing the VCF file, but I thought it would be nice if CRISPRme could support the joint version directly.

Additionally, gnomAD VCFs now label the "oth" group as "remaining", which triggers an error if the old hg38_gnomAD.samplesID.txt is used. (I only have v4.1, so I am not sure whether v4.0 still used the original format from v3.) It would help other users if you could include an additional samplesID file for gnomAD v4.1 VCFs in which "oth" is replaced with "remaining".
See the example records below.
gnomAD v4.1 genomes VCF:
chrY 2790041 . C A . PASS AC=1;AN=34333;AF=2.91265e-05;AC_XY=1;AF_XY=2.91265e-05;AN_XY=34333;nhomalt_XY=0;nhomalt=0;AC_afr_XY=0;AF_afr_XY=0;AN_afr_XY=8816;nhomalt_afr_XY=0;AC_afr=0;AF_afr=0;AN_afr=8816;nhomalt_afr=0;AC_ami_XY=0;AF_ami_XY=0;AN_ami_XY=210;nhomalt_ami_XY=0;AC_ami=0;AF_ami=0;AN_ami=210;nhomalt_ami=0;AC_amr_XY=0;AF_amr_XY=0;AN_amr_XY=3877;nhomalt_amr_XY=0;AC_amr=0;AF_amr=0;AN_amr=3877;nhomalt_amr=0;AC_asj_XY=0;AF_asj_XY=0;AN_asj_XY=774;nhomalt_asj_XY=0;AC_asj=0;AF_asj=0;AN_asj=774;nhomalt_asj=0;AC_eas_XY=0;AF_eas_XY=0;AN_eas_XY=1275;nhomalt_eas_XY=0;AC_eas=0;AF_eas=0;AN_eas=1275;nhomalt_eas=0;AC_fin_XY=1;AF_fin_XY=0.000277008;AN_fin_XY=3610;nhomalt_fin_XY=0;AC_fin=1;AF_fin=0.000277008;AN_fin=3610;nhomalt_fin=0;AC_mid_XY=0;AF_mid_XY=0;AN_mid_XY=72;nhomalt_mid_XY=0;AC_mid=0;AF_mid=0;AN_mid=72;nhomalt_mid=0;AC_nfe_XY=0;AF_nfe_XY=0;AN_nfe_XY=13652;nhomalt_nfe_XY=0;AC_nfe=0;AF_nfe=0;AN_nfe=13652;nhomalt_nfe=0;AC_raw=1;AF_raw=2.70117e-05;AN_raw=37021;nhomalt_raw=0;AC_remaining_XY=0;AF_remaining_XY=0;AN_remaining_XY=480;nhomalt_remaining_XY=0;AC_remaining=0;AF_remaining=0;AN_remaining=480;nhomalt_remaining=0;AC_sas_XY=0;AF_sas_XY=0;AN_sas_XY=1567;nhomalt_sas_XY=0;AC_sas=0;AF_sas=0;AN_sas=1567;nhomalt_sas=0;faf95_XY=0;faf95=0;faf95_afr_XY=0;faf95_afr=0;faf95_amr_XY=0;faf95_amr=0;faf95_eas_XY=0;faf95_eas=0;faf95_nfe_XY=0;faf95_nfe=0;faf95_sas_XY=0;faf95_sas=0;faf99_XY=0;faf99=0;faf99_afr_XY=0;faf99_afr=0;faf99_amr_XY=0;faf99_amr=0;faf99_eas_XY=0;faf99_eas=0;faf99_nfe_XY=0;faf99_nfe=0;faf99_sas_XY=0;faf99_sas=0;age_hist_het_bin_freq=0|0|0|0|0|0|0|0|0|0;age_hist_het_n_smaller=0;age_hist_het_n_larger=0;age_hist_hom_bin_freq=0|0|0|0|0|0|0|1|0|0;age_hist_hom_n_smaller=0;age_hist_hom_n_larger=0;FS=.;MQ=60;QUALapprox=427;QD=32.8462;SOR=0.836;VarDP=13;AS_FS=.;AS_MQ=60;AS_QUALapprox=427;AS_QD=32.8462;AS_SB_TABLE=0,0|6,7;AS_SOR=0.835568;AS_VarDP=13;inbreeding_coeff=-2.70124e-05;AS_culprit=AS_MQ;AS_VQSLOD=-1.8603;negative_train_site;allele_type=snv;n_alt_alleles=1;variant_t
ype=snv;non_par;gq_hist_alt_bin_freq=0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0;gq_hist_all_bin_freq=0|0|0|0|24819|6661|2091|581|133|44|3|0|1|0|0|0|0|0|0|0;dp_hist_alt_bin_freq=0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0;dp_hist_alt_n_larger=0;dp_hist_all_bin_freq=0|0|9021|17651|6629|891|127|12|2|0|0|0|0|0|0|0|0|0|0|0;dp_hist_all_n_larger=0;ab_hist_alt_bin_freq=0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0;cadd_raw_score=0.281023;cadd_phred=4.068;pangolin_largest_ds=0.02;phylop=4.323;VRS_Allele_IDs=ga4gh:VA.at1nxhs_qHpWgbCnyGBM3ba2LlAVRQAu,ga4gh:VA.67hbcHI15qnRWi-Z9I0ay-rGN3bPKQ4-;VRS_Starts=2790040,2790040;VRS_Ends=2790041,2790041;VRS_States=C,A;vep=A|upstream_gene_variant|MODIFIER|SRY|ENSG00000184895|Transcript|ENST00000383070|protein_coding||||||||||1|2359|-1||SNV|HGNC|HGNC:11311|YES|NM_003140.3|||P1|CCDS14772.1|ENSP00000372547||Ensembl|||||||||||||,A|non_coding_transcript_exon_variant|MODIFIER|RNASEH2CP1|ENSG00000237659|Transcript|ENST00000454281|processed_pseudogene|1/1||ENST00000454281.1:n.215C>A||215|||||1||1||SNV|HGNC|HGNC:24117|YES||||||||Ensembl|||||||||||||,A|intron_variant&non_coding_transcript_variant|MODIFIER|XGY2|ENSG00000288686|Transcript|ENST00000679518|processed_transcript||3/6|ENST00000679518.1:n.106+15302C>A|||||||1||1||SNV|HGNC|HGNC:34022|||||||||Ensembl|||||||||||||,A|downstream_gene_variant|MODIFIER|XGY2|ENSG00000288686|Transcript|ENST00000679825|processed_transcript||||||||||1|2471|1||SNV|HGNC|HGNC:34022|||||||||Ensembl|||||||||||||,A|non_coding_transcript_exon_variant|MODIFIER|XGY2|ENSG00000288686|Transcript|ENST00000680285|processed_transcript|4/4||ENST00000680285.1:n.612C>A||612|||||1||1||SNV|HGNC|HGNC:34022|||||||||Ensembl|||||||||||||,A|downstream_gene_variant|MODIFIER|XGY2|ENSG00000288686|Transcript|ENST00000680845|processed_transcript||||||||||1|2407|1||SNV|HGNC|HGNC:34022|||||||||Ensembl|||||||||||||,A|intron_variant&non_coding_transcript_variant|MODIFIER|XGY2|ENSG00000288686|Transcript|ENST00000681787|processed_transcript||3/7|ENST00000681787.1:n
.106+15302C>A|||||||1||1||SNV|HGNC|HGNC:34022|||||||||Ensembl|||||||||||||,A|intron_variant&non_coding_transcript_variant|MODIFIER|XGY2|ENSG00000288686|Transcript|ENST00000681940|processed_transcript||3/4|ENST00000681940.1:n.106+15302C>A|||||||1||1||SNV|HGNC|HGNC:34022|||||||||Ensembl|||||||||||||,A|upstream_gene_variant|MODIFIER|SRY|6736|Transcript|NM_003140.3|protein_coding||||||||||1|2359|-1||SNV|EntrezGene|HGNC:11311|YES|ENST00000383070.2|||||NP_003131.1||RefSeq|||||||||||||
gnomAD v4.1 joint VCF:
chrY 2782439 . G C . PASS AC_joint=2;AN_joint=34075;AF_joint=5.86941e-05;grpmax_joint=sas;AC_genomes=2;AN_genomes=34075;AF_genomes=5.86941e-05;grpmax_genomes=sas;AC_joint_XY=2;AF_joint_XY=5.86941e-05;AN_joint_XY=34075;nhomalt_joint_XY=0;nhomalt_joint=0;AC_joint_afr_XY=0;AF_joint_afr_XY=0;AN_joint_afr_XY=8801;nhomalt_joint_afr_XY=0;AC_joint_afr=0;AF_joint_afr=0;AN_joint_afr=8801;nhomalt_joint_afr=0;AC_joint_ami_XY=0;AF_joint_ami_XY=0;AN_joint_ami_XY=216;nhomalt_joint_ami_XY=0;AC_joint_ami=0;AF_joint_ami=0;AN_joint_ami=216;nhomalt_joint_ami=0;AC_joint_amr_XY=0;AF_joint_amr_XY=0;AN_joint_amr_XY=3716;nhomalt_joint_amr_XY=0;AC_joint_amr=0;AF_joint_amr=0;AN_joint_amr=3716;nhomalt_joint_amr=0;AC_joint_asj_XY=0;AF_joint_asj_XY=0;AN_joint_asj_XY=773;nhomalt_joint_asj_XY=0;AC_joint_asj=0;AF_joint_asj=0;AN_joint_asj=773;nhomalt_joint_asj=0;AC_joint_eas_XY=0;AF_joint_eas_XY=0;AN_joint_eas_XY=1328;nhomalt_joint_eas_XY=0;AC_joint_eas=0;AF_joint_eas=0;AN_joint_eas=1328;nhomalt_joint_eas=0;AC_joint_fin_XY=0;AF_joint_fin_XY=0;AN_joint_fin_XY=3422;nhomalt_joint_fin_XY=0;AC_joint_fin=0;AF_joint_fin=0;AN_joint_fin=3422;nhomalt_joint_fin=0;AC_joint_mid_XY=0;AF_joint_mid_XY=0;AN_joint_mid_XY=73;nhomalt_joint_mid_XY=0;AC_joint_mid=0;AF_joint_mid=0;AN_joint_mid=73;nhomalt_joint_mid=0;AC_joint_nfe_XY=1;AF_joint_nfe_XY=7.303e-05;AN_joint_nfe_XY=13693;nhomalt_joint_nfe_XY=0;AC_joint_nfe=1;AF_joint_nfe=7.303e-05;AN_joint_nfe=13693;nhomalt_joint_nfe=0;AC_joint_raw=2;AF_joint_raw=5.40862e-05;AN_joint_raw=36978;nhomalt_joint_raw=0;AC_joint_remaining_XY=0;AF_joint_remaining_XY=0;AN_joint_remaining_XY=474;nhomalt_joint_remaining_XY=0;AC_joint_remaining=0;AF_joint_remaining=0;AN_joint_remaining=474;nhomalt_joint_remaining=0;AC_joint_sas_XY=1;AF_joint_sas_XY=0.000633312;AN_joint_sas_XY=1579;nhomalt_joint_sas_XY=0;AC_joint_sas=1;AF_joint_sas=0.000633312;AN_joint_sas=1579;nhomalt_joint_sas=0;AC_grpmax_joint=1;AF_grpmax_joint=0.000633312;AN_grpmax_joint=1579;nhomalt_grpmax_joint=0;faf95_joint_XY=9.72e-0
6;faf99_joint_XY=3.64e-06;faf95_joint=9.72e-06;faf99_joint=3.64e-06;faf95_joint_afr_XY=0;faf99_joint_afr_XY=0;faf95_joint_afr=0;faf99_joint_afr=0;faf95_joint_amr_XY=0;faf99_joint_amr_XY=0;faf95_joint_amr=0;faf99_joint_amr=0;faf95_joint_eas_XY=0;faf99_joint_eas_XY=0;faf95_joint_eas=0;faf99_joint_eas=0;faf95_joint_mid_XY=0;faf99_joint_mid_XY=0;faf95_joint_mid=0;faf99_joint_mid=0;faf95_joint_nfe_XY=0;faf99_joint_nfe_XY=0;faf95_joint_nfe=0;faf99_joint_nfe=0;faf95_joint_sas_XY=0;faf99_joint_sas_XY=0;faf95_joint_sas=0;faf99_joint_sas=0;age_hist_het_bin_freq_joint=0|0|0|0|0|0|0|0|0|0;age_hist_het_n_smaller_joint=0;age_hist_het_n_larger_joint=0;age_hist_hom_bin_freq_joint=0|0|0|0|0|1|0|0|0|0;age_hist_hom_n_smaller_joint=0;age_hist_hom_n_larger_joint=0;gq_hist_alt_bin_freq_joint=0|0|0|0|1|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0;gq_hist_all_bin_freq_joint=0|0|0|0|24426|6771|2136|534|105|33|6|0|0|0|0|0|0|0|0|0;dp_hist_alt_bin_freq_joint=0|0|0|2|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0;dp_hist_alt_n_larger_joint=0;dp_hist_all_bin_freq_joint=0|0|8887|17718|6484|802|108|10|2|0|0|0|0|0|0|0|0|0|0|0;dp_hist_all_n_larger_joint=0;ab_hist_alt_bin_freq_joint=0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0;outside_broad_capture_region;outside_ukb_capture_region;outside_broad_calling_region;outside_ukb_calling_region;not_called_in_exomes;AC_genomes_XY=2;AF_genomes_XY=5.86941e-05;AN_genomes_XY=34075;nhomalt_genomes_XY=0;nhomalt_genomes=0;AC_genomes_afr_XY=0;AF_genomes_afr_XY=0;AN_genomes_afr_XY=8801;nhomalt_genomes_afr_XY=0;AC_genomes_afr=0;AF_genomes_afr=0;AN_genomes_afr=8801;nhomalt_genomes_afr=0;AC_genomes_ami_XY=0;AF_genomes_ami_XY=0;AN_genomes_ami_XY=216;nhomalt_genomes_ami_XY=0;AC_genomes_ami=0;AF_genomes_ami=0;AN_genomes_ami=216;nhomalt_genomes_ami=0;AC_genomes_amr_XY=0;AF_genomes_amr_XY=0;AN_genomes_amr_XY=3716;nhomalt_genomes_amr_XY=0;AC_genomes_amr=0;AF_genomes_amr=0;AN_genomes_amr=3716;nhomalt_genomes_amr=0;AC_genomes_asj_XY=0;AF_genomes_asj_XY=0;AN_genomes_asj_XY=773;nhomalt_genomes_asj_XY=0;AC_genomes_a
sj=0;AF_genomes_asj=0;AN_genomes_asj=773;nhomalt_genomes_asj=0;AC_genomes_eas_XY=0;AF_genomes_eas_XY=0;AN_genomes_eas_XY=1328;nhomalt_genomes_eas_XY=0;AC_genomes_eas=0;AF_genomes_eas=0;AN_genomes_eas=1328;nhomalt_genomes_eas=0;AC_genomes_fin_XY=0;AF_genomes_fin_XY=0;AN_genomes_fin_XY=3422;nhomalt_genomes_fin_XY=0;AC_genomes_fin=0;AF_genomes_fin=0;AN_genomes_fin=3422;nhomalt_genomes_fin=0;AC_genomes_mid_XY=0;AF_genomes_mid_XY=0;AN_genomes_mid_XY=73;nhomalt_genomes_mid_XY=0;AC_genomes_mid=0;AF_genomes_mid=0;AN_genomes_mid=73;nhomalt_genomes_mid=0;AC_genomes_nfe_XY=1;AF_genomes_nfe_XY=7.303e-05;AN_genomes_nfe_XY=13693;nhomalt_genomes_nfe_XY=0;AC_genomes_nfe=1;AF_genomes_nfe=7.303e-05;AN_genomes_nfe=13693;nhomalt_genomes_nfe=0;AC_genomes_raw=2;AF_genomes_raw=5.40862e-05;AN_genomes_raw=36978;nhomalt_genomes_raw=0;AC_genomes_remaining_XY=0;AF_genomes_remaining_XY=0;AN_genomes_remaining_XY=474;nhomalt_genomes_remaining_XY=0;AC_genomes_remaining=0;AF_genomes_remaining=0;AN_genomes_remaining=474;nhomalt_genomes_remaining=0;AC_genomes_sas_XY=1;AF_genomes_sas_XY=0.000633312;AN_genomes_sas_XY=1579;nhomalt_genomes_sas_XY=0;AC_genomes_sas=1;AF_genomes_sas=0.000633312;AN_genomes_sas=1579;nhomalt_genomes_sas=0;AC_grpmax_genomes=1;AF_grpmax_genomes=0.000633312;AN_grpmax_genomes=1579;nhomalt_grpmax_genomes=0;faf95_genomes_XY=9.72e-06;faf99_genomes_XY=3.64e-06;faf95_genomes=9.72e-06;faf99_genomes=3.64e-06;faf95_genomes_afr_XY=0;faf99_genomes_afr_XY=0;faf95_genomes_afr=0;faf99_genomes_afr=0;faf95_genomes_amr_XY=0;faf99_genomes_amr_XY=0;faf95_genomes_amr=0;faf99_genomes_amr=0;faf95_genomes_eas_XY=0;faf99_genomes_eas_XY=0;faf95_genomes_eas=0;faf99_genomes_eas=0;faf95_genomes_nfe_XY=0;faf99_genomes_nfe_XY=0;faf95_genomes_nfe=0;faf99_genomes_nfe=0;faf95_genomes_sas_XY=0;faf99_genomes_sas_XY=0;faf95_genomes_sas=0;faf99_genomes_sas=0;age_hist_het_bin_freq_genomes=0|0|0|0|0|0|0|0|0|0;age_hist_het_n_smaller_genomes=0;age_hist_het_n_larger_genomes=0;age_hist_hom_bin_freq_genomes=0|0|0|0|0|1|0|0
|0|0;age_hist_hom_n_smaller_genomes=0;age_hist_hom_n_larger_genomes=0;gq_hist_alt_bin_freq_genomes=0|0|0|0|1|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0;gq_hist_all_bin_freq_genomes=0|0|0|0|24426|6771|2136|534|105|33|6|0|0|0|0|0|0|0|0|0;dp_hist_alt_bin_freq_genomes=0|0|0|2|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0;dp_hist_alt_n_larger_genomes=0;dp_hist_all_bin_freq_genomes=0|0|8887|17718|6484|802|108|10|2|0|0|0|0|0|0|0|0|0|0|0;dp_hist_all_n_larger_genomes=0;ab_hist_alt_bin_freq_genomes=0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0
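For anyone hitting the same error, the user-side pre-processing mentioned above could be sketched roughly as follows. This is only a minimal sketch, not part of CRISPRme: the function names (`rename_key`, `convert_info`, `convert_line`) are hypothetical, and it assumes the converter only needs the `_joint` token removed from INFO keys (so `AC_joint_afr` becomes `AC_afr`). The optional `remaining_to_oth` flag is likewise an assumption, for use with the old samplesID file. It also assumes the joint VCF has no plain `AC`/`AC_<pop>` keys that the renamed ones could collide with, which matches the record above.

```python
import re

# Match "_joint" / "_remaining" only as whole underscore-delimited tokens of a key.
_JOINT = re.compile(r"_joint(?=_|$)")
_REMAINING = re.compile(r"_remaining(?=_|$)")


def rename_key(key, remaining_to_oth=False):
    """Drop the '_joint' token from an INFO key; optionally map 'remaining' to 'oth'."""
    key = _JOINT.sub("", key)
    if remaining_to_oth:
        key = _REMAINING.sub("_oth", key)
    return key


def convert_info(info, remaining_to_oth=False):
    """Rename the keys of a semicolon-separated INFO string, leaving values untouched."""
    out = []
    for field in info.split(";"):
        # Flags such as 'not_called_in_exomes' have no '=', so sep and value are empty.
        key, sep, value = field.partition("=")
        out.append(rename_key(key, remaining_to_oth) + sep + value)
    return ";".join(out)


def convert_line(line, remaining_to_oth=False):
    """Convert one VCF line: rename ##INFO header IDs and INFO keys of data records."""
    prefix = "##INFO=<ID="
    if line.startswith(prefix):
        ident, sep, rest = line[len(prefix):].partition(",")
        return prefix + rename_key(ident, remaining_to_oth) + sep + rest
    if line.startswith("#"):
        return line  # other header lines pass through unchanged
    newline = "\n" if line.endswith("\n") else ""
    cols = line.rstrip("\n").split("\t")
    if len(cols) >= 8:
        cols[7] = convert_info(cols[7], remaining_to_oth)  # INFO is the 8th column
    return "\t".join(cols) + newline
```

Applied line by line to a decompressed joint VCF (for example, reading with `gzip.open(path, "rt")` and writing each converted line to a new file), this leaves the `AC_genomes_*` fields in place and only rewrites the `*_joint*` keys.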
Thank you.