split_multiallelic should convert arrays to scalars

Open Hoeze opened this issue 4 years ago • 4 comments

Hi, would it be possible to have split_multiallelic also adjust the table schema?

I know this would also mean adjusting every other method that assumes arrays because of multiallelic sites. However, it would be very helpful for reading VCF files directly into the correct schema.
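To make the request concrete, here is a minimal sketch of the kind of post-processing it would make unnecessary. It is not part of Glow's API: the helper names `parse_top_level_fields` and `scalar_select_exprs` and the `per_allele` argument are hypothetical. It assumes that, after splitting, every per-allele array holds exactly one element, and that the caller lists the per-allele columns explicitly, since some array columns (for example INFO_vep) are genuinely multi-valued and should stay arrays.

```python
import re

# Matches top-level lines of a Spark printSchema() dump, e.g.
# " |-- INFO_AC: array (nullable = true)". Nested "element" lines do not match.
FIELD_LINE = re.compile(r"^\s*\|-- (\w+): (\w+) \(nullable = (?:true|false)\)$")


def parse_top_level_fields(schema_text):
    """Return [(name, type)] for the top-level fields of a printSchema() dump."""
    fields = []
    for line in schema_text.splitlines():
        match = FIELD_LINE.match(line)
        if match:
            fields.append((match.group(1), match.group(2)))
    return fields


def scalar_select_exprs(fields, per_allele):
    """Build Spark SQL select expressions that unwrap per-allele arrays.

    `per_allele` is a caller-supplied set of column names (an assumption of
    this sketch); all other columns are passed through unchanged.
    """
    exprs = []
    for name, dtype in fields:
        if dtype == "array" and name in per_allele:
            # Take the single remaining element and keep the column name.
            exprs.append(f"{name}[0] AS {name}")
        else:
            exprs.append(name)
    return exprs


SAMPLE_SCHEMA = """root
 |-- contigName: string (nullable = true)
 |-- alternateAlleles: array (nullable = false)
 |    |-- element: string (containsNull = true)
 |-- INFO_AC: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_AN: integer (nullable = true)
 |-- INFO_vep: array (nullable = true)
 |    |-- element: string (containsNull = true)
"""

fields = parse_top_level_fields(SAMPLE_SCHEMA)
exprs = scalar_select_exprs(fields, per_allele={"INFO_AC"})
```

The resulting strings could then be passed to PySpark's `DataFrame.selectExpr(*exprs)` on the split DataFrame. Doing this by hand for several hundred INFO columns is the work the issue asks split_multiallelic to take over.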

For example, when I read the gnomAD 2.1.1 VCF, I get the following schema:

root
 |-- contigName: string (nullable = true)
 |-- start: long (nullable = true)
 |-- end: long (nullable = true)
 |-- names: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- referenceAllele: string (nullable = true)
 |-- alternateAlleles: array (nullable = false)
 |    |-- element: string (containsNull = true)
 |-- qual: double (nullable = true)
 |-- filters: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- splitFromMultiAllelic: boolean (nullable = false)
 |-- INFO_non_neuro_AC_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_nhomalt_oth_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AN_nfe_female: integer (nullable = true)
 |-- INFO_controls_AN_amr_female: integer (nullable = true)
 |-- INFO_non_topmed_AN_fin_female: integer (nullable = true)
 |-- INFO_controls_AN_eas_male: integer (nullable = true)
 |-- INFO_controls_AN_nfe_onf: integer (nullable = true)
 |-- INFO_rf_positive_label: boolean (nullable = true)
 |-- INFO_controls_AN_fin: integer (nullable = true)
 |-- INFO_non_neuro_nhomalt_eas_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AN_nfe_est: integer (nullable = true)
 |-- INFO_variant_type: string (nullable = true)
 |-- INFO_controls_AN_eas: integer (nullable = true)
 |-- INFO_AF_oth_female: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_topmed_AC_asj_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AN_amr_male: integer (nullable = true)
 |-- INFO_nhomalt_asj: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AF_nfe_nwe: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_topmed_nhomalt_asj_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_nhomalt_oth: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AF_asj_male: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_topmed_AN_nfe_seu: integer (nullable = true)
 |-- INFO_controls_nhomalt_afr_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_faf95: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_nhomalt_amr_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AF_afr_male: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_neuro_AF_male: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_nhomalt_nfe_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AN_afr_male: integer (nullable = true)
 |-- INFO_non_neuro_AN_fin_female: integer (nullable = true)
 |-- INFO_controls_AF_amr_female: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_neuro_AF_amr_female: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_neuro_AF_amr_male: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_neuro_nhomalt_amr: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_nhomalt_afr_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_nhomalt_nfe_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AF_fin: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_ab_hist_alt_bin_freq: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- INFO_AN_raw: integer (nullable = true)
 |-- INFO_faf95_nfe: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_topmed_AF: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_nhomalt_oth: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_pab_max: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_neuro_nhomalt_nfe_onf: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_nhomalt_eas: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_AF_fin_male: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_controls_nhomalt_asj_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_nhomalt_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AN_amr_female: integer (nullable = true)
 |-- INFO_non_neuro_faf99_nfe: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_AC_oth_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AC_asj: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AN_fin: integer (nullable = true)
 |-- INFO_AC_nfe_onf: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_AC_amr: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AC_raw: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AC_nfe_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AC_asj_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AC_eas_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_nhomalt_popmax: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_dp_hist_alt_bin_freq: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- INFO_AN_male: integer (nullable = true)
 |-- INFO_nhomalt_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_nhomalt_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_nhomalt_afr: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AC_nfe: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_AC_eas_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AC_eas_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AC: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_AC: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AN_amr: integer (nullable = true)
 |-- INFO_AC_afr: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_gq_hist_alt_bin_freq: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- INFO_non_topmed_AC_oth: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_AF_asj_male: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_VQSR_culprit: string (nullable = true)
 |-- INFO_non_topmed_AC_fin_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AC_nfe_nwe: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AN_nfe: integer (nullable = true)
 |-- INFO_vep: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- INFO_non_topmed_AF_nfe_est: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_faf95_amr: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_segdup: boolean (nullable = true)
 |-- INFO_allele_type: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- INFO_non_neuro_AF_nfe_nwe: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_controls_nhomalt_amr_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AF_nfe: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_topmed_AF_eas: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_AC_asj_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AN_nfe_female: integer (nullable = true)
 |-- INFO_non_topmed_AN_oth_male: integer (nullable = true)
 |-- INFO_non_topmed_AC_nfe_onf: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_nhomalt_fin_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_nhomalt_raw: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_nhomalt_afr_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AF_raw: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_topmed_nhomalt_amr: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AF_nfe: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_controls_nhomalt_raw: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AC_afr_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AF_afr_male: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_controls_AN_nfe_est: integer (nullable = true)
 |-- INFO_non_topmed_nhomalt_nfe_onf: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AF_eas_female: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_controls_AC_raw: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_nhomalt_fin_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_nhomalt_nfe_nwe: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AC_oth_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AN_eas: integer (nullable = true)
 |-- INFO_non_topmed_AN_male: integer (nullable = true)
 |-- INFO_non_topmed_nhomalt_nfe_seu: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AF_eas_female: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_faf95_afr: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_SOR: double (nullable = true)
 |-- INFO_controls_AC: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AN_afr: integer (nullable = true)
 |-- INFO_controls_AN_asj: integer (nullable = true)
 |-- INFO_non_topmed_AF_popmax: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_AN_eas: integer (nullable = true)
 |-- INFO_controls_AN_male: integer (nullable = true)
 |-- INFO_non_neuro_AN_asj_female: integer (nullable = true)
 |-- INFO_controls_AN_amr_male: integer (nullable = true)
 |-- INFO_non_topmed_AC_nfe_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_nhomalt_nfe_est: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_age_hist_het_n_smaller: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AF_nfe_onf: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_topmed_nhomalt_amr_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AN_popmax: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_nhomalt: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AN_eas_male: integer (nullable = true)
 |-- INFO_non_topmed_AN_nfe_male: integer (nullable = true)
 |-- INFO_non_neuro_AF_nfe_seu: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_controls_faf95_nfe: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_topmed_AF_nfe_female: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_controls_AN_oth_male: integer (nullable = true)
 |-- INFO_AF_nfe_male: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_AC_eas: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AC_eas: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AC_fin_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_nhomalt_oth_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AN_oth_male: integer (nullable = true)
 |-- INFO_non_neuro_AN_nfe_nwe: integer (nullable = true)
 |-- INFO_controls_nhomalt_nfe_nwe: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_nhomalt_nfe_seu: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_nhomalt_amr: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AN_popmax: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_nhomalt_fin_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AF_nfe_female: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_controls_nhomalt_nfe_est: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_faf99_amr: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_AC_nfe_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_nhomalt_nfe_onf: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AN_fin_male: integer (nullable = true)
 |-- INFO_non_topmed_AF_asj_female: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_AC_nfe_nwe: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_ReadPosRankSum: double (nullable = true)
 |-- INFO_non_topmed_AF_oth_female: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_topmed_AC_amr_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_AN_fin: integer (nullable = true)
 |-- INFO_controls_nhomalt_oth_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_nhomalt_nfe_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_nhomalt_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_AN: integer (nullable = true)
 |-- INFO_AC_popmax: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AF_afr_female: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_AF_afr_female: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_neuro_AF_nfe_est: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_topmed_AC_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_AC_fin_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_AF_oth_male: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_neuro_AN_female: integer (nullable = true)
 |-- INFO_non_neuro_AN_eas_male: integer (nullable = true)
 |-- INFO_nhomalt_asj_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_AN_oth: integer (nullable = true)
 |-- INFO_AF_asj: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_controls_AN_afr_female: integer (nullable = true)
 |-- INFO_non_topmed_faf99_nfe: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_controls_AF_afr_female: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_controls_AN_oth_female: integer (nullable = true)
 |-- INFO_non_topmed_nhomalt_eas_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AF_oth_male: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_topmed_nhomalt_nfe_est: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_popmax: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- INFO_has_star: boolean (nullable = true)
 |-- INFO_non_neuro_AC_nfe_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AN_afr: integer (nullable = true)
 |-- INFO_non_topmed_nhomalt_fin_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_nhomalt_fin: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_faf99_eas: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_rf_train: boolean (nullable = true)
 |-- INFO_controls_AN_oth: integer (nullable = true)
 |-- INFO_nhomalt_oth_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_nhomalt_fin_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AF_nfe_female: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_topmed_nhomalt_afr_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AC: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AF_male: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_nonpar: boolean (nullable = true)
 |-- INFO_decoy: boolean (nullable = true)
 |-- INFO_AF_nfe_nwe: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_controls_AC_nfe: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AF_oth_male: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_controls_AN_fin_female: integer (nullable = true)
 |-- INFO_AC_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AC_afr: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AC_asj_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_nhomalt_eas_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AF_nfe_nwe: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_AF_popmax: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_AN_nfe_est: integer (nullable = true)
 |-- INFO_non_neuro_AF_nfe_onf: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_nhomalt_popmax: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AF_amr_male: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_controls_faf95_amr: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_topmed_AN_asj: integer (nullable = true)
 |-- INFO_age_hist_hom_n_larger: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_nhomalt_fin: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_faf99_afr: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_neuro_AF_female: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_controls_AF_amr: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_neuro_nhomalt_oth_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AN_nfe: integer (nullable = true)
 |-- INFO_non_neuro_faf99_afr: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_controls_AF_nfe_onf: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_controls_AN_nfe_seu: integer (nullable = true)
 |-- INFO_non_neuro_nhomalt_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_nhomalt_asj_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AC_oth_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AN_fin_male: integer (nullable = true)
 |-- INFO_non_topmed_AF_afr_male: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_neuro_nhomalt_asj_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_faf99_eas: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_InbreedingCoeff: double (nullable = true)
 |-- INFO_controls_AF_nfe_male: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_AF: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_AF_nfe: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_topmed_AN_afr_female: integer (nullable = true)
 |-- INFO_age_hist_hom_n_smaller: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AN_raw: integer (nullable = true)
 |-- INFO_nhomalt_nfe_onf: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AF_popmax: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_faf99_eas: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_AN_nfe_female: integer (nullable = true)
 |-- INFO_nhomalt_nfe_est: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AF_raw: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_topmed_faf95_amr: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_neuro_nhomalt_raw: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AF_asj: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_neuro_nhomalt_eas: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AN: integer (nullable = true)
 |-- INFO_controls_AC_amr: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_nhomalt_fin_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AC_amr_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_faf95_afr: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_AF_female: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_controls_AN_asj_female: integer (nullable = true)
 |-- INFO_controls_nhomalt_afr_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_nhomalt_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_nhomalt_popmax: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_AC_amr_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_faf99_afr: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_AC_amr_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_AN_oth_female: integer (nullable = true)
 |-- INFO_nhomalt_fin_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AC_afr_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AN: integer (nullable = true)
 |-- INFO_controls_nhomalt_afr: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AC_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_rf_tp_probability: double (nullable = true)
 |-- INFO_non_neuro_AN_amr_male: integer (nullable = true)
 |-- INFO_AC_eas_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_AF_oth: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_neuro_AC_fin: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AF_eas: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_neuro_nhomalt_amr_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_AN_amr: integer (nullable = true)
 |-- INFO_non_topmed_popmax: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- INFO_AF_eas_male: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_controls_AN: integer (nullable = true)
 |-- INFO_nhomalt_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AC_nfe_est: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AC_afr: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AC_afr: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_nhomalt_oth_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AC_amr_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AF_eas_male: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_FS: double (nullable = true)
 |-- INFO_non_topmed_faf99_eas: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_controls_nhomalt_eas_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_AN_amr_male: integer (nullable = true)
 |-- INFO_AN_nfe_nwe: integer (nullable = true)
 |-- INFO_non_topmed_faf95_eas: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_controls_AF_oth: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_topmed_AN_fin_male: integer (nullable = true)
 |-- INFO_non_topmed_AC_asj: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_rf_negative_label: boolean (nullable = true)
 |-- INFO_non_topmed_AC_raw: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_nhomalt_afr: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AF_eas: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_controls_nhomalt_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_nhomalt_fin: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AN_nfe_male: integer (nullable = true)
 |-- INFO_controls_nhomalt_amr_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AF_fin_female: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_topmed_nhomalt_asj: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_nhomalt_asj_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AF_afr_female: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_DP: integer (nullable = true)
 |-- INFO_non_neuro_AF_eas_female: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_neuro_AF_amr: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_neuro_nhomalt_afr_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_gq_hist_all_bin_freq: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- INFO_controls_AN_afr_male: integer (nullable = true)
 |-- INFO_AN_nfe_seu: integer (nullable = true)
 |-- INFO_controls_AF_nfe_est: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_controls_AF_asj_female: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_topmed_AC_fin_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_VQSLOD: double (nullable = true)
 |-- INFO_AF_fin: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_controls_AN_nfe_male: integer (nullable = true)
 |-- INFO_AN_afr_female: integer (nullable = true)
 |-- INFO_non_topmed_AC_eas: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AC_fin: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AF_fin_female: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_neuro_AC_popmax: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AF_nfe: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_AF_amr: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_topmed_AC_fin: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_nhomalt: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AN_female: integer (nullable = true)
 |-- INFO_non_topmed_AC_eas_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AN_nfe_onf: integer (nullable = true)
 |-- INFO_non_topmed_nhomalt_nfe_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AN_oth_female: integer (nullable = true)
 |-- INFO_non_topmed_AN_eas: integer (nullable = true)
 |-- INFO_non_neuro_AN_nfe_seu: integer (nullable = true)
 |-- INFO_controls_nhomalt_amr: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_ClippingRankSum: double (nullable = true)
 |-- INFO_faf99_amr: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_topmed_AF_oth: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_topmed_AN_oth: integer (nullable = true)
 |-- INFO_AF_nfe_est: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_faf95_eas: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_AF_amr_male: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_topmed_AN_raw: integer (nullable = true)
 |-- INFO_AN_afr_male: integer (nullable = true)
 |-- INFO_controls_faf99: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_AN_eas_male: integer (nullable = true)
 |-- INFO_non_neuro_faf95_amr: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_controls_nhomalt_eas_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_nhomalt_amr_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AC_nfe: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_nhomalt_asj: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AC_nfe_nwe: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_nhomalt: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_faf99: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_nhomalt_nfe_nwe: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AC_asj_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_AF_male: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_topmed_faf95_afr: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_AF_afr: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_neuro_AC_oth: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AF_eas_male: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_AF_amr_female: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_AC_raw: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_popmax: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- INFO_controls_AF_popmax: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_neuro_nhomalt_afr_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_AC_nfe_seu: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AF_female: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_age_hist_het_bin_freq: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- INFO_non_topmed_AC_nfe_nwe: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_VQSR_NEGATIVE_TRAIN_SITE: boolean (nullable = true)
 |-- INFO_nhomalt_asj_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AC_amr_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_nhomalt_fin_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_nhomalt_eas_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AF_afr: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_controls_nhomalt: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AN_nfe_onf: integer (nullable = true)
 |-- INFO_non_neuro_AF_fin: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_neuro_AF_oth_female: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_neuro_nhomalt_amr_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AC_nfe_est: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AF_nfe_male: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_AN_asj_female: integer (nullable = true)
 |-- INFO_non_neuro_AC_afr_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_nhomalt_eas: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AF_oth_female: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_controls_AC_asj: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_faf95: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_faf99: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_neuro_nhomalt_afr: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AC_eas_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_faf99_amr: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_AC_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_faf99_afr: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_controls_nhomalt_nfe_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_AN_asj_male: integer (nullable = true)
 |-- INFO_controls_nhomalt_nfe_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_nhomalt_afr_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_AF_fin_female: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_VQSR_POSITIVE_TRAIN_SITE: boolean (nullable = true)
 |-- INFO_controls_AF_fin_female: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_rf_label: string (nullable = true)
 |-- INFO_non_neuro_faf95_nfe: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_AC_fin_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AN_nfe_est: integer (nullable = true)
 |-- INFO_non_topmed_AC_amr_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AF_asj_male: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_BaseQRankSum: double (nullable = true)
 |-- INFO_non_neuro_faf99: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_nhomalt_oth_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_nhomalt_oth_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_lcr: boolean (nullable = true)
 |-- INFO_AN_asj: integer (nullable = true)
 |-- INFO_non_neuro_AC_amr: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_AC_afr_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AF_fin_male: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_neuro_AN_eas_female: integer (nullable = true)
 |-- INFO_non_neuro_AN_oth: integer (nullable = true)
 |-- INFO_AF_nfe_onf: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_AF_asj_female: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_controls_AF: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_topmed_AF_nfe_male: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_age_hist_het_n_larger: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_AC_nfe: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AN_afr_female: integer (nullable = true)
 |-- INFO_age_hist_hom_bin_freq: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- INFO_MQ: double (nullable = true)
 |-- INFO_AN_nfe_male: integer (nullable = true)
 |-- INFO_non_neuro_AC_nfe_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_faf99_nfe: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_nhomalt_raw: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AN_amr_female: integer (nullable = true)
 |-- INFO_controls_AF_asj: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_topmed_AF_afr: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_topmed_AN_eas_female: integer (nullable = true)
 |-- INFO_QD: double (nullable = true)
 |-- INFO_non_topmed_AF_raw: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_controls_AC_fin_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AC_nfe_seu: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AC_asj_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AC_nfe_seu: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_nhomalt_oth: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AC_amr_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_AC_asj_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_nhomalt_nfe: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AF_afr: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_AN_female: integer (nullable = true)
 |-- INFO_non_neuro_AN_asj: integer (nullable = true)
 |-- INFO_controls_AC_popmax: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_nhomalt_nfe_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_faf95_eas: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_AN_afr: integer (nullable = true)
 |-- INFO_dp_hist_alt_n_larger: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_nhomalt_nfe: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_nhomalt_fin: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AC_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AF_oth_male: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_nhomalt_nfe_seu: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_transmitted_singleton: boolean (nullable = true)
 |-- INFO_non_topmed_AC_afr_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AF_oth: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_topmed_faf95_nfe: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_neuro_AC_asj_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_nhomalt_eas_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_AN_amr_female: integer (nullable = true)
 |-- INFO_non_topmed_nhomalt_nfe: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_AF_nfe_female: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_was_mixed: boolean (nullable = true)
 |-- INFO_AN_fin_female: integer (nullable = true)
 |-- INFO_non_topmed_nhomalt_amr_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AN_asj_male: integer (nullable = true)
 |-- INFO_non_neuro_AF_fin_male: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_neuro_AC_eas: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_nhomalt_nfe_seu: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AF_male: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_topmed_nhomalt_eas_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_faf95_afr: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_neuro_AF_asj_female: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_AF_eas_female: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_controls_AC_afr_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AN_fin: integer (nullable = true)
 |-- INFO_non_neuro_AN_asj_male: integer (nullable = true)
 |-- INFO_non_topmed_AF_nfe_seu: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_neuro_AF_asj_male: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_n_alt_alleles: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_nhomalt_nfe_nwe: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AN_nfe_nwe: integer (nullable = true)
 |-- INFO_controls_AC_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AN_asj_male: integer (nullable = true)
 |-- INFO_non_neuro_AN_afr_male: integer (nullable = true)
 |-- INFO_controls_AC_nfe_onf: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AC_amr: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AC_nfe_onf: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AN_female: integer (nullable = true)
 |-- INFO_non_neuro_nhomalt_asj_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_AN_nfe: integer (nullable = true)
 |-- INFO_AF_nfe_seu: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_faf95: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_AC_nfe_est: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AC_oth_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AF_amr_female: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_MQRankSum: double (nullable = true)
 |-- INFO_controls_AC_oth_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AC_nfe_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_AN_nfe_onf: integer (nullable = true)
 |-- INFO_AF_afr_male: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_topmed_faf99_amr: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_controls_nhomalt_nfe: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_nhomalt_popmax: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AF_eas_male: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_AN_popmax: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AN_nfe_female: integer (nullable = true)
 |-- INFO_non_topmed_AC_popmax: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_nhomalt_oth: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_AN_eas_female: integer (nullable = true)
 |-- INFO_AC_asj: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AN_nfe: integer (nullable = true)
 |-- INFO_non_topmed_AN_nfe_nwe: integer (nullable = true)
 |-- INFO_non_neuro_faf95: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_AF_eas: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_dp_hist_all_bin_freq: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- INFO_non_neuro_AN_popmax: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AC_nfe_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_AC_oth: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AC_fin_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_nhomalt_eas: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AC_fin_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AN_amr: integer (nullable = true)
 |-- INFO_non_topmed_AC_nfe_est: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AC_afr_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_AF_raw: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_dp_hist_all_n_larger: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AC_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_popmax: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- INFO_controls_AC_eas_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_faf99_nfe: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_controls_AF_female: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_AC_fin: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AN_male: integer (nullable = true)
 |-- INFO_non_topmed_AN_oth_female: integer (nullable = true)
 |-- INFO_AC_oth_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AC_oth: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_neuro_AF: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_AC_afr_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AF_fin_male: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_neuro_AF_asj: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_controls_AN_eas_female: integer (nullable = true)
 |-- INFO_non_neuro_nhomalt_asj: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_non_topmed_AF_fin: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_topmed_AN_afr: integer (nullable = true)
 |-- INFO_controls_AF_amr_male: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_controls_AF_nfe_seu: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_neuro_AC_oth_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AN_amr: integer (nullable = true)
 |-- INFO_non_topmed_AN_asj_female: integer (nullable = true)
 |-- INFO_AN_fin_male: integer (nullable = true)
 |-- INFO_non_topmed_AF_amr: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_non_neuro_AC_eas_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_nhomalt_nfe_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AC_oth_male: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_AC_nfe_female: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_faf95_eas: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- INFO_controls_AC_nfe_seu: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- INFO_controls_AN_raw: integer (nullable = true)
 |-- INFO_AN_oth_male: integer (nullable = true)
 |-- INFO_OLD_MULTIALLELIC: string (nullable = true)
 |-- genotypes: array (nullable = true)
 |    |-- element: struct (containsNull = false)
 |    |    |-- sampleId: string (nullable = true)

It's quite a hassle to convert every column to a scalar by hand.

Hoeze avatar Jul 04 '20 20:07 Hoeze

This applies especially to all INFO fields that are annotated with Number=A, i.e. one value per alternate allele.

Maybe it would be easier to have all allele-specific columns in one large struct column?

root
 |-- contigName: string (nullable = true)
 |-- start: long (nullable = true)
 |-- end: long (nullable = true)
 |-- referenceAllele: string (nullable = true)
 |-- alternateAlleles: array (nullable = true) # this is the struct array that contains every allele-specific annotation
 |    |-- element: struct (containsNull = false)
 |    |    |-- alleleString: string (nullable = true)
 |    |    |-- name: string (nullable = true)
 |    |    |-- INFO_controls_AC_nfe_seu: integer (nullable = true)
[...]

Hoeze avatar Jul 04 '20 20:07 Hoeze

This is an interesting suggestion. This could make interoperability with VCF a bit awkward since the VCF header type would also change after splitting. cc @kianfar77 for thoughts.

@Hoeze, I'm curious whether you necessarily need to convert all the arrays to scalars. Is there a query that you can't write against the array types, or is it just more verbose? Btw, if you do need scalar types, you should be able to convert all array-typed (or all Number=A-typed) columns programmatically.
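As a rough illustration of what "programmatically" could look like, here is a minimal sketch in plain Python that works on a single row represented as a dict. It is not Glow or Spark code; the function name `scalarize_split_row` and the example column names are made up for illustration. In Spark, the same loop would presumably run over `df.schema.fields` and pick out the array-typed columns.

```python
from typing import Any, Dict, Iterable


def scalarize_split_row(row: Dict[str, Any], per_allele_columns: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of `row` in which each listed column holds a scalar.

    Assumes the row was already split, so every per-allele column is a
    list with exactly one element (hypothetical helper, not a Glow API).
    """
    out = dict(row)  # shallow copy so the input row is left untouched
    for name in per_allele_columns:
        values = out.get(name)
        if values is None:
            continue  # keep missing/null values as they are
        if len(values) != 1:
            raise ValueError(f"{name} has {len(values)} elements, expected 1")
        out[name] = values[0]
    return out
```

Columns that are arrays for other reasons (for example `filters`) are left alone because they are simply not listed in `per_allele_columns`.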

henrydavidge avatar Jul 06 '20 21:07 henrydavidge

Thanks for your answer @henrydavidge.

When we work with VCFs, we do so with one variant per row. Multiple alternate alleles at the same position are an edge case that we never used.

Therefore, until now I have been writing a series of .withColumn("alternateAlleles", f.col("alternateAlleles")[0]) calls for every VCF. This leads to a number of problems:

  • When I do not look at the header, I cannot be sure whether a column is really an array of length num_alt_alleles
  • It requires writing another set of column casts for each VCF
  • A struct of equal-length arrays is a very bad representation if I want to work with alternative alleles. For example, if I want to filter for variant quality, I have to subset every single array by the result of the filter expression.
  • A combined explosion of the alt_allele dimension is not possible either. I first need to zip all the equal-length arrays into one struct.
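The "zip first, then explode" step from the last point can be sketched in plain Python on a dict-shaped row. This only illustrates the data shape; the helper names `zip_per_allele` and `explode_alleles` are hypothetical, and in Spark the corresponding building blocks would be `arrays_zip` followed by `explode`.

```python
from typing import Any, Dict, List, Sequence


def zip_per_allele(row: Dict[str, Any], per_allele_columns: Sequence[str]) -> List[Dict[str, Any]]:
    """Turn equal-length per-allele lists into one list of dicts (one per alt allele)."""
    lengths = {len(row[name]) for name in per_allele_columns}
    if len(lengths) != 1:
        raise ValueError(f"per-allele columns differ in length: {sorted(lengths)}")
    n_alt = lengths.pop()
    return [{name: row[name][i] for name in per_allele_columns} for i in range(n_alt)]


def explode_alleles(row: Dict[str, Any], per_allele_columns: Sequence[str]) -> List[Dict[str, Any]]:
    """Emit one output row per alternate allele, copying the shared columns."""
    shared = {k: v for k, v in row.items() if k not in per_allele_columns}
    return [{**shared, **allele} for allele in zip_per_allele(row, per_allele_columns)]
```

The length check is the part that has to be repeated for every VCF today: nothing in the flat schema guarantees that the listed arrays really belong together.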

In comparison, with Array[Struct{<alt_allele annotations>}] you have the following guarantees:

  • All per-alt-allele annotation is collected in a single entry of the array
  • No confusion about which columns are really per-allele and which are not
  • You can directly filter the array for certain alternative alleles
  • You get split_multiallelic for free by exploding the array
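To make the filtering point concrete, here is a small plain-Python sketch of a row that already uses the proposed array-of-structs layout. The helper name `keep_alleles` is hypothetical; the field names `alleleString` and `INFO_controls_AC_nfe_seu` are taken from the proposed schema above. Spark SQL's higher-order `filter` function plays the same role for array columns.

```python
from typing import Any, Callable, Dict


def keep_alleles(row: Dict[str, Any], predicate: Callable[[Dict[str, Any]], bool]) -> Dict[str, Any]:
    """Return a copy of `row` whose alternateAlleles list only keeps matching alleles."""
    kept = [allele for allele in row["alternateAlleles"] if predicate(allele)]
    return {**row, "alternateAlleles": kept}


# Example row in the proposed layout: one dict per alternate allele.
example_row = {
    "contigName": "1",
    "referenceAllele": "A",
    "alternateAlleles": [
        {"alleleString": "T", "INFO_controls_AC_nfe_seu": 3},
        {"alleleString": "G", "INFO_controls_AC_nfe_seu": 0},
    ],
}

# Keep only alleles that were actually observed in this subset.
observed = keep_alleles(example_row, lambda a: a["INFO_controls_AC_nfe_seu"] > 0)
```

A single predicate subsets every per-allele annotation at once, because they all live in the same array element.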

The data type also does not change. On the contrary, it becomes even more explicit: All columns in alternateAlleles can be assumed to be of type Number=A.


Thinking about this, I am becoming more and more convinced that this representation would significantly improve the workflow with Glow.

Hoeze avatar Jul 06 '20 23:07 Hoeze

@Hoeze Thanks for raising this issue. @Hoeze @henrydavidge I think this is more a discussion about our variant schema: should alternateAlleles and the INFO fields that are Number=A be merged into an array of structs? I think this cuts both ways. Burying integers and strings in an array of structs instead of keeping them in simple arrays brings its own awkwardness. When reading a VCF, we currently do not check the countType of each field to separate alternate-allele-specific arrays from other arrays, so we do not know whether a given array is of that type. In the split_multiallelics transformer I decide this solely based on the number of elements in the array (if it equals A, the number of alternate alleles, split; if not, repeat the whole array). I think if we figure out a way to tag the INFO fields that are alternate-allele-specific, the rest can be done by zipping programmatically. @henrydavidge Can StructField metadata be used for this?
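The length-based rule described above, and why an explicit tag would help, can be sketched in plain Python. This is only an illustration of the rule as stated in this comment, not the actual split_multiallelics code; the function name `split_info_value` and the `per_allele` parameter are hypothetical, and keeping one-element lists after the split is an assumption.

```python
from typing import Any, List, Optional


def split_info_value(value: Any, n_alt: int, per_allele: Optional[bool] = None) -> List[Any]:
    """Return the value each of the `n_alt` split rows would receive.

    With `per_allele=None` the decision is guessed from the array length;
    passing True/False stands in for an explicit tag on the field.
    """
    is_list = isinstance(value, list)
    if per_allele is None:
        # Length-based guess: an array with exactly n_alt elements is
        # treated as alternate-allele-specific.
        per_allele = is_list and len(value) == n_alt
    if per_allele:
        if not is_list or len(value) != n_alt:
            raise ValueError("per-allele field must have one element per alternate allele")
        # Split: row i keeps only element i (still as a one-element list).
        return [[element] for element in value]
    # Not per-allele: every split row gets the whole value repeated.
    return [value for _ in range(n_alt)]
```

The guess goes wrong when an unrelated array happens to have as many elements as there are alternate alleles: `split_info_value(["a", "b"], 2)` splits it, whereas `split_info_value(["a", "b"], 2, per_allele=False)` repeats it. That ambiguity is what a tag on the field would remove.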

@Hoeze in any case, split_multiallelics will be more complicated than exploding arrays, as its main job is handling revision of calls and colex-ordered fields in the genotypes column.
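To show why colex-ordered fields cannot simply be exploded, here is a small Python sketch of the diploid genotype ordering defined in the VCF specification, where genotype j/k (with j <= k) sits at index k*(k+1)/2 + j. The helper names are hypothetical and this is not the Glow implementation; it only picks the entries that remain meaningful for a reference-plus-one-alternate split.

```python
from typing import List, Sequence


def genotype_index(j: int, k: int) -> int:
    """Colex index of the diploid genotype j/k, as defined in the VCF specification."""
    if j > k:
        j, k = k, j  # ordering is defined for j <= k
    return k * (k + 1) // 2 + j


def biallelic_indices(alt: int) -> List[int]:
    """Indices of 0/0, 0/alt and alt/alt, where `alt` is the 1-based alternate allele."""
    return [genotype_index(0, 0), genotype_index(0, alt), genotype_index(alt, alt)]


def subset_genotype_field(values: Sequence[float], alt: int) -> List[float]:
    """Keep only the per-genotype values relevant to the split row for `alt`."""
    return [values[i] for i in biallelic_indices(alt)]
```

With two alternate alleles the genotypes are ordered 0/0, 0/1, 1/1, 0/2, 1/2, 2/2, so the row for the second alternate allele needs the entries at positions 0, 3 and 5 rather than a contiguous slice.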

kianfar77 avatar Jul 07 '20 01:07 kianfar77