glow
split_multiallelic should convert arrays to scalars
Hi, would it be possible to have split_multiallelic also adjust the table schema?
I know this means also adjusting every other method that assumes array-valued fields because of multi-allelic sites. Still, it would be very helpful for reading VCF files directly into the correct schema.
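To make the request concrete, here is a minimal sketch of the kind of post-processing I currently have in mind. It is not Glow's implementation. The helper name `scalarize_exprs` and the `per_allele_columns` argument are hypothetical: the caller has to supply the list of per-allele fields (for example, those declared with `Number=A` in the VCF header), because not every array column is per-allele (`names`, `filters`, and `INFO_vep` should stay arrays). The sketch only builds Spark SQL select expressions from `(name, type)` pairs in the shape that PySpark's `DataFrame.dtypes` returns.

```python
def scalarize_exprs(dtypes, per_allele_columns):
    """Build Spark SQL select expressions that unwrap one-element arrays.

    dtypes: list of (column_name, type_string) pairs, the shape returned by
            PySpark's DataFrame.dtypes.
    per_allele_columns: names the caller knows hold one value per alternate
            allele (hypothetical input; it is not derived automatically here).
    """
    wanted = set(per_allele_columns)
    exprs = []
    for name, dtype in dtypes:
        if name in wanted and dtype.startswith("array<"):
            # After splitting, each such array should hold a single value,
            # so take the first element and keep the column name.
            exprs.append(f"`{name}`[0] AS `{name}`")
        else:
            # Leave scalars and non-per-allele arrays untouched.
            exprs.append(f"`{name}`")
    return exprs


# A few columns taken from the schema in this issue.
dtypes = [
    ("contigName", "string"),
    ("alternateAlleles", "array<string>"),
    ("filters", "array<string>"),
    ("INFO_AC", "array<int>"),
    ("INFO_AN", "int"),
    ("INFO_AF", "array<double>"),
]
exprs = scalarize_exprs(dtypes, ["INFO_AC", "INFO_AF"])
print(exprs)
```

On a split DataFrame `df`, the result could then be applied with `df.selectExpr(*scalarize_exprs(df.dtypes, cols))`. Having split_multiallelic do this itself would avoid maintaining the column list by hand.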
For example, when I read the gnomAD 2.1.1 VCF I get the following schema:
root
|-- contigName: string (nullable = true)
|-- start: long (nullable = true)
|-- end: long (nullable = true)
|-- names: array (nullable = true)
| |-- element: string (containsNull = true)
|-- referenceAllele: string (nullable = true)
|-- alternateAlleles: array (nullable = false)
| |-- element: string (containsNull = true)
|-- qual: double (nullable = true)
|-- filters: array (nullable = true)
| |-- element: string (containsNull = true)
|-- splitFromMultiAllelic: boolean (nullable = false)
|-- INFO_non_neuro_AC_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_nhomalt_oth_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AN_nfe_female: integer (nullable = true)
|-- INFO_controls_AN_amr_female: integer (nullable = true)
|-- INFO_non_topmed_AN_fin_female: integer (nullable = true)
|-- INFO_controls_AN_eas_male: integer (nullable = true)
|-- INFO_controls_AN_nfe_onf: integer (nullable = true)
|-- INFO_rf_positive_label: boolean (nullable = true)
|-- INFO_controls_AN_fin: integer (nullable = true)
|-- INFO_non_neuro_nhomalt_eas_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AN_nfe_est: integer (nullable = true)
|-- INFO_variant_type: string (nullable = true)
|-- INFO_controls_AN_eas: integer (nullable = true)
|-- INFO_AF_oth_female: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_topmed_AC_asj_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AN_amr_male: integer (nullable = true)
|-- INFO_nhomalt_asj: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AF_nfe_nwe: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_topmed_nhomalt_asj_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_nhomalt_oth: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AF_asj_male: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_topmed_AN_nfe_seu: integer (nullable = true)
|-- INFO_controls_nhomalt_afr_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_faf95: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_nhomalt_amr_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AF_afr_male: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_neuro_AF_male: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_nhomalt_nfe_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AN_afr_male: integer (nullable = true)
|-- INFO_non_neuro_AN_fin_female: integer (nullable = true)
|-- INFO_controls_AF_amr_female: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_neuro_AF_amr_female: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_neuro_AF_amr_male: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_neuro_nhomalt_amr: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_nhomalt_afr_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_nhomalt_nfe_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AF_fin: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_ab_hist_alt_bin_freq: array (nullable = true)
| |-- element: string (containsNull = true)
|-- INFO_AN_raw: integer (nullable = true)
|-- INFO_faf95_nfe: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_topmed_AF: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_nhomalt_oth: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_pab_max: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_neuro_nhomalt_nfe_onf: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_nhomalt_eas: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_AF_fin_male: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_controls_nhomalt_asj_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_nhomalt_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AN_amr_female: integer (nullable = true)
|-- INFO_non_neuro_faf99_nfe: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_AC_oth_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AC_asj: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AN_fin: integer (nullable = true)
|-- INFO_AC_nfe_onf: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_AC_amr: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AC_raw: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AC_nfe_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AC_asj_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AC_eas_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_nhomalt_popmax: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_dp_hist_alt_bin_freq: array (nullable = true)
| |-- element: string (containsNull = true)
|-- INFO_AN_male: integer (nullable = true)
|-- INFO_nhomalt_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_nhomalt_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_nhomalt_afr: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AC_nfe: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_AC_eas_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AC_eas_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AC: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_AC: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AN_amr: integer (nullable = true)
|-- INFO_AC_afr: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_gq_hist_alt_bin_freq: array (nullable = true)
| |-- element: string (containsNull = true)
|-- INFO_non_topmed_AC_oth: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_AF_asj_male: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_VQSR_culprit: string (nullable = true)
|-- INFO_non_topmed_AC_fin_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AC_nfe_nwe: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AN_nfe: integer (nullable = true)
|-- INFO_vep: array (nullable = true)
| |-- element: string (containsNull = true)
|-- INFO_non_topmed_AF_nfe_est: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_faf95_amr: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_segdup: boolean (nullable = true)
|-- INFO_allele_type: array (nullable = true)
| |-- element: string (containsNull = true)
|-- INFO_non_neuro_AF_nfe_nwe: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_controls_nhomalt_amr_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AF_nfe: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_topmed_AF_eas: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_AC_asj_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AN_nfe_female: integer (nullable = true)
|-- INFO_non_topmed_AN_oth_male: integer (nullable = true)
|-- INFO_non_topmed_AC_nfe_onf: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_nhomalt_fin_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_nhomalt_raw: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_nhomalt_afr_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AF_raw: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_topmed_nhomalt_amr: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AF_nfe: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_controls_nhomalt_raw: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AC_afr_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AF_afr_male: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_controls_AN_nfe_est: integer (nullable = true)
|-- INFO_non_topmed_nhomalt_nfe_onf: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AF_eas_female: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_controls_AC_raw: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_nhomalt_fin_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_nhomalt_nfe_nwe: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AC_oth_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AN_eas: integer (nullable = true)
|-- INFO_non_topmed_AN_male: integer (nullable = true)
|-- INFO_non_topmed_nhomalt_nfe_seu: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AF_eas_female: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_faf95_afr: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_SOR: double (nullable = true)
|-- INFO_controls_AC: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AN_afr: integer (nullable = true)
|-- INFO_controls_AN_asj: integer (nullable = true)
|-- INFO_non_topmed_AF_popmax: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_AN_eas: integer (nullable = true)
|-- INFO_controls_AN_male: integer (nullable = true)
|-- INFO_non_neuro_AN_asj_female: integer (nullable = true)
|-- INFO_controls_AN_amr_male: integer (nullable = true)
|-- INFO_non_topmed_AC_nfe_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_nhomalt_nfe_est: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_age_hist_het_n_smaller: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AF_nfe_onf: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_topmed_nhomalt_amr_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AN_popmax: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_nhomalt: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AN_eas_male: integer (nullable = true)
|-- INFO_non_topmed_AN_nfe_male: integer (nullable = true)
|-- INFO_non_neuro_AF_nfe_seu: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_controls_faf95_nfe: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_topmed_AF_nfe_female: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_controls_AN_oth_male: integer (nullable = true)
|-- INFO_AF_nfe_male: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_AC_eas: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AC_eas: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AC_fin_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_nhomalt_oth_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AN_oth_male: integer (nullable = true)
|-- INFO_non_neuro_AN_nfe_nwe: integer (nullable = true)
|-- INFO_controls_nhomalt_nfe_nwe: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_nhomalt_nfe_seu: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_nhomalt_amr: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AN_popmax: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_nhomalt_fin_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AF_nfe_female: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_controls_nhomalt_nfe_est: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_faf99_amr: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_AC_nfe_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_nhomalt_nfe_onf: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AN_fin_male: integer (nullable = true)
|-- INFO_non_topmed_AF_asj_female: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_AC_nfe_nwe: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_ReadPosRankSum: double (nullable = true)
|-- INFO_non_topmed_AF_oth_female: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_topmed_AC_amr_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_AN_fin: integer (nullable = true)
|-- INFO_controls_nhomalt_oth_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_nhomalt_nfe_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_nhomalt_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_AN: integer (nullable = true)
|-- INFO_AC_popmax: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AF_afr_female: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_AF_afr_female: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_neuro_AF_nfe_est: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_topmed_AC_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_AC_fin_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_AF_oth_male: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_neuro_AN_female: integer (nullable = true)
|-- INFO_non_neuro_AN_eas_male: integer (nullable = true)
|-- INFO_nhomalt_asj_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_AN_oth: integer (nullable = true)
|-- INFO_AF_asj: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_controls_AN_afr_female: integer (nullable = true)
|-- INFO_non_topmed_faf99_nfe: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_controls_AF_afr_female: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_controls_AN_oth_female: integer (nullable = true)
|-- INFO_non_topmed_nhomalt_eas_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AF_oth_male: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_topmed_nhomalt_nfe_est: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_popmax: array (nullable = true)
| |-- element: string (containsNull = true)
|-- INFO_has_star: boolean (nullable = true)
|-- INFO_non_neuro_AC_nfe_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AN_afr: integer (nullable = true)
|-- INFO_non_topmed_nhomalt_fin_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_nhomalt_fin: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_faf99_eas: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_rf_train: boolean (nullable = true)
|-- INFO_controls_AN_oth: integer (nullable = true)
|-- INFO_nhomalt_oth_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_nhomalt_fin_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AF_nfe_female: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_topmed_nhomalt_afr_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AC: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AF_male: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_nonpar: boolean (nullable = true)
|-- INFO_decoy: boolean (nullable = true)
|-- INFO_AF_nfe_nwe: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_controls_AC_nfe: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AF_oth_male: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_controls_AN_fin_female: integer (nullable = true)
|-- INFO_AC_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AC_afr: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AC_asj_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_nhomalt_eas_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AF_nfe_nwe: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_AF_popmax: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_AN_nfe_est: integer (nullable = true)
|-- INFO_non_neuro_AF_nfe_onf: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_nhomalt_popmax: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AF_amr_male: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_controls_faf95_amr: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_topmed_AN_asj: integer (nullable = true)
|-- INFO_age_hist_hom_n_larger: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_nhomalt_fin: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_faf99_afr: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_neuro_AF_female: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_controls_AF_amr: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_neuro_nhomalt_oth_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AN_nfe: integer (nullable = true)
|-- INFO_non_neuro_faf99_afr: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_controls_AF_nfe_onf: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_controls_AN_nfe_seu: integer (nullable = true)
|-- INFO_non_neuro_nhomalt_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_nhomalt_asj_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AC_oth_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AN_fin_male: integer (nullable = true)
|-- INFO_non_topmed_AF_afr_male: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_neuro_nhomalt_asj_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_faf99_eas: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_InbreedingCoeff: double (nullable = true)
|-- INFO_controls_AF_nfe_male: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_AF: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_AF_nfe: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_topmed_AN_afr_female: integer (nullable = true)
|-- INFO_age_hist_hom_n_smaller: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AN_raw: integer (nullable = true)
|-- INFO_nhomalt_nfe_onf: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AF_popmax: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_faf99_eas: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_AN_nfe_female: integer (nullable = true)
|-- INFO_nhomalt_nfe_est: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AF_raw: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_topmed_faf95_amr: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_neuro_nhomalt_raw: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AF_asj: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_neuro_nhomalt_eas: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AN: integer (nullable = true)
|-- INFO_controls_AC_amr: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_nhomalt_fin_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AC_amr_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_faf95_afr: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_AF_female: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_controls_AN_asj_female: integer (nullable = true)
|-- INFO_controls_nhomalt_afr_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_nhomalt_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_nhomalt_popmax: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_AC_amr_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_faf99_afr: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_AC_amr_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_AN_oth_female: integer (nullable = true)
|-- INFO_nhomalt_fin_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AC_afr_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AN: integer (nullable = true)
|-- INFO_controls_nhomalt_afr: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AC_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_rf_tp_probability: double (nullable = true)
|-- INFO_non_neuro_AN_amr_male: integer (nullable = true)
|-- INFO_AC_eas_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_AF_oth: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_neuro_AC_fin: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AF_eas: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_neuro_nhomalt_amr_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_AN_amr: integer (nullable = true)
|-- INFO_non_topmed_popmax: array (nullable = true)
| |-- element: string (containsNull = true)
|-- INFO_AF_eas_male: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_controls_AN: integer (nullable = true)
|-- INFO_nhomalt_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AC_nfe_est: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AC_afr: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AC_afr: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_nhomalt_oth_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AC_amr_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AF_eas_male: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_FS: double (nullable = true)
|-- INFO_non_topmed_faf99_eas: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_controls_nhomalt_eas_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_AN_amr_male: integer (nullable = true)
|-- INFO_AN_nfe_nwe: integer (nullable = true)
|-- INFO_non_topmed_faf95_eas: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_controls_AF_oth: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_topmed_AN_fin_male: integer (nullable = true)
|-- INFO_non_topmed_AC_asj: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_rf_negative_label: boolean (nullable = true)
|-- INFO_non_topmed_AC_raw: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_nhomalt_afr: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AF_eas: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_controls_nhomalt_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_nhomalt_fin: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AN_nfe_male: integer (nullable = true)
|-- INFO_controls_nhomalt_amr_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AF_fin_female: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_topmed_nhomalt_asj: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_nhomalt_asj_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AF_afr_female: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_DP: integer (nullable = true)
|-- INFO_non_neuro_AF_eas_female: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_neuro_AF_amr: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_neuro_nhomalt_afr_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_gq_hist_all_bin_freq: array (nullable = true)
| |-- element: string (containsNull = true)
|-- INFO_controls_AN_afr_male: integer (nullable = true)
|-- INFO_AN_nfe_seu: integer (nullable = true)
|-- INFO_controls_AF_nfe_est: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_controls_AF_asj_female: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_topmed_AC_fin_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_VQSLOD: double (nullable = true)
|-- INFO_AF_fin: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_controls_AN_nfe_male: integer (nullable = true)
|-- INFO_AN_afr_female: integer (nullable = true)
|-- INFO_non_topmed_AC_eas: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AC_fin: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AF_fin_female: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_neuro_AC_popmax: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AF_nfe: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_AF_amr: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_topmed_AC_fin: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_nhomalt: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AN_female: integer (nullable = true)
|-- INFO_non_topmed_AC_eas_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AN_nfe_onf: integer (nullable = true)
|-- INFO_non_topmed_nhomalt_nfe_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AN_oth_female: integer (nullable = true)
|-- INFO_non_topmed_AN_eas: integer (nullable = true)
|-- INFO_non_neuro_AN_nfe_seu: integer (nullable = true)
|-- INFO_controls_nhomalt_amr: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_ClippingRankSum: double (nullable = true)
|-- INFO_faf99_amr: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_topmed_AF_oth: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_topmed_AN_oth: integer (nullable = true)
|-- INFO_AF_nfe_est: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_faf95_eas: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_AF_amr_male: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_topmed_AN_raw: integer (nullable = true)
|-- INFO_AN_afr_male: integer (nullable = true)
|-- INFO_controls_faf99: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_AN_eas_male: integer (nullable = true)
|-- INFO_non_neuro_faf95_amr: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_controls_nhomalt_eas_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_nhomalt_amr_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AC_nfe: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_nhomalt_asj: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AC_nfe_nwe: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_nhomalt: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_faf99: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_nhomalt_nfe_nwe: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AC_asj_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_AF_male: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_topmed_faf95_afr: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_AF_afr: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_neuro_AC_oth: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AF_eas_male: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_AF_amr_female: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_AC_raw: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_popmax: array (nullable = true)
| |-- element: string (containsNull = true)
|-- INFO_controls_AF_popmax: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_neuro_nhomalt_afr_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_AC_nfe_seu: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AF_female: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_age_hist_het_bin_freq: array (nullable = true)
| |-- element: string (containsNull = true)
|-- INFO_non_topmed_AC_nfe_nwe: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_VQSR_NEGATIVE_TRAIN_SITE: boolean (nullable = true)
|-- INFO_nhomalt_asj_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AC_amr_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_nhomalt_fin_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_nhomalt_eas_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AF_afr: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_controls_nhomalt: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AN_nfe_onf: integer (nullable = true)
|-- INFO_non_neuro_AF_fin: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_neuro_AF_oth_female: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_neuro_nhomalt_amr_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AC_nfe_est: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AF_nfe_male: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_AN_asj_female: integer (nullable = true)
|-- INFO_non_neuro_AC_afr_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_nhomalt_eas: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AF_oth_female: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_controls_AC_asj: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_faf95: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_faf99: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_neuro_nhomalt_afr: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AC_eas_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_faf99_amr: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_AC_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_faf99_afr: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_controls_nhomalt_nfe_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_AN_asj_male: integer (nullable = true)
|-- INFO_controls_nhomalt_nfe_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_nhomalt_afr_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_AF_fin_female: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_VQSR_POSITIVE_TRAIN_SITE: boolean (nullable = true)
|-- INFO_controls_AF_fin_female: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_rf_label: string (nullable = true)
|-- INFO_non_neuro_faf95_nfe: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_AC_fin_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AN_nfe_est: integer (nullable = true)
|-- INFO_non_topmed_AC_amr_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AF_asj_male: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_BaseQRankSum: double (nullable = true)
|-- INFO_non_neuro_faf99: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_nhomalt_oth_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_nhomalt_oth_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_lcr: boolean (nullable = true)
|-- INFO_AN_asj: integer (nullable = true)
|-- INFO_non_neuro_AC_amr: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_AC_afr_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AF_fin_male: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_neuro_AN_eas_female: integer (nullable = true)
|-- INFO_non_neuro_AN_oth: integer (nullable = true)
|-- INFO_AF_nfe_onf: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_AF_asj_female: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_controls_AF: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_topmed_AF_nfe_male: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_age_hist_het_n_larger: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_AC_nfe: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AN_afr_female: integer (nullable = true)
|-- INFO_age_hist_hom_bin_freq: array (nullable = true)
| |-- element: string (containsNull = true)
|-- INFO_MQ: double (nullable = true)
|-- INFO_AN_nfe_male: integer (nullable = true)
|-- INFO_non_neuro_AC_nfe_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_faf99_nfe: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_nhomalt_raw: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AN_amr_female: integer (nullable = true)
|-- INFO_controls_AF_asj: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_topmed_AF_afr: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_topmed_AN_eas_female: integer (nullable = true)
|-- INFO_QD: double (nullable = true)
|-- INFO_non_topmed_AF_raw: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_controls_AC_fin_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AC_nfe_seu: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AC_asj_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AC_nfe_seu: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_nhomalt_oth: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AC_amr_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_AC_asj_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_nhomalt_nfe: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AF_afr: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_AN_female: integer (nullable = true)
|-- INFO_non_neuro_AN_asj: integer (nullable = true)
|-- INFO_controls_AC_popmax: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_nhomalt_nfe_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_faf95_eas: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_AN_afr: integer (nullable = true)
|-- INFO_dp_hist_alt_n_larger: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_nhomalt_nfe: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_nhomalt_fin: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AC_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AF_oth_male: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_nhomalt_nfe_seu: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_transmitted_singleton: boolean (nullable = true)
|-- INFO_non_topmed_AC_afr_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AF_oth: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_topmed_faf95_nfe: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_neuro_AC_asj_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_nhomalt_eas_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_AN_amr_female: integer (nullable = true)
|-- INFO_non_topmed_nhomalt_nfe: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_AF_nfe_female: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_was_mixed: boolean (nullable = true)
|-- INFO_AN_fin_female: integer (nullable = true)
|-- INFO_non_topmed_nhomalt_amr_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AN_asj_male: integer (nullable = true)
|-- INFO_non_neuro_AF_fin_male: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_neuro_AC_eas: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_nhomalt_nfe_seu: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AF_male: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_topmed_nhomalt_eas_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_faf95_afr: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_neuro_AF_asj_female: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_AF_eas_female: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_controls_AC_afr_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AN_fin: integer (nullable = true)
|-- INFO_non_neuro_AN_asj_male: integer (nullable = true)
|-- INFO_non_topmed_AF_nfe_seu: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_neuro_AF_asj_male: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_n_alt_alleles: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_nhomalt_nfe_nwe: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AN_nfe_nwe: integer (nullable = true)
|-- INFO_controls_AC_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AN_asj_male: integer (nullable = true)
|-- INFO_non_neuro_AN_afr_male: integer (nullable = true)
|-- INFO_controls_AC_nfe_onf: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AC_amr: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AC_nfe_onf: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AN_female: integer (nullable = true)
|-- INFO_non_neuro_nhomalt_asj_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_AN_nfe: integer (nullable = true)
|-- INFO_AF_nfe_seu: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_faf95: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_AC_nfe_est: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AC_oth_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AF_amr_female: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_MQRankSum: double (nullable = true)
|-- INFO_controls_AC_oth_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AC_nfe_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_AN_nfe_onf: integer (nullable = true)
|-- INFO_AF_afr_male: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_topmed_faf99_amr: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_controls_nhomalt_nfe: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_nhomalt_popmax: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AF_eas_male: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_AN_popmax: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AN_nfe_female: integer (nullable = true)
|-- INFO_non_topmed_AC_popmax: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_nhomalt_oth: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_AN_eas_female: integer (nullable = true)
|-- INFO_AC_asj: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AN_nfe: integer (nullable = true)
|-- INFO_non_topmed_AN_nfe_nwe: integer (nullable = true)
|-- INFO_non_neuro_faf95: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_AF_eas: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_dp_hist_all_bin_freq: array (nullable = true)
| |-- element: string (containsNull = true)
|-- INFO_non_neuro_AN_popmax: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AC_nfe_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_AC_oth: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AC_fin_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_nhomalt_eas: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AC_fin_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AN_amr: integer (nullable = true)
|-- INFO_non_topmed_AC_nfe_est: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AC_afr_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_AF_raw: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_dp_hist_all_n_larger: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AC_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_popmax: array (nullable = true)
| |-- element: string (containsNull = true)
|-- INFO_controls_AC_eas_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_faf99_nfe: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_controls_AF_female: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_AC_fin: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AN_male: integer (nullable = true)
|-- INFO_non_topmed_AN_oth_female: integer (nullable = true)
|-- INFO_AC_oth_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AC_oth: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_neuro_AF: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_AC_afr_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AF_fin_male: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_neuro_AF_asj: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_controls_AN_eas_female: integer (nullable = true)
|-- INFO_non_neuro_nhomalt_asj: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_non_topmed_AF_fin: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_topmed_AN_afr: integer (nullable = true)
|-- INFO_controls_AF_amr_male: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_controls_AF_nfe_seu: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_neuro_AC_oth_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AN_amr: integer (nullable = true)
|-- INFO_non_topmed_AN_asj_female: integer (nullable = true)
|-- INFO_AN_fin_male: integer (nullable = true)
|-- INFO_non_topmed_AF_amr: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_non_neuro_AC_eas_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_nhomalt_nfe_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AC_oth_male: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_AC_nfe_female: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_faf95_eas: array (nullable = true)
| |-- element: double (containsNull = true)
|-- INFO_controls_AC_nfe_seu: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- INFO_controls_AN_raw: integer (nullable = true)
|-- INFO_AN_oth_male: integer (nullable = true)
|-- INFO_OLD_MULTIALLELIC: string (nullable = true)
|-- genotypes: array (nullable = true)
| |-- element: struct (containsNull = false)
| | |-- sampleId: string (nullable = true)
It's quite a hassle to convert every column to a scalar by hand.
This applies especially to all INFO fields that are annotated with Number=A, i.e. one value per alternate allele.
Maybe it would be easier to have all allele-specific columns in one large struct column?
root
|-- contigName: string (nullable = true)
|-- start: long (nullable = true)
|-- end: long (nullable = true)
|-- referenceAllele: string (nullable = true)
|-- alternateAlleles: array (nullable = true) # this is the struct array that contains every allele-specific annotation
| |-- element: struct (containsNull = false)
| | |-- alleleString: string (nullable = true)
| | |-- name: string (nullable = true)
| | |-- INFO_controls_AC_nfe_seu: integer (nullable = true)
[...]
This is an interesting suggestion. This could make interoperability with VCF a bit awkward since the VCF header type would also change after splitting. cc @kianfar77 for thoughts.
@Hoeze, I'm curious whether you necessarily need to convert all the arrays to scalars. Is there a query that you can't write against the array types, or is it just more verbose? Btw, if you do need scalar types, you should be able to convert all array-typed (or all Number=A typed) columns programmatically.
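A minimal sketch of what such a programmatic conversion could look like. It only builds Spark SQL expression strings from (name, type) pairs in the shape returned by DataFrame.dtypes, so it runs without a Spark session. The helper name scalarize_exprs and the per_allele list are assumptions for illustration; the caller has to know which columns are Number=A (for example from the VCF header), and taking element 0 is only meaningful when each row has a single alternate allele.

```python
from typing import Iterable, List, Tuple


def scalarize_exprs(dtypes: Iterable[Tuple[str, str]],
                    per_allele: Iterable[str]) -> List[str]:
    """Build SQL expressions that replace the listed array columns
    by their first element and pass every other column through."""
    per_allele = set(per_allele)
    exprs = []
    for name, dtype in dtypes:
        if name in per_allele and dtype.startswith("array<"):
            # Hypothetical choice: keep the column name, drop the array wrapper.
            exprs.append(f"`{name}`[0] AS `{name}`")
        else:
            exprs.append(f"`{name}`")
    return exprs


# (name, type) pairs as DataFrame.dtypes would report them.
dtypes = [
    ("contigName", "string"),
    ("INFO_AF_afr", "array<double>"),
    ("INFO_AN_afr", "int"),
    ("filters", "array<string>"),
]
exprs = scalarize_exprs(dtypes, per_allele=["INFO_AF_afr"])
for e in exprs:
    print(e)
```

With a real DataFrame, the result could be passed on as df.selectExpr(*scalarize_exprs(df.dtypes, per_allele)).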
Thanks for your answer @henrydavidge.
When we work with VCFs, we do so with one variant per row. Multiple alternate alleles at the same position are an edge case that we never used.
Therefore, until now I have been writing a bunch of .withColumn("alternateAlleles", f.col("alternateAlleles")[0]) calls for every VCF.
This leads to a number of problems:
- Without looking at the header, I cannot be sure whether a column really is an array of length num_alt_alleles
- It requires writing another set of column casts for each VCF
- A struct of equal-length arrays is a very bad representation if I want to work with alternative alleles. For example, if I want to filter for variant quality, I have to subset every single array by the result of the filter expression.
- A combined explosion along the alt_allele dimension is not possible either. I first need to zip all the equal-length arrays into one struct.
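To illustrate the zipping step from the last two points, here is a plain-Python sketch that models one row as a dict. The helper name zip_per_allele and the example values are made up; in Spark SQL the arrays_zip function plays a similar role.

```python
from typing import Any, Dict, List


def zip_per_allele(row: Dict[str, Any],
                   per_allele: List[str]) -> List[Dict[str, Any]]:
    """Turn several equal-length arrays into one list of per-allele records."""
    lengths = {len(row[c]) for c in per_allele}
    if len(lengths) != 1:
        raise ValueError(f"columns differ in length: {sorted(lengths)}")
    columns = (row[c] for c in per_allele)
    return [dict(zip(per_allele, values)) for values in zip(*columns)]


# Hypothetical row with two alternate alleles.
row = {
    "contigName": "1",
    "start": 100,
    "alternateAlleles": ["A", "T"],
    "INFO_AF": [0.01, 0.2],
    "INFO_AC": [3, 60],
}
alleles = zip_per_allele(row, ["alternateAlleles", "INFO_AF", "INFO_AC"])

# One filter now keeps all annotations of an allele together.
common = [a for a in alleles if a["INFO_AF"] >= 0.05]
print(common)
```

Without the zip, the same filter would have to be applied to every array separately by index.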
In comparison, with Array[Struct{<alt_allele annotations>}] you have the following guarantees:
- All per-alt-allele annotation is collected in a single entry of the array
- No confusion about which columns are really per-allele and which are not
- You can directly filter the array for certain alternative alleles
- You get split_multiallelic for free by exploding the array
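A rough sketch of the "explode" idea on the proposed layout, again with a row modelled as a plain dict. The field names follow the schema proposed above; the helper explode_alleles and the values are hypothetical. It covers only site-level fields and does nothing about the genotypes column.

```python
from typing import Any, Dict, List


def explode_alleles(row: Dict[str, Any],
                    array_col: str = "alternateAlleles") -> List[Dict[str, Any]]:
    """Emit one flat row per element of the array-of-structs column."""
    shared = {k: v for k, v in row.items() if k != array_col}
    # Each allele struct is merged into a copy of the shared site-level fields.
    return [{**shared, **allele} for allele in row[array_col]]


row = {
    "contigName": "1",
    "start": 100,
    "referenceAllele": "G",
    "alternateAlleles": [
        {"alleleString": "A", "INFO_controls_AC_nfe_seu": 3},
        {"alleleString": "T", "INFO_controls_AC_nfe_seu": 60},
    ],
}
rows = explode_alleles(row)
for r in rows:
    print(r)
```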
The data type also does not change. On the contrary, it becomes even more explicit:
All columns in alternateAlleles can be assumed to be of type Number=A.
Thinking about this, I get more and more convinced that this representation would significantly improve the workflow with Glow.
@Hoeze Thanks for raising this issue. @Hoeze @henrydavidge I think this is more a discussion of our variant schema, namely whether alternateAlleles and the INFO fields that are Number=A should be merged into an array of structs. I think this cuts both ways. Burying integers and strings in an array of structs instead of keeping them in simple arrays brings its own awkwardness. When reading a VCF, we currently do not check the count type of an INFO field to separate alternate-allele-specific arrays from other arrays, so we do not know whether a given array is of that type. In the split_multiallelics transformer I decide this solely based on the number of elements in the array (if it equals A, the number of alternate alleles, split; if not, repeat the whole array). I think if we find a way to tag the INFO fields that are alternate-allele-specific, the rest can be done by zipping programmatically. @henrydavidge Can StructField metadata be used for this?
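The length-based rule described here can be sketched in plain Python as follows. This is an illustration of the rule as stated in this comment, not Glow's actual code; the function name and the example fields (including INFO_pair) are invented, and keeping split values as one-element arrays is an assumption.

```python
from typing import Any, Dict, List


def split_info_by_length(info: Dict[str, Any], n_alt: int) -> List[Dict[str, Any]]:
    """Split arrays whose length equals n_alt; repeat everything else unchanged."""
    out = []
    for i in range(n_alt):
        new_row = {}
        for name, value in info.items():
            if isinstance(value, list) and len(value) == n_alt:
                new_row[name] = [value[i]]  # treated as per-allele
            else:
                new_row[name] = value       # repeated as-is
        out.append(new_row)
    return out


info = {
    "INFO_AF": [0.01, 0.2],   # one value per alternate allele
    "INFO_AN": 300,           # scalar, repeated
    "INFO_pair": [5, 7],      # hypothetical fixed-size field of length 2
}
split = split_info_by_length(info, n_alt=2)
print(split)
```

Note that INFO_pair is split as well, only because its length happens to match the number of alternate alleles. That ambiguity is what an explicit per-field tag would remove.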
@Hoeze In any case, split_multiallelics will be more complicated than exploding arrays, as its main job is revising the calls and the colex-ordered fields in the genotypes column.
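To give an idea of why the colex-ordered fields need extra care: the VCF specification orders diploid genotype likelihoods so that genotype j/k (with j <= k) sits at index k*(k+1)/2 + j. The sketch below only shows which entries of a multiallelic array belong to the biallelic genotypes of one alternate allele. It is a simple subsetting illustration for diploid samples with made-up values, and is not meant to describe what split_multiallelics does internally.

```python
from typing import List


def gl_index(j: int, k: int) -> int:
    """Position of diploid genotype j/k in a colex-ordered likelihood array."""
    if j > k:
        j, k = k, j
    return k * (k + 1) // 2 + j


def biallelic_gl_indices(alt: int) -> List[int]:
    """Indices of 0/0, 0/alt and alt/alt, where alt is the 1-based allele index."""
    return [gl_index(0, 0), gl_index(0, alt), gl_index(alt, alt)]


# Two alternate alleles give six genotypes: 0/0, 0/1, 1/1, 0/2, 1/2, 2/2.
pl = [50, 0, 60, 40, 70, 90]  # hypothetical PL values
for alt in (1, 2):
    idx = biallelic_gl_indices(alt)
    print(alt, idx, [pl[i] for i in idx])
```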