gnomad_methods

Document or update split_vds_by_strata to more accurately reflect behavior

Open mike-w-wilson opened this issue 11 months ago • 0 comments

Working on v4.0, we created the gnomad_methods function split_vds_by_strata, which splits a VDS into subsets based on an expression. The desired behavior was for every subset to retain all alleles. That does not happen: the function relies on Hail's vds.filter_samples, which unexpectedly removes every variant that is not present in the filtered sample subset, even with the remove_dead_alleles argument left as False.
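To make the observed behavior concrete, here is a toy pure-Python model. It is not Hail code and does not reproduce Hail's internals. It represents the sparse variant data of a VDS as a mapping from a variant key to the entries of only those samples that have a call there. Filtering samples and then dropping rows with no remaining entries reproduces the outcome described above: variants seen only in the excluded samples vanish from the subset. The data layout and the name filter_samples_toy are hypothetical.

```python
from typing import Dict, Set

# Hypothetical toy stand-in for VDS variant data: variant key -> {sample_id: genotype}.
# Sparse: a sample only has an entry at a variant where it has a call.
VariantData = Dict[str, Dict[str, str]]


def filter_samples_toy(variant_data: VariantData, keep: Set[str]) -> VariantData:
    """Keep only entries for samples in `keep`, then drop variants with no entries left.

    Models the outcome reported in the issue; it is not Hail's implementation.
    """
    subset: VariantData = {}
    for variant, entries in variant_data.items():
        kept = {s: gt for s, gt in entries.items() if s in keep}
        if kept:  # a variant with no remaining entries is dropped from the subset
            subset[variant] = kept
    return subset


variant_data: VariantData = {
    "chr1:100:A:G": {"s1": "0/1", "s3": "1/1"},
    "chr1:200:C:T": {"s2": "0/1"},
    "chr1:300:G:A": {"s3": "0/1"},  # only carried by s3
}

subset = filter_samples_toy(variant_data, keep={"s1", "s2"})
print(sorted(subset))  # chr1:300:G:A is gone because s3 was filtered out
```

Running it prints ['chr1:100:A:G', 'chr1:200:C:T']: the variant carried only by s3 is removed, even though no allele-level pruning was requested.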

As it stands, the function's documentation does not say whether dead alleles are kept or removed; it only says that the VDS will be split. We should consider updating the function so that keeping or removing dead alleles/variants is an explicit option, and documenting that option.
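A minimal sketch of what such an option could look like, again on a toy pure-Python model rather than a real VDS. The function name split_by_strata_toy, the remove_dead_variants flag, and the sample-to-stratum mapping (standing in for the per-sample expression) are all hypothetical. The sketch only handles whole variant rows, not individual dead alleles at multi-allelic sites, and it does not show how the real function would keep rows in Hail.

```python
from collections import defaultdict
from typing import Dict, Mapping

# Hypothetical toy stand-in for VDS variant data: variant key -> {sample_id: genotype}.
VariantData = Dict[str, Dict[str, str]]


def split_by_strata_toy(
    variant_data: VariantData,
    strata: Mapping[str, str],
    remove_dead_variants: bool = False,
) -> Dict[str, VariantData]:
    """Split sparse variant data into one subset per stratum (hypothetical API).

    `strata` maps each sample ID to its stratum label. Samples missing from
    `strata` are dropped. With remove_dead_variants=False, every subset keeps
    every variant row, even rows with no entries for that stratum's samples.
    """
    labels = sorted(set(strata.values()))
    subsets: Dict[str, VariantData] = {label: {} for label in labels}
    for variant, entries in variant_data.items():
        # Group this variant's entries by the stratum of each sample.
        per_label: Dict[str, Dict[str, str]] = defaultdict(dict)
        for sample, gt in entries.items():
            if sample in strata:
                per_label[strata[sample]][sample] = gt
        for label in labels:
            kept = per_label.get(label, {})
            # Keep the row if it has entries, or if dead variants are retained.
            if kept or not remove_dead_variants:
                subsets[label][variant] = kept
    return subsets


variant_data: VariantData = {
    "chr1:100:A:G": {"s1": "0/1", "s3": "1/1"},
    "chr1:200:C:T": {"s2": "0/1"},
    "chr1:300:G:A": {"s3": "0/1"},
}
strata = {"s1": "A", "s2": "B", "s3": "A"}

kept = split_by_strata_toy(variant_data, strata)
dropped = split_by_strata_toy(variant_data, strata, remove_dead_variants=True)
print({label: sorted(rows) for label, rows in dropped.items()})
```

With the flag left at False, both subsets keep all three variant rows, which is the behavior originally wanted for v4.0. Setting it to True reproduces the current filtering, printing {'A': ['chr1:100:A:G', 'chr1:300:G:A'], 'B': ['chr1:200:C:T']}. Which value should be the default is a decision for the maintainers.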

mike-w-wilson, Mar 07 '24 13:03