[query/vds] Actually use `ref_block_max_length` in `to_dense_mt`
Right now, we perform a full scan in `to_dense_mt`, but we have enough information to do less work and densify in a single pass.
- Expose partitioning in Python.
- For each partition in the variants table, use `ref_block_max_length` to determine the full reference interval necessary to densify that partition.
- Use `map_partitions` of the variants and `query_table` on the reference to get two streams with all information necessary to densify.
- Join the streams and use the current algorithm/scan to do the work.
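
To make the per-partition idea concrete, here is a minimal pure-Python sketch of the interval computation and the join. It is not Hail code. `RefBlock`, `reference_interval`, and `densify_partition` are hypothetical names. It assumes a single sample, 1-based inclusive positions on one contig, both streams sorted by position, and that `ref_block_max_length` bounds the number of bases a reference block covers. The exact convention Hail uses may differ.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple


@dataclass(frozen=True)
class RefBlock:
    """Hypothetical stand-in for one reference block of a single sample."""
    start: int  # 1-based, inclusive
    end: int    # inclusive
    gq: int


def reference_interval(part_start: int, part_end: int,
                       ref_block_max_length: int) -> Tuple[int, int]:
    """Reference range needed to densify variants in [part_start, part_end].

    A block covering part_start can begin at most
    ref_block_max_length - 1 bases earlier (assumed convention).
    """
    return max(1, part_start - (ref_block_max_length - 1)), part_end


def densify_partition(
    variant_positions: Iterable[int],
    ref_blocks: Iterable[RefBlock],
) -> Iterator[Tuple[int, Optional[RefBlock]]]:
    """Single pass over both sorted streams.

    Yields each variant position with the block covering it, or None.
    """
    blocks = iter(ref_blocks)
    current: Optional[RefBlock] = None
    pending = next(blocks, None)
    for pos in variant_positions:
        # Advance to the last block that starts at or before this position.
        while pending is not None and pending.start <= pos:
            current = pending
            pending = next(blocks, None)
        covered = current is not None and current.end >= pos
        yield pos, current if covered else None
```

With a partition spanning positions 1000 to 2000 and `ref_block_max_length = 100`, only reference blocks starting at 901 or later need to be read, so no scan over the earlier reference data is required for that partition.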