[query/vds] Actually use `ref_block_max_length` in `to_dense_mt`

Open chrisvittal opened this issue 2 months ago • 0 comments

Right now, we perform a full scan in `to_dense_mt`, but we have enough information to do less work and densify in a single pass.

  • Expose partitioning in Python.
  • For each partition in the variants table, use `ref_block_max_length` to determine the full reference interval necessary to densify that partition.
  • Use `map_partitions` on the variants and `query_table` on the reference to get two streams with all the information necessary to densify.
  • Join the streams and use the current algorithm/scan to do the work.
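
As a rough illustration of the second step, here is a minimal pure-Python sketch of how a partition's interval could be widened using `ref_block_max_length`. It does not use Hail's API: `Interval` and `reference_interval_for_partition` are hypothetical names, and it assumes 1-based inclusive positions and partitions that stay within a single contig. The idea is that a reference block of length at most `ref_block_max_length` can only cover a variant at position `p` if it starts at or after `p - ref_block_max_length + 1`.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical stand-in for a partition interval (not a Hail type)."""
    contig: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def reference_interval_for_partition(part: Interval, ref_block_max_length: int) -> Interval:
    """Widen a variant partition to cover every reference block that could overlap it.

    A block starting at s with length L covers [s, s + L - 1], so with
    L <= ref_block_max_length the earliest relevant start is
    part.start - (ref_block_max_length - 1), clamped to position 1.
    """
    if ref_block_max_length < 1:
        raise ValueError("ref_block_max_length must be at least 1")
    start = max(1, part.start - (ref_block_max_length - 1))
    return Interval(part.contig, start, part.end)


# Two example variant partitions and the reference intervals they would need.
partitions = [Interval("chr1", 1, 5_000), Interval("chr1", 5_001, 12_000)]
needed = [reference_interval_for_partition(p, 1_000) for p in partitions]
```

Each widened interval is what would be passed to the reference lookup for that partition, so only the reference rows that can affect the partition are read rather than the whole table.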

chrisvittal, Apr 23 '24 19:04