[query] `n_partitions()` after `read_matrix_table(..., _n_partitions=...)` produces one fewer partition than expected

Open: danking opened this issue 7 months ago • 0 comments

What happened?

This dataset has 1000 rows, so it should support up to 1000 partitions. As the session below shows, either `n_partitions()` reports the wrong count or `read_matrix_table(..., _n_partitions=...)` produces too few partitions: requesting 1000 or 5000 partitions yields 999.

In [2]: hl.balding_nichols_model(1, 1000, 1000, n_partitions=100).write('/tmp/foo.mt', overwrite=True)
2023-12-01 11:23:11.757 Hail: INFO: balding_nichols_model: generating genotypes for 1 populations, 1000 samples, and 1000 variants...
2023-12-01 11:23:27.947 Hail: INFO: wrote matrix table with 1000 rows and 1000 columns in 100 partitions to /tmp/foo.mt

In [3]: hl.read_matrix_table('/tmp/foo.mt', _n_partitions=1000).n_partitions()
Out[3]: 999

In [4]: hl.read_matrix_table('/tmp/foo.mt', _n_partitions=5000).n_partitions()
Out[4]: 999

In [5]: hl.read_matrix_table('/tmp/foo.mt', _n_partitions=900).n_partitions()
Out[5]: 500

This issue is complete when:

  • [ ] The above session outputs 1000, 1000, 500.
  • [ ] There are tests for `read_matrix_table` and `read_table` with `_n_partitions` set to numbers near the number of rows, as well as to many fewer and many more.
  • [ ] There are tests for passing _n_partitions of 0 and 1.
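
One way the requested test coverage might be laid out is sketched below. This is only an illustration, not Hail's actual test code: `candidate_partition_counts` and `observed_partition_counts` are hypothetical helper names, and the reader is passed in as a callable so the sketch does not assume how Hail behaves for any particular `_n_partitions` value (including 0, where it may raise).

```python
from typing import Callable, Dict, List, Union


def candidate_partition_counts(n_rows: int) -> List[int]:
    """Requested partition counts worth testing for a dataset with n_rows rows.

    Hypothetical helper: covers 0 and 1, values just below, at, and just
    above the row count, plus one much smaller and one much larger value.
    """
    candidates = {0, 1, n_rows // 10, n_rows - 1, n_rows, n_rows + 1, n_rows * 5}
    return sorted(c for c in candidates if c >= 0)


def observed_partition_counts(
    read: Callable[[int], int], n_rows: int
) -> Dict[int, Union[int, str]]:
    """Map each requested count to what `read` reports for it.

    `read` takes a requested partition count and returns the resulting
    partition count. If it raises, the exception's repr is recorded
    instead, so one failing value does not hide the others.
    """
    results: Dict[int, Union[int, str]] = {}
    for requested in candidate_partition_counts(n_rows):
        try:
            results[requested] = read(requested)
        except Exception as exc:  # behaviour for e.g. 0 is not assumed here
            results[requested] = repr(exc)
    return results


# Possible use against the dataset from the session above (requires Hail):
#
#   import hail as hl
#   observed_partition_counts(
#       lambda n: hl.read_matrix_table('/tmp/foo.mt', _n_partitions=n).n_partitions(),
#       n_rows=1000,
#   )
```

The same helpers could be pointed at `hl.read_table` by swapping the callable. What the correct result is for each requested value is left to the real tests to assert.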

Version

0.2.126

Relevant log output

No response

danking, Dec 01 '23 16:12