hail
[query] `n_partitions()` after `read_matrix_table(..., _n_partitions=...)` reports one fewer partition than expected
What happened?
This dataset has 1000 rows, so it should be able to have as many as 1000 partitions. As seen below, either `n_partitions()` is wrong or `read_matrix_table(..., _n_partitions=...)` is producing too few partitions.
In [2]: hl.balding_nichols_model(1, 1000, 1000, n_partitions=100).write('/tmp/foo.mt', overwrite=True)
2023-12-01 11:23:11.757 Hail: INFO: balding_nichols_model: generating genotypes for 1 populations, 1000 samples, and 1000 variants...
2023-12-01 11:23:27.947 Hail: INFO: wrote matrix table with 1000 rows and 1000 columns in 100 partitions to /tmp/foo.mt
In [3]: hl.read_matrix_table('/tmp/foo.mt', _n_partitions=1000).n_partitions()
Out[3]: 999
In [4]: hl.read_matrix_table('/tmp/foo.mt', _n_partitions=5000).n_partitions()
Out[4]: 999
In [5]: hl.read_matrix_table('/tmp/foo.mt', _n_partitions=900).n_partitions()
Out[5]: 500
This issue is complete when:
- [ ] The above session outputs 1000, 1000, 500.
- [ ] There are tests for `read_matrix_table` and `read_table` with `_n_partitions` set to values near the number of rows, as well as values much smaller and much larger than it.
- [ ] There are tests for passing `_n_partitions` of 0 and 1.
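
For reference, the expected outputs 1000, 1000, 500 are consistent with a simple ceiling-division model of repartitioning: choose the smallest whole number of rows per partition that keeps the partition count at or below the request, then count how many partitions that yields. The sketch below is only that model. The function `expected_n_partitions` is hypothetical and is not Hail's implementation, and how Hail actually picks partition boundaries is an assumption here.

```python
def expected_n_partitions(n_rows: int, requested: int) -> int:
    """Hypothetical model of the partition count; NOT Hail's actual code.

    Assumes every partition holds the same whole number of rows (except
    possibly the last) and that no partition is empty.
    """
    if n_rows <= 0 or requested <= 0:
        # The issue does not say what _n_partitions=0 should do, so this
        # model simply declines to answer for non-positive inputs.
        raise ValueError("model is only defined for positive inputs")
    # Integer ceiling division: rows each partition must hold.
    rows_per_partition = -(-n_rows // requested)
    # Number of partitions needed at that size.
    return -(-n_rows // rows_per_partition)


# Same requests as the session above, against a 1000-row dataset.
for requested in (1000, 5000, 900):
    print(requested, expected_n_partitions(1000, requested))
```

Under this model, a request for 900 partitions forces 2 rows per partition, which gives 500 partitions. Requests of 1000 or more give 1 row per partition and therefore 1000 partitions, not the 999 currently observed.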
Version
0.2.126
Relevant log output
No response