Jerome Kelleher
It's useful to have a summary of the average arity of nodes in the trees. Here's a function that computes it:

```python
import numpy as np

def nodes_mean_arity(ts):
    weighted_arity = np.zeros(ts.num_nodes)
    span_in_tree = np.zeros(ts.num_nodes)
    ...
```
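Here's a minimal pure-Python sketch of the same span-weighted idea. It uses a hypothetical input format rather than a real tskit tree sequence: a list of `(span, parent)` pairs, where `parent` is a parent array with -1 for "no parent". It also assumes a node counts as "in the tree" when it has a parent or at least one child. With a real tree sequence the outer loop would presumably iterate over `ts.trees()` instead.

```python
def nodes_mean_arity(trees, num_nodes):
    """Return the span-weighted mean number of children of each node.

    `trees` is a hypothetical stand-in for a tree sequence: a list of
    (span, parent) pairs, where parent[u] is the parent of node u in that
    tree, or -1 if u has no parent there.
    """
    weighted_arity = [0.0] * num_nodes
    span_in_tree = [0.0] * num_nodes
    for span, parent in trees:
        # Count the children of every node in this tree.
        num_children = [0] * num_nodes
        for p in parent:
            if p != -1:
                num_children[p] += 1
        for u in range(num_nodes):
            # Assumption: a node is "in the tree" if it has a parent or a child.
            if parent[u] != -1 or num_children[u] > 0:
                weighted_arity[u] += num_children[u] * span
                span_in_tree[u] += span
    # Nodes that never appear in any tree get NaN instead of a division by zero.
    return [
        w / s if s > 0 else float("nan")
        for w, s in zip(weighted_arity, span_in_tree)
    ]


# Tree on [0, 6): 0 and 1 are children of 3; 2 and 3 are children of 4.
# Tree on [6, 10): 0, 1 and 2 are all children of 4 (a polytomy).
trees = [(6, [3, 3, 4, 4, -1]), (4, [4, 4, 4, -1, -1])]
print(nodes_mean_arity(trees, num_nodes=5))  # [0.0, 0.0, 0.0, 2.0, 2.4]
```

Node 4 has two children over 6 units and three over 4 units, so its mean arity is (2·6 + 3·4) / 10 = 2.4.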
These should be seen as analogous to the ``start`` and ``stop`` arguments to Python's builtin ``range``, and be specified in tree sequence coordinates. We should initially ensure that start <...
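One way to read the ``range`` analogy: ``start`` is inclusive and ``stop`` is exclusive, so the pair selects the half-open interval [start, stop) of the genome. The sketch below uses a hypothetical helper (not an existing tskit function) that maps such an interval to the trees it overlaps, given a sorted list of tree breakpoints. Input validation is deliberately left out.

```python
import bisect


def tree_indexes(breakpoints, start, stop):
    """Hypothetical helper: indexes of trees overlapping [start, stop).

    `breakpoints` is a sorted list of tree boundaries, so tree i covers
    [breakpoints[i], breakpoints[i + 1]). As with range(), start is
    inclusive and stop is exclusive. No input validation is done here.
    """
    # The tree containing `start` is the one whose left boundary is the last <= start.
    first = bisect.bisect_right(breakpoints, start) - 1
    # The last tree whose left boundary is strictly less than `stop`.
    last = bisect.bisect_left(breakpoints, stop) - 1
    return range(first, last + 1)


breakpoints = [0, 10, 25, 40, 100]
print(list(tree_indexes(breakpoints, 10, 25)))  # [1]
print(list(tree_indexes(breakpoints, 5, 30)))   # [0, 1, 2]
```

Because ``stop`` is exclusive, the interval [10, 25) doesn't touch the tree that starts at 25, just as ``range(10, 25)`` doesn't include 25.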
A useful extension to the ``keep_rows`` functions added in #2700 would be a way to remap all the node ID values in a TableCollection using a given id_map (i.e., the...
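A rough sketch of what the remapping involves, assuming the node-ID columns are given as plain lists in a hypothetical dict (this isn't the ``TableCollection`` API). Here ``id_map[u]`` is the new ID of node ``u``, or -1 (tskit's null ID) if the node is being removed. Raising an error on references to removed nodes is a design choice made for this sketch only.

```python
NULL = -1  # tskit's null ID


def remap_node_ids(columns, id_map):
    """Hypothetical helper: copy node-ID columns, replacing each ID u with id_map[u].

    `columns` maps a column name (e.g. "edges.parent") to a list of node
    IDs; id_map[u] is the new ID of node u, or NULL if u is being removed.
    """
    remapped = {}
    for name, values in columns.items():
        new_values = [id_map[u] for u in values]
        if NULL in new_values:
            # Design choice for this sketch: refuse to create dangling references.
            raise ValueError(f"{name} refers to a node removed by id_map")
        remapped[name] = new_values
    return remapped


columns = {"edges.parent": [4, 4, 3], "edges.child": [0, 3, 1], "mutations.node": [1]}
id_map = [2, 1, 0, 4, 3]
print(remap_node_ids(columns, id_map))
# {'edges.parent': [3, 3, 4], 'edges.child': [2, 4, 1], 'mutations.node': [1]}
```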
It has come up in a few places that not checking the correctness of mutation parents at TreeSequence initialisation was a mistake, and we really should be...
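The issue doesn't say how such a check would be implemented. Here's a hedged illustration for a single site. It assumes a mutation's correct parent is the closest earlier mutation at the same site found on the path from its node up to the root, and that the mutations are already listed in a valid table order. The helper names are hypothetical, and parents are local indexes within the site rather than global mutation IDs.

```python
NULL = -1

# Hypothetical helpers; the issue does not specify how the real check works.


def expected_mutation_parents(tree_parent, mutation_nodes):
    """Compute the parent each mutation at one site should have.

    tree_parent[u] is the parent of node u in the tree covering the site
    (NULL for roots). mutation_nodes[j] is the node of the site's j-th
    mutation, in table order. Returned parents are indexes into
    mutation_nodes, or NULL.
    """
    expected = []
    for j, u in enumerate(mutation_nodes):
        parent_mut = NULL
        v = u
        # Walk from the mutation's node towards the root, stopping at the
        # first node that carries an earlier mutation at this site.
        while v != NULL and parent_mut == NULL:
            for k in range(j - 1, -1, -1):  # latest earlier mutation first
                if mutation_nodes[k] == v:
                    parent_mut = k
                    break
            v = tree_parent[v]
        expected.append(parent_mut)
    return expected


def bad_mutation_parents(tree_parent, mutation_nodes, mutation_parents):
    """Return indexes of mutations whose recorded parent differs from the expected one."""
    expected = expected_mutation_parents(tree_parent, mutation_nodes)
    pairs = zip(mutation_parents, expected)
    return [j for j, (got, want) in enumerate(pairs) if got != want]


tree_parent = [3, 3, 4, 4, NULL]
mutation_nodes = [4, 3, 3, 0, 2]
print(expected_mutation_parents(tree_parent, mutation_nodes))  # [-1, 0, 1, 2, 0]
print(bad_mutation_parents(tree_parent, mutation_nodes, [-1, 0, 1, 1, 0]))  # [3]
```

This is quadratic in the number of mutations per site, which is fine for illustration but probably not what you'd want at load time for large tree sequences.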
We have recently learned that there was an error in the description of the Gutenkunst et al. model provided as an example in the msprime tutorial. It appears that you...
With multiple contigs, Tabix-indexed VCFs sometimes return regions with an end coordinate of 0 (which is illegal). See here for a test case: https://github.com/jeromekelleher/bio2zarr/blob/880c3afee4465b4b94b921c815d436f3e4a78a46/tests/test_vcf_utils.py#L135 The fix is pretty easy...
Because BCF CSI indexes store information for all contigs listed in the header, we need to filter out regions that have zero counts, like here: https://github.com/jeromekelleher/bio2zarr/blob/880c3afee4465b4b94b921c815d436f3e4a78a46/bio2zarr/vcf_utils.py#L510 While returning empty...
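A minimal sketch of the filtering step, assuming the per-contig record counts have already been read from the index into a hypothetical ``IndexedContig`` record. This doesn't reproduce the bio2zarr code at the link above.

```python
from dataclasses import dataclass


@dataclass
class IndexedContig:
    """Hypothetical per-contig summary read from a CSI index."""
    name: str
    record_count: int


def non_empty_contigs(contigs):
    """Hypothetical helper: drop contigs that have no records in the indexed file.

    A BCF's CSI index has an entry for every contig in the header, so
    contigs with a zero record count are filtered out before building
    regions from them.
    """
    return [c for c in contigs if c.record_count > 0]


contigs = [IndexedContig("chr1", 10), IndexedContig("chr2", 0), IndexedContig("chrX", 3)]
print([c.name for c in non_empty_contigs(contigs)])  # ['chr1', 'chrX']
```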
The index names for CSI-indexed VCFs must be derived from the index itself, because sequence names in an indexed VCF refer to *observed* sequences, not those that are listed...
I don't think there's any good reason to return bytes rather than UTF-8 Unicode strings: it can only lead to bugs in user code and inconsistencies in string handling (anyone...
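A small illustration of the kind of bug meant here, using a hypothetical ``decode_names`` helper rather than any specific library API. In Python 3, bytes never compare equal to str, so lookups by name quietly fail instead of raising an error.

```python
def decode_names(raw_names):
    """Hypothetical helper: return names as str, decoding any bytes as UTF-8."""
    return [n.decode("utf-8") if isinstance(n, bytes) else n for n in raw_names]


raw = [b"chr1", b"chr2"]
# bytes vs str: this membership test is silently False rather than an error.
print("chr1" in raw)  # False
print("chr1" in decode_names(raw))  # True
```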
Updates for vcf-zarr spec change: https://github.com/pystatgen/vcf-zarr-spec/issues/14 It's almost working; I think the main remaining issues are in the round-trip tests. Nobody uses Character columns in the real world, I think, so...