Jerome Kelleher

Results 232 issues of Jerome Kelleher

It's useful to have a summary of the average arity of nodes in the trees. Here's a function that computes it:
```python
def nodes_mean_arity(ts):
    weighted_arity = np.zeros(ts.num_nodes)
    span_in_tree = np.zeros(ts.num_nodes)
    ...
```

enhancement
Python API

These should be seen as analogous to the ``start`` and ``stop`` arguments to Python's builtin ``range``, and be specified in tree sequence coordinates. We should initially ensure that start <...

enhancement
C API
Python API

A useful extension to the ``keep_rows`` functions added in #2700 would be a way to remap all the node ID values in a TableCollection using a given id_map (i.e., the...

enhancement
C API
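
The remapping idea can be illustrated with a small pure-Python sketch. It is not the proposed tskit API: the helper name `remap_ids` is hypothetical, and it assumes that `id_map[j]` holds the new ID for old ID `j`, and that `-1` marks a null ID (as `tskit.NULL` does) and should be left untouched.

```python
NULL = -1  # assumption: -1 marks a missing ID, matching tskit.NULL


def remap_ids(ids, id_map):
    """Return a new list in which each ID u is replaced by id_map[u].

    Hypothetical helper for illustration only. NULL entries are passed
    through unchanged; out-of-bounds IDs raise ValueError.
    """
    out = []
    for u in ids:
        if u == NULL:
            out.append(NULL)
        elif 0 <= u < len(id_map):
            out.append(id_map[u])
        else:
            raise ValueError(f"ID {u} out of bounds")
    return out


# Made-up example: old node 0 becomes 2, node 1 is dropped (NULL),
# node 2 becomes 0 and node 3 becomes 1.
id_map = [2, NULL, 0, 1]
edge_parent = [2, 2, 3]
edge_child = [0, 3, 2]

new_parent = remap_ids(edge_parent, id_map)
new_child = remap_ids(edge_child, id_map)
```

In a real table collection the same mapping would have to be applied to every column that stores node IDs, not only the edge columns used here.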

It has popped up in a few places that our laxity in not checking the correctness of mutation parents at TreeSequence initialisation was a mistake, and we really should be...

enhancement

We have recently learned that there was an error in the description of the Gutenkunst et al model provided as an example in the msprime tutorial. It appears that you...

With multiple contigs, Tabix indexed VCFs sometimes return regions with an end coordinate of 0 (which is illegal). See here for a test case: https://github.com/jeromekelleher/bio2zarr/blob/880c3afee4465b4b94b921c815d436f3e4a78a46/tests/test_vcf_utils.py#L135 The fix is pretty easy...

bug

Because BCF CSI indexes store information for all contigs listed in the header, we need to filter out regions that have zero counts, like here: https://github.com/jeromekelleher/bio2zarr/blob/880c3afee4465b4b94b921c815d436f3e4a78a46/bio2zarr/vcf_utils.py#L510 While returning empty...

bug
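
A minimal sketch of the filtering step, assuming the index exposes one record count per header contig. The `ContigCount` class and `nonempty_contigs` function are hypothetical names for illustration and are not part of bio2zarr.

```python
from dataclasses import dataclass


@dataclass
class ContigCount:
    """Hypothetical per-contig summary read from an index."""

    name: str
    num_records: int


def nonempty_contigs(contigs):
    """Keep only contigs with at least one record."""
    return [c for c in contigs if c.num_records > 0]


# Made-up data: chr2 is listed in the header but has no records.
header_contigs = [
    ContigCount("chr1", 120),
    ContigCount("chr2", 0),
    ContigCount("chrX", 7),
]
kept = nonempty_contigs(header_contigs)
```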

The index names for CSI indexed VCFs must be derived from the index itself, because sequence names in an indexed VCF refer to *observed* sequences, not those that are listed...

bug

I think there's no good reason for returning bytes rather than UTF-8 unicode strings: it can only lead to bugs in user code and inconsistencies in string handling (anyone...

bug
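
The kind of bug this invites can be shown in a few lines of plain Python. The `to_str` helper is a hypothetical name, and decoding as UTF-8 is an assumption about the stored encoding.

```python
def to_str(value, encoding="utf-8"):
    """Decode bytes to str; pass str values through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


raw = b"chr1"

# In Python 3 a bytes object never compares equal to a str,
# so this comparison silently evaluates to False.
mismatch = raw == "chr1"

# After decoding, the comparison behaves as users expect.
match = to_str(raw) == "chr1"
```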

Updates for vcf-zarr spec change: https://github.com/pystatgen/vcf-zarr-spec/issues/14 Almost working; I think the main remaining issues are in the round-trip tests. I don't think anybody uses Character columns in the real world, so...