tskit: Add node_is_sample array

Open jeromekelleher opened this issue 1 year ago • 2 comments

This just took me way too long to do:

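import numpy as np
import tskit
def node_is_sample(ts):
    # True where the NODE_IS_SAMPLE bit is set; other flag bits are ignored.
    sample_flag = np.full_like(ts.nodes_flags, tskit.NODE_IS_SAMPLE)
    return np.bitwise_and(ts.nodes_flags, sample_flag) != 0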
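The bitwise AND is what makes this correct. A node's flags can carry other bits besides the sample bit, so comparing the flags for equality with the sample flag would miss such nodes. Here is a minimal NumPy-only sketch on a hand-made flags array. The values are hypothetical, and NODE_IS_SAMPLE is a local stand-in for tskit.NODE_IS_SAMPLE, which tskit defines as 1.

```python
import numpy as np

# Local stand-in for tskit.NODE_IS_SAMPLE (tskit defines it as 1, i.e. bit 0).
NODE_IS_SAMPLE = np.uint32(1)

# Hypothetical node flags: nodes 0 and 2 are samples; nodes 2 and 3 also
# carry an unrelated bit (1 << 17).
flags = np.array([1, 0, 1 | (1 << 17), 1 << 17], dtype=np.uint32)

# Plain equality misses node 2, because its flags have an extra bit set.
naive = flags == NODE_IS_SAMPLE
# Isolating the sample bit first gives the right answer.
masked = np.bitwise_and(flags, NODE_IS_SAMPLE) != 0

print(naive.tolist())   # [True, False, False, False]
print(masked.tolist())  # [True, False, True, False]
```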

Tests:


class TestNodeIsSample:
    def test_simple_example(self):
        ts = single_tree_example_ts()
        is_sample = model.node_is_sample(ts)
        for node in ts.nodes():
            assert node.is_sample() == is_sample[node.id]

    @pytest.mark.parametrize("bit", [1, 2, 17, 31])
    def test_sample_and_other_flags(self, bit):
        tables = single_tree_example_ts().dump_tables()
        flags = tables.nodes.flags
        tables.nodes.flags = flags | (1 << bit)
        ts = tables.tree_sequence()
        is_sample = model.node_is_sample(ts)
        for node in ts.nodes():
            assert node.is_sample() == is_sample[node.id]
            assert (node.flags & (1 << bit)) != 0

I'm not sure how it should be named, but we do want this array available in the library in some form. It would probably be useful for it to take an array of flags, so that it can be applied to the node table too, like:

def is_sample(flags):
    sample_flag = np.full_like(flags, tskit.NODE_IS_SAMPLE)
    return np.bitwise_and(flags, sample_flag) != 0

Apr 04 '24 09:04 jeromekelleher
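Because the function takes a flags array instead of a tree sequence, the same function works on ts.nodes_flags, on tables.nodes.flags, or on any slice of them. Below is a minimal sketch using a hypothetical flags column. As before, NODE_IS_SAMPLE is a local stand-in for tskit's value of 1. The sketch also uses np.flatnonzero to turn the mask into sample node IDs.

```python
import numpy as np

# Local stand-in for tskit.NODE_IS_SAMPLE (tskit defines it as 1).
NODE_IS_SAMPLE = 1


def is_sample(flags):
    """Boolean mask that is True wherever the sample bit of flags is set."""
    flags = np.asarray(flags)
    return np.bitwise_and(flags, NODE_IS_SAMPLE) != 0


# Hypothetical node flags column, e.g. what tables.nodes.flags might hold.
flags = np.array([1, 1, 0, 0, 1], dtype=np.uint32)
mask = is_sample(flags)

# Convert the mask to the IDs (row indexes) of the sample nodes.
sample_ids = np.flatnonzero(mask)
print(sample_ids.tolist())  # [0, 1, 4]

# Setting an unrelated bit on every node does not change the result.
print(np.array_equal(is_sample(flags | (1 << 17)), mask))  # True
```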

Isn't this just (ts.nodes_flags & tskit.NODE_IS_SAMPLE).astype(bool)? There's no need to explicitly make an array filled with the same number. It could also be a method or computed property, but maybe that's not useful for Numba.

Jul 23 '24 12:07 hyanwong

Yeah, your way is better.

Jul 23 '24 13:07 jeromekelleher
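The simpler (flags & NODE_IS_SAMPLE).astype(bool) form relies on NumPy broadcasting: the scalar flag is combined with every element of the array, with no temporary array of identical values. The sketch below checks that three versions agree: the original np.full_like formulation, the broadcast form, and an explicit loop. It uses NumPy only, and NODE_IS_SAMPLE is again a local stand-in for tskit.NODE_IS_SAMPLE. The loop version only illustrates the plain-array style that could be handed to numba.njit. Numba itself is not used here.

```python
import numpy as np

# Local stand-in for tskit.NODE_IS_SAMPLE (tskit defines it as 1).
NODE_IS_SAMPLE = np.uint32(1)


def is_sample_full_like(flags):
    # Original formulation: an explicit array filled with the flag value.
    sample_flag = np.full_like(flags, NODE_IS_SAMPLE)
    return np.bitwise_and(flags, sample_flag) != 0


def is_sample_broadcast(flags):
    # Suggested formulation: the scalar flag broadcasts against the array.
    return (flags & NODE_IS_SAMPLE).astype(bool)


def is_sample_loop(flags):
    # Explicit per-element loop over a plain array; this is the style one
    # might compile with numba.njit (Numba is not imported here).
    out = np.zeros(flags.shape[0], dtype=np.bool_)
    for j in range(flags.shape[0]):
        out[j] = (flags[j] & NODE_IS_SAMPLE) != 0
    return out


# Random flag values below 2**31, so many bits besides the sample bit are set.
rng = np.random.default_rng(0)
flags = rng.integers(0, 2**31, size=1000, dtype=np.uint32)

a = is_sample_full_like(flags)
b = is_sample_broadcast(flags)
c = is_sample_loop(flags)
print(np.array_equal(a, b) and np.array_equal(a, c))  # True
```

The broadcast form gives the same mask as the original while skipping the extra array that np.full_like allocates.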
