
Refactor tests to use example tree sequences and parametrize

Open · jeromekelleher opened this issue 4 years ago · 6 comments

We currently have a mishmash of different ways of running tests on a range of different topologies. I think a good pattern would be to use pytest.mark.parametrize with a selection of tree sequences that we create once. The get_example_tree_sequences list in tests/test_highlevel.py is a good place to start for this (and it's already used in existing PRs like #1704).

We don't have to do it all at once, but it would be good to refactor a few of the places where we use different topologies and set the standard pattern for how we'd like to see things done in the future. To close this issue, we should:

  1. Extract the get_example_tree_sequences() code from tests/test_highlevel.py into its own module (maybe examples.py?) and set it up so that the list of examples gets built the first time it's called, and then cached afterwards (see the sketch after this list).
  2. Refactor the test_highlevel.py file to use this function instead of the local examples, and fix any breakages (it would be nice to do a bit more "parametrizing" there as well where possible, instead of iterating over the examples within the test function).
  3. Port the existing "fixture" methods to use the same pattern.
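A minimal sketch of what step 1 could look like (the module layout and the msprime-based construction here are illustrative assumptions, not the final list of examples):

# tests/examples.py -- hypothetical sketch of the cached examples module.
import functools

import msprime

@functools.lru_cache(maxsize=None)
def _build_examples():
    # Built on the first call only; lru_cache hands back the same
    # tuple on every subsequent call.
    return (
        msprime.simulate(10, random_seed=42),
        msprime.simulate(10, recombination_rate=1, random_seed=42),
    )

def standard_ts():
    return _build_examples()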

I think pytest fixtures are an anti-pattern because you're defining a global variable whose meaning the reader is supposed to understand, and whose definition they have to go and find. This approach is much more transparent.

So, code like this should be canonical in test files going forward:


import pytest

from tests import examples

...

class TestSomeStuff:

    @pytest.mark.parametrize("ts", examples.standard_ts())
    def test_specific_thing_where_topology_doesnt_matter(self, ts):
        ...

    @pytest.mark.parametrize("ts", examples.sample_topologies())
    def test_specific_thing_where_topology_matters_and_we_want_a_variety(self, ts):
        ...

jeromekelleher · Oct 17 '21

A nice side effect of this is that it'll give us the option of tweaking how extensive the test suite is: we can add some pytest parameters that let us control how many examples we go through in the various lists. So, when adding a new feature that you really want to throw the kitchen sink at, you can run lots of replicates, but usually we'd keep the number of examples quite small and selective so the suite runs quickly.
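Note that @pytest.mark.parametrize arguments are evaluated at import time, before pytest has parsed its command-line options, so one simple wiring (an assumption, not a settled design; the variable name is made up) is an environment variable read inside the examples module. A pytest_generate_tests hook in conftest.py would be the option-flag alternative.

# tests/examples.py -- hypothetical: cap how many examples each list
# returns, so the suite can be dialed up or down per run.
import os

_MAX_EXAMPLES = int(os.environ.get("TSKIT_TEST_EXAMPLES", "10"))

def standard_ts():
    # Return at most _MAX_EXAMPLES of the cached examples; run with
    # e.g. TSKIT_TEST_EXAMPLES=1000 to throw the kitchen sink at a PR.
    return list(_build_examples())[:_MAX_EXAMPLES]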

jeromekelleher · Oct 18 '21

Sounds great. I think it's worth keeping the fixture we have that is a tree sequence where all fields have data (ts_fixture), as when a test doesn't depend on topology this is all you need.

benjeffery · Oct 18 '21

I agree we should keep it, but I'd prefer to call it examples.complete_ts() or something. Mysterious global variables that you have to pull out of the pytest config seem like an unnecessarily opaque way of doing things; this is much more discoverable for a new person coming into the project who doesn't know much about pytest.
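For what it's worth, a sketch of that accessor (the caching decorator is real functools; the _build_complete_tree_sequence helper is hypothetical, standing in for whatever currently builds ts_fixture):

import functools

@functools.lru_cache(maxsize=None)
def complete_ts():
    # Built once and cached; tests call examples.complete_ts()
    # explicitly instead of receiving an injected fixture argument.
    return _build_complete_tree_sequence()  # hypothetical builder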

jeromekelleher · Oct 18 '21

We should add some arguments here to filter the list for things we're interested in:

  • discrete_genome (True: just discrete; False: just continuous; None: both)
  • single_root

I guess we also want a few different types of sites on these.
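One possible shape for the filtering, as a sketch (the parameter handling is an assumption; TreeSequence.discrete_genome and Tree.num_roots are real tskit properties):

def standard_ts(discrete_genome=None, single_root=None):
    ts_list = list(_build_examples())
    if discrete_genome is not None:
        # discrete_genome is True when all genome coordinates are integers.
        ts_list = [ts for ts in ts_list if ts.discrete_genome == discrete_genome]
    if single_root is not None:
        ts_list = [
            ts
            for ts in ts_list
            if all(tree.num_roots == 1 for tree in ts.trees()) == single_root
        ]
    return ts_list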

jeromekelleher · Oct 27 '21

Some more things to make sure we hit (maybe with variables, maybe not):

  • infinite_alleles
  • non-single-character alleles
  • has_mutations
  • missing_data
  • polytomies
  • ancestral_samples
  • metadata everywhere
  • discrete_times
  • sites with no mutations, individuals with no nodes, populations with no nodes
  • unsorted tables to the extent allowed by the requirements (e.g. shuffle the node ordering)
  • unary nodes

Most of these could be hit by a few WF simulations on a small genome.
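As a rough illustration (the parameter values are assumptions), something like this covers polytomies, discrete coordinates, and mutations in one go; the remaining items, such as metadata, unary nodes, and unsorted tables, would need explicit table edits afterwards:

import msprime

# Tiny population + discrete-time Wright-Fisher model: multiple
# mergers in a single generation give polytomies, and an integer
# genome length keeps coordinates discrete.
ts = msprime.simulate(
    sample_size=20,
    Ne=10,
    length=100,
    recombination_rate=0.05,
    model="dtwf",
    random_seed=1,
)
# Throw mutations on top so has_mutations and allele variety are covered.
ts = msprime.mutate(ts, rate=0.1, random_seed=1)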

petrelharp · Oct 28 '21

See also #943

benjeffery · Jun 09 '25