tskit
Implement state machine for tsk_variant_t and keep reference to ``tree_sequence`` in restricted_copy.
We are currently using the tree_sequence attribute as a way of determining whether a variant is a frozen copy or not. We also use the variant->site.position attribute as a way of determining whether the variant has been decoded. It would be simpler to have a single state machine supporting the following transitions:
VARIANT_STATE_NEW -> VARIANT_STATE_DECODED
VARIANT_STATE_DECODED -> VARIANT_STATE_DECODED
VARIANT_STATE_DECODED -> VARIANT_STATE_FROZEN_COPY
VARIANT_STATE_FROZEN_COPY -> VARIANT_STATE_FROZEN_COPY
Thus,
- if state == VARIANT_STATE_NEW, then tsk_variant_restricted_copy should fail;
- if state == VARIANT_STATE_FROZEN_COPY, then tsk_variant_decode should fail.
The current approach of using the tree_sequence attribute is problematic because:
- We're documenting this attribute, but we're not documenting that it is currently NULL for frozen copies.
- We're still referring to memory owned by the initial variant's tree sequence through the site copy (e.g., the list of mutations still points to memory from the original ts). Thus, we still have a dependency on the original ts. Note that we're currently getting away with this dependency in the Python C API layer because we don't follow the pointers within the site reference, but we could easily forget this some day.
I think we can also remove some complexity in tsk_variant_restricted_copy because we can then avoid taking copies of the alleles in the user_alleles memory.
Also, Python testing of the variant state needs beefing up. We need to test taking copies of copies, among other things.
We could consider renaming this to tsk_variant_frozen_copy (keeping the current name as an alias).