PCG
PCG copied to clipboard
Reimplement topological representation
The topological representation should be updated to better represent the phylogenetic forest. Currently we represent phylogenetic DAGs in a newtyped non-empty list to represent a phylogenetic forest. This make it difficult to represent data structure invariant that apply to a phylogenetic forest. Specifically, that each leaf should appear in the phylogenetic forest exactly once. Instead we should make the phylogenetic forest the "primitive" representation.
We should have the leaf nodes places in indices [0..n-1]
of the vector with internal nodes occupying indices [n..len-1]
. Once allocated, the leaf nodes should never have their indices altered. This will allow for more efficient bit-vector representation in the resolution cache and extracting the leaf set, amongst other things.
This representation, where the phylogenetic forest is the primitive representation, will allow for more natural implementation of breaking a phylogenetic forest "component" on an edge, since the result will necessarily be a forest. It will also allow for more natural calculation of multi rooting cost.
A lot of traversal code will need to be re-implemented during this change. Afterwards, the interface should be stable and highlevel. There might be more opportunities for efficiency gains by using the foldl
or monad-memo
libraries in the tree traversals to build the updated vector and character matrix when using this representation.