NetRAX
Explanation of the current Experimental Results CSV Header
The current results CSV header consists of:
name | n_taxa | n_trees | n_reticulations | msa_size | sampling_type | simulation_type | likelihood_type | timeout | n_random_start_networks | n_parsimony_start_networks | start_from_raxml | celine_params | n_reticulations_inferred | bic_true | logl_true | bic_inferred | logl_inferred | bic_raxml | logl_raxml | rf_absolute_raxml | rf_relative_raxml | rf_absolute_inferred | rf_relative_inferred | near_zero_branches_raxml | hardwired_cluster_distance | softwired_cluster_distance | displayed_trees_distance | tripartition_distance | nested_labels_distance | path_multiplicity_distance | runtime_inference
- name: The name of the dataset.
- n_taxa: Number of taxa in the simulated network.
- n_trees: Number of displayed trees in the simulated network (equal to 2^n_reticulations).
- n_reticulations: Number of reticulations in the simulated network.
- msa_size: Total MSA size (there might be up to n_trees sites more due to rounding issues, depending on the chosen sampling type).
- sampling_type: The sampling type used. It is one of
from enum import Enum

class SamplingType(Enum):
    STANDARD = 1 # randomly choose which tree to sample, then sample equal number of sites for each sampled tree - this is the only mode that uses the n_trees or m parameter for sampling
    PERFECT_SAMPLING = 2 # sample each displayed tree, and as many sites as expected by the tree probability
    PERFECT_UNIFORM_SAMPLING = 3 # sample each displayed tree, with the same number of sites per tree (ignoring reticulation probabilities)
    SINGLE_SITE_SAMPLING = 4 # sample each site individually, with the reticulation probabilities in mind
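The two deterministic sampling modes differ only in how many sites each displayed tree contributes. The following is a minimal sketch, not the simulator's actual code: `displayed_tree_probs` and `sites_per_tree` are hypothetical helpers, the tree probabilities assume independent reticulations, and rounding up is an assumption chosen to be consistent with the note that the MSA may end up with up to n_trees extra sites.

```python
import itertools
import math
from enum import Enum


class SamplingType(Enum):
    STANDARD = 1
    PERFECT_SAMPLING = 2
    PERFECT_UNIFORM_SAMPLING = 3
    SINGLE_SITE_SAMPLING = 4


def displayed_tree_probs(reticulation_probs):
    """Probability of each of the 2^n_reticulations displayed trees.

    Assumption: reticulations are independent, so a tree's probability is
    the product of the probabilities of the parent edges it keeps.
    """
    choices = [(p, 1.0 - p) for p in reticulation_probs]
    return [math.prod(combo) for combo in itertools.product(*choices)]


def sites_per_tree(msa_size, tree_probs, sampling_type):
    """Number of sites to simulate on each displayed tree (hypothetical helper)."""
    n_trees = len(tree_probs)
    if sampling_type == SamplingType.PERFECT_UNIFORM_SAMPLING:
        # Same share for every tree, ignoring reticulation probabilities.
        return [math.ceil(msa_size / n_trees)] * n_trees
    if sampling_type == SamplingType.PERFECT_SAMPLING:
        # Share proportional to the tree probability.
        # Rounding up is an assumption; it adds at most one site per tree.
        return [math.ceil(msa_size * p) for p in tree_probs]
    # STANDARD and SINGLE_SITE_SAMPLING draw trees at random and are not sketched here.
    raise NotImplementedError(sampling_type)
```

With two reticulations there are four displayed trees, and the total number of sampled sites exceeds `msa_size` by at most `n_trees` under these assumptions.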
- simulation_type: It is one of
class SimulationType(Enum):
    CELINE = 1 # use Celine's network topology simulator
    SARAH = 2 # use Sarah's ad-hoc network topology generator
- likelihood_type: It is one of
class LikelihoodType(Enum):
    AVERAGE = 1 # use weighted average of displayed trees
    BEST = 2 # use best displayed tree
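To make the difference between the two likelihood types concrete, here is a rough sketch of how per-tree log-likelihoods could be combined into one network score. It is not NetRAX's implementation: `network_logl` is a hypothetical helper, the granularity (per site or per partition) is left open, and weighting the BEST variant by the tree probability is an assumption (drop the log-probability term if the unweighted maximum is intended).

```python
import math
from enum import Enum


class LikelihoodType(Enum):
    AVERAGE = 1
    BEST = 2


def network_logl(tree_logls, tree_probs, likelihood_type):
    """Combine displayed-tree log-likelihoods into one network log-likelihood."""
    # log(P(tree) * L(tree)) for every displayed tree with non-zero probability.
    weighted = [math.log(p) + logl
                for logl, p in zip(tree_logls, tree_probs) if p > 0]
    best = max(weighted)
    if likelihood_type == LikelihoodType.BEST:
        # Assumption: the best tree is chosen by probability-weighted likelihood.
        return best
    # AVERAGE: log of the probability-weighted sum, computed with the
    # log-sum-exp trick to avoid underflow for very negative log-likelihoods.
    return best + math.log(sum(math.exp(w - best) for w in weighted))
```

Under these assumptions the AVERAGE score is never lower than the BEST score, because the weighted sum includes the best tree's term.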
- timeout: If neither a start network nor a number of random/parsimony start networks is specified, a value larger than 0 means that NetRAX keeps searching from new random start networks until $(timeout) seconds have passed.
- n_random_start_networks: Number of random start trees for the NetRAX network search.
- n_parsimony_start_networks: Number of parsimony start trees for the NetRAX network search.
- start_from_raxml: If TRUE, run the NetRAX search only from the best ML tree inferred by raxml-ng. If FALSE, run the NetRAX search from random/parsimony start trees.
- celine_params: The parameters used by Celine's simulator, or empty otherwise.
- n_reticulations_inferred: Number of reticulations in the network inferred by NetRAX.
- bic_true: BIC score of the simulated network (using the specified likelihood_type).
- logl_true: Network loglikelihood score of the simulated network (using the specified likelihood_type).
- bic_inferred: BIC score of the network inferred by NetRAX (using the specified likelihood_type).
- logl_inferred: Network loglikelihood score of the network inferred by NetRAX (using the specified likelihood_type).
- bic_raxml: BIC score of the maximum likelihood tree inferred by raxml-ng (using the specified likelihood_type).
- logl_raxml: Network loglikelihood score of the maximum likelihood tree inferred by raxml-ng (using the specified likelihood_type).
- rf_absolute_raxml: Absolute RF distance between the maximum likelihood tree inferred by raxml-ng and the simulated network, if the simulated network has zero reticulations. Otherwise, this value is -1.
- rf_relative_raxml: Relative RF distance between the maximum likelihood tree inferred by raxml-ng and the simulated network, if the simulated network has zero reticulations. Otherwise, this value is -1.
- rf_absolute_inferred: Absolute RF distance between the network inferred by NetRAX and the simulated network, if both the simulated network and the network inferred by NetRAX have zero reticulations. Otherwise, this value is -1.
- rf_relative_inferred: Relative RF distance between the network inferred by NetRAX and the simulated network, if both the simulated network and the network inferred by NetRAX have zero reticulations. Otherwise, this value is -1.
- near_zero_branches_raxml: Number of near-zero branches in the maximum likelihood tree inferred by raxml-ng.
- runtime_inference: Elapsed runtime in seconds for the network inference with NetRAX.
- hardwired_cluster_distance: Hardwired cluster distance between the simulated network and the network inferred by NetRAX (computed via Dendroscope).
- softwired_cluster_distance: Softwired cluster distance between the simulated network and the network inferred by NetRAX (computed via Dendroscope).
- displayed_trees_distance: Displayed trees distance between the simulated network and the network inferred by NetRAX (computed via Dendroscope).
- tripartition_distance: Tripartition distance between the simulated network and the network inferred by NetRAX (computed via Dendroscope).
- nested_labels_distance: Nested labels distance between the simulated network and the network inferred by NetRAX (computed via Dendroscope).
- path_multiplicity_distance: Path multiplicity distance between the simulated network and the network inferred by NetRAX (computed via Dendroscope).
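When post-processing the results, the -1 placeholder in the RF columns has to be treated as "not applicable" rather than as a distance. The sketch below shows one way to do this with Python's standard `csv` module. The helper names `load_results` and `bic_delta` are hypothetical, the comma delimiter is an assumption (pass another `delimiter` if the file uses a different separator), and the sample row in the usage note is made up for illustration.

```python
import csv
import io

# Columns that use -1 to mean "not applicable" (networks with reticulations).
RF_COLUMNS = (
    "rf_absolute_raxml",
    "rf_relative_raxml",
    "rf_absolute_inferred",
    "rf_relative_inferred",
)


def load_results(handle, delimiter=","):
    """Read result rows as dicts, mapping the -1 RF placeholder to None."""
    rows = []
    for raw in csv.DictReader(handle, delimiter=delimiter):
        # Strip padding around header names and values.
        row = {key.strip(): (value or "").strip()
               for key, value in raw.items() if key is not None}
        for column in RF_COLUMNS:
            if row.get(column):
                value = float(row[column])
                row[column] = None if value == -1 else value
        rows.append(row)
    return rows


def bic_delta(row):
    """bic_inferred - bic_true; negative means the inferred network scores better."""
    return float(row["bic_inferred"]) - float(row["bic_true"])
```

For example, with a made-up two-line file:

```python
sample = io.StringIO(
    "name,n_reticulations,bic_true,bic_inferred,rf_absolute_raxml,rf_relative_raxml\n"
    "example,1,2000.0,1990.0,-1,-1\n"
)
rows = load_results(sample)
print(rows[0]["rf_absolute_raxml"], bic_delta(rows[0]))
```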
Do we need to report anything else?