CRISPR-Cas9 distance correction solver

Open sprillo opened this issue 1 year ago • 3 comments

Implement a distance correction scheme for the CRISPR-Cas9 model.

The class implementing this method is CRISPRCas9DistanceCorrectionSolver in the cassiopeia/solver/distance_correction/_crispr_cas9_distance_correction_solver.py module. (The distance_correction subpackage is meant to contain any distance correction methods that might be implemented in the future, possibly for models other than CRISPR-Cas9.)

The solver composes four steps: (1) mutation proportion estimation, (2) collision probability estimation, (3) distance correction using the estimated mutation proportion and collision probability, and (4) tree topology reconstruction using the corrected distances. The solver is parameterized by these four steps. Note: due to Numba compilation issues in the DistanceSolver, the function performing the third step is not injected into the solver; instead, it is selected by a string identifier that specifies the function name.
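To make the composition concrete, here is a minimal, self-contained sketch of a solver parameterized by four steps, with the third step looked up by name. It is only an illustration under stated assumptions: every name in it (ToyDistanceCorrectionSolver, _toy_correction, closest_pair, and so on) is hypothetical, the correction formula is a placeholder rather than the one implemented in this PR, and step 4 is reduced to picking the closest pair of leaves instead of building a tree.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

# Hypothetical types for this sketch only (not the types declared in the PR).
CharacterMatrix = Dict[str, List[int]]  # leaf name -> states, 0 = unmutated
DistanceMatrix = Dict[Tuple[str, str], float]


def estimate_mutation_proportion(cm: CharacterMatrix) -> float:
    """Step 1 (toy): fraction of entries that are mutated (nonzero)."""
    states = [s for row in cm.values() for s in row]
    return sum(s != 0 for s in states) / len(states)


def estimate_collision_probability(cm: CharacterMatrix) -> float:
    """Step 2 (toy): probability that two mutated entries, drawn with
    replacement, carry the same state."""
    mutated = [s for row in cm.values() for s in row if s != 0]
    if not mutated:
        return 0.0
    counts: Dict[int, int] = {}
    for s in mutated:
        counts[s] = counts.get(s, 0) + 1
    return sum((c / len(mutated)) ** 2 for c in counts.values())


def _toy_correction(raw: float, mutation_proportion: float,
                    collision_probability: float) -> float:
    """Step 3 (placeholder formula, NOT the correction from this PR)."""
    return raw * (1.0 + mutation_proportion * collision_probability)


# Step 3 is chosen by a string identifier rather than injected as a callable.
_CORRECTIONS: Dict[str, Callable[[float, float, float], float]] = {
    "toy_correction": _toy_correction,
}


def closest_pair(distances: DistanceMatrix) -> Tuple[str, str]:
    """Step 4 (toy stand-in for tree reconstruction): the pair of leaves
    with the smallest corrected distance."""
    return min(distances, key=distances.get)


class ToyDistanceCorrectionSolver:
    def __init__(
        self,
        mutation_proportion_estimator: Callable[[CharacterMatrix], float],
        collision_probability_estimator: Callable[[CharacterMatrix], float],
        distance_corrector_name: str,
        tree_builder: Callable[[DistanceMatrix], Tuple[str, str]],
    ) -> None:
        self._estimate_mutation_proportion = mutation_proportion_estimator
        self._estimate_collision_probability = collision_probability_estimator
        self._distance_corrector_name = distance_corrector_name
        self._tree_builder = tree_builder

    def solve(self, cm: CharacterMatrix) -> Tuple[str, str]:
        mutation_proportion = self._estimate_mutation_proportion(cm)
        collision_probability = self._estimate_collision_probability(cm)
        correct = _CORRECTIONS[self._distance_corrector_name]
        distances: DistanceMatrix = {}
        for a, b in combinations(sorted(cm), 2):
            # Raw distance: fraction of characters at which the leaves differ.
            raw = sum(x != y for x, y in zip(cm[a], cm[b])) / len(cm[a])
            distances[(a, b)] = correct(
                raw, mutation_proportion, collision_probability
            )
        return self._tree_builder(distances)


toy_matrix = {"a": [1, 0, 2, 0], "b": [1, 0, 2, 3], "c": [0, 4, 0, 3]}
toy_solver = ToyDistanceCorrectionSolver(
    estimate_mutation_proportion,
    estimate_collision_probability,
    "toy_correction",
    closest_pair,
)
toy_result = toy_solver.solve(toy_matrix)
```

Looking up step 3 in a registry keeps the constructor signature uniform while avoiding passing a Python callable into code that may need to be compiled, which is the kind of constraint the note above describes.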

In the code, I declared some types to improve readability in select places. Underscore prefixes denote functions or classes that are internal and not exposed to users, mostly to improve legibility.

The tests should be quite comprehensive. However, some tests are marked as slow because they require simulation (they take ~30s on my machine). To run the slow tests, use the --runslow flag, e.g.: python -m pytest test/solver_tests/distance_correction_tests --runslow. (In particular, CodeCov complains, but coverage with the slow tests included is good.)
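For readers unfamiliar with the --runslow convention, this is roughly how such a flag is usually wired up in a conftest.py, following the pattern from the pytest documentation. It is a sketch of the general mechanism and assumes a marker named slow; the repository's actual conftest.py may differ.

```python
import pytest


def pytest_addoption(parser):
    # Register the command-line flag; slow tests are off by default.
    parser.addoption(
        "--runslow", action="store_true", default=False, help="run slow tests"
    )


def pytest_configure(config):
    # Declare the marker so pytest does not warn about an unknown mark.
    config.addinivalue_line("markers", "slow: mark test as slow to run")


def pytest_collection_modifyitems(config, items):
    if config.getoption("--runslow"):
        # Flag given on the command line: leave slow tests enabled.
        return
    skip_slow = pytest.mark.skip(reason="need --runslow option to run")
    for item in items:
        if "slow" in item.keywords:
            item.add_marker(skip_slow)
```

With this in place, a test decorated with @pytest.mark.slow is skipped unless --runslow is passed.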

sprillo, Aug 11 '23 22:08