SCARAP
Remove redundant orthogroup gathering in hierarchical pan module
The hierarchical version of the pan module currently gathers the orthogroups of pseudogenomes twice: once in phase 2 (pseudogenome creation), where the actual pseudogenome sequences are not stored, and once in phase 3 (pseudopangenome inference), where they are.
The solution is to add an optional argument to the create_pseudogenome function that makes it store the pseudogenome sequences (the selected representatives), and to use this argument so that the sequences are already stored in phase 2. The gathering of pseudogenome sequences in phase 3 (the call to the gather_orthogroup_sequences function) can then be removed.
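The issue does not show the real signature of create_pseudogenome or SCARAP's internal data structures, so the following is only a hypothetical toy model of the proposed pattern. The `store_sequences` argument, the `Pseudogenome` class, the "longest sequence wins" representative rule and the `pseudopangenome_input` helper are all assumptions made for illustration. The point it shows: when phase 2 optionally keeps the representative sequences it has already looked at, phase 3 can read them directly instead of gathering orthogroup sequences a second time.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# HYPOTHETICAL toy model: an orthogroup maps gene IDs to sequences.
# Real SCARAP structures and function signatures may differ.


@dataclass
class Pseudogenome:
    representatives: Dict[str, str]  # orthogroup ID -> representative gene ID
    sequences: Optional[Dict[str, str]] = None  # filled only if requested


def create_pseudogenome(
    orthogroups: Dict[str, Dict[str, str]], store_sequences: bool = False
) -> Pseudogenome:
    """Phase 2 sketch: pick one representative gene per orthogroup.

    If store_sequences is True (hypothetical optional argument), the
    representative's sequence is kept as well, so phase 3 does not need
    to gather it again.
    """
    reps: Dict[str, str] = {}
    seqs: Optional[Dict[str, str]] = {} if store_sequences else None
    for og_id, genes in orthogroups.items():
        # Assumed selection rule: longest sequence, ties -> smallest gene ID.
        gene = min(genes, key=lambda g: (-len(genes[g]), g))
        reps[og_id] = gene
        if seqs is not None:
            seqs[og_id] = genes[gene]
    return Pseudogenome(reps, seqs)


def pseudopangenome_input(
    pseudogenomes: Dict[str, Pseudogenome]
) -> List[Tuple[str, str]]:
    """Phase 3 sketch: reuse stored sequences instead of re-gathering them."""
    records: List[Tuple[str, str]] = []
    for name, pg in pseudogenomes.items():
        if pg.sequences is None:
            raise ValueError(f"{name} was created without store_sequences=True")
        for og_id, seq in sorted(pg.sequences.items()):
            records.append((f"{name}_{og_id}", seq))
    return records


ogs = {
    "OG1": {"geneA": "ATGAAA", "geneB": "ATG"},
    "OG2": {"geneC": "ATGCCCTAA"},
}
pg = create_pseudogenome(ogs, store_sequences=True)
print(pg.representatives)  # {'OG1': 'geneA', 'OG2': 'geneC'}
print(pseudopangenome_input({"pg1": pg}))  # [('pg1_OG1', 'ATGAAA'), ('pg1_OG2', 'ATGCCCTAA')]
```

Keeping the argument optional with a default of False means existing callers that only need the representative IDs behave as before, while the hierarchical workflow can opt in.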
Because orthogroups are then gathered only once, the hierarchical pan module should become faster.