Remove redundant orthogroup gathering in hierarchical pan module

Open SWittouck opened this issue 1 year ago • 0 comments

The hierarchical version of the pan module currently gathers the orthogroups of pseudogenomes twice: once in phase 2 (pseudogenome creation), without storing the actual pseudogenome sequences, and once in phase 3 (pseudopangenome inference), where the actual pseudogenome sequences are stored.

The solution is to add an optional argument to the create_pseudogenome function that makes it store the pseudogenome sequences (the selected representatives), and to pass this argument in phase 2 so that the sequences are already stored at that point. The call to the gather_orthogroup_sequences function that gathers the pseudogenome sequences in phase 3 can then be removed.
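A minimal sketch of the idea, not SCARAP's actual code: the signature, the `sequences` argument, the input layout, and the "first gene seen" representative rule are all assumptions made for illustration. Only the function name create_pseudogenome comes from this issue.

```python
def create_pseudogenome(genes, sequences=None):
    """Hypothetical stand-in for the real create_pseudogenome.

    genes: list of (gene_id, genome_id, orthogroup_id) tuples (assumed layout).
    sequences: optional dict mapping gene_id to sequence. When given, the
        sequences of the selected representatives are returned as well, so
        phase 3 would not need to gather them again.
    """
    representatives = {}
    for gene_id, _genome_id, orthogroup in genes:
        # Placeholder rule: the first gene seen per orthogroup is its
        # representative. The real selection criterion may differ.
        representatives.setdefault(orthogroup, gene_id)

    if sequences is None:
        # Current behaviour as described above: no sequences stored.
        return representatives, None

    # Proposed behaviour: store representative sequences already in phase 2.
    stored = {og: sequences[gene] for og, gene in representatives.items()}
    return representatives, stored
```

With something like this, phase 2 would call the function with the sequences and write out `stored`, and the separate gathering step in phase 3 could be dropped.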

The result will be a faster hierarchical pan module.

SWittouck, Apr 04 '23 08:04