woltka
Benchmarks of individual steps
Used `cProfile` plus `snakeviz` to examine the time consumption of individual steps. It appears that:
- Alignment file processing is expensive.
- Classification is expensive.
Profiled settings:
- rank = none
- rank = free
- rank = phylum,genus,species
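For reference, a profile like this can be produced from Python with the standard `cProfile` and `pstats` modules. The `.prof` file it writes can then be opened in snakeviz (`snakeviz workload.prof`). The `parse_line` and `workload` functions below are hypothetical stand-ins, not woltka code; for a whole script, `python -m cProfile -o out.prof script.py` does the same from the command line.

```python
import cProfile
import io
import pstats


def parse_line(line):
    # Hypothetical stand-in for a per-line parser such as parse_sam_line:
    # return the query and subject columns of a tab-separated line
    fields = line.rstrip('\n').split('\t')
    return fields[0], fields[2]


def workload(n=100_000):
    # Hypothetical workload: parse n copies of a synthetic alignment line
    line = 'read1\t0\tG000001\t100\n'
    return [parse_line(line) for _ in range(n)]


def profile(func, *args, outfile=None, top=10):
    """Run func under cProfile; return (result, text report of top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    if outfile is not None:
        # A file written here can be visualized with: snakeviz <outfile>
        profiler.dump_stats(outfile)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # Sort by cumulative time so expensive call chains appear first
    stats.sort_stats('cumulative').print_stats(top)
    return result, buf.getvalue()


if __name__ == '__main__':
    _, report = profile(workload, 1000, outfile='workload.prof')
    print(report)
```

Sorting by cumulative time mirrors what snakeviz's icicle/sunburst views emphasize: functions whose callees dominate the runtime.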
:+1:, and that visualization is quite nice. I think it may be worth improving the performance of `parse_sam_line` and `assign_rank`, and possibly `find_lca`. The first one could probably be tuned quickly with Cython. In the second, it looks like `find_rank` is expensive. Is the try/except used in that function something that regularly results in the exception? If so, it should be refactored to use an explicit membership test, since triggering an exception is expensive (e.g., do `rankdic.get(this) == rank` instead of the try/except). `find_lca` would also benefit from avoiding try/except, assuming the exception block is being triggered regularly. Additionally, the call to `.index` is a linear list lookup; if the list has more than a handful of elements, it might be faster to represent that structure as a `dict`. It also looks like `get_lineage` has a try/except; is it actually getting triggered regularly?
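The two suggestions above (an explicit lookup instead of try/except, and a `dict` instead of `list.index`) can be checked with a small benchmark. This is a minimal sketch: `rank_matches_try`, `rank_matches_get`, `build_position_index` and `compare` are hypothetical names that only mirror the shape of the patterns discussed, and woltka's actual `find_rank`, `find_lca` and `get_lineage` may be structured differently. Timings depend on how often lookups miss and on the Python version, so treat the numbers as indicative only.

```python
import timeit


def rank_matches_try(rankdic, taxon, rank):
    # Exception-driven lookup: cheap when the key is present,
    # but raising and catching KeyError on a miss is comparatively slow
    try:
        return rankdic[taxon] == rank
    except KeyError:
        return False


def rank_matches_get(rankdic, taxon, rank):
    # Explicit lookup: dict.get returns None for a missing key, so no
    # exception is raised (assumes rank itself is never None)
    return rankdic.get(taxon) == rank


def build_position_index(lineage):
    # Map each taxon to its first position, matching what list.index
    # returns, so repeated lookups are O(1) instead of a linear scan
    pos = {}
    for i, taxon in enumerate(lineage):
        pos.setdefault(taxon, i)
    return pos


def compare(number=10_000):
    """Time each pair of equivalent lookups; returns seconds per variant."""
    rankdic = {f'taxon{i}': 'species' for i in range(1000)}
    lineage = [f'taxon{i}' for i in range(1000)]
    pos = build_position_index(lineage)
    last = lineage[-1]  # worst case for a linear scan
    return {
        'try/except (missing key)': timeit.timeit(
            lambda: rank_matches_try(rankdic, 'unknown', 'genus'),
            number=number),
        'dict.get (missing key)': timeit.timeit(
            lambda: rank_matches_get(rankdic, 'unknown', 'genus'),
            number=number),
        'list.index': timeit.timeit(lambda: lineage.index(last), number=number),
        'dict lookup': timeit.timeit(lambda: pos[last], number=number),
    }


if __name__ == '__main__':
    for label, seconds in compare().items():
        print(f'{label:>26}: {seconds:.4f} s')
```

The gap between the first two variants should be largest when most lookups miss; when nearly every lookup hits, try/except is typically competitive. The `dict` index pays an up-front build cost, so it helps only when the same list is searched repeatedly.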
@wasade Thank you for this valuable advice! I will look into each suggestion and see what I can do to improve the performance.