woltka

Benchmarks of individual steps

Open qiyunzhu opened this issue 4 years ago • 2 comments

I used cProfile plus snakeviz to examine the time spent in individual steps. It appears that:

  • Alignment file processing is expensive.
  • Classification is expensive.
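For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of profiling a call with cProfile and printing the most expensive functions by cumulative time. The `parse_line` and `workload` functions are hypothetical stand-ins, not woltka code; only the `cProfile` and `pstats` calls are standard library.

```python
import cProfile
import io
import pstats


def parse_line(line):
    # Hypothetical stand-in for a per-line parsing step.
    return line.split("\t")


def workload(n=10000):
    # Hypothetical stand-in for the pipeline being profiled.
    return [parse_line("a\tb\tc") for _ in range(n)]


def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and show at most the top 10 entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


result, report = profile_call(workload)
print(report)
```

To get the interactive view, the same profiler can be saved with `profiler.dump_stats("out.prof")` and opened with `snakeviz out.prof` (the file name here is just an example).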

[snakeviz profile: rank = none]

[snakeviz profile: rank = free]

[snakeviz profile: rank = phylum,genus,species (fixed)]

qiyunzhu, Jun 21 '20 17:06

:+1:, and that visualization is quite nice. I think it may be worth improving the performance of parse_sam_line and assign_rank, and possibly find_lca. The first could probably be tuned quickly with Cython. In the second, it looks like find_rank is expensive. Does the try/except in that function regularly hit the exception? If so, it should be refactored to use an explicit membership test, since triggering an exception is expensive (e.g., do rankdic.get(this) == rank instead of the try/except). find_lca would also benefit from avoiding try/except, assuming the exception block is triggered regularly. Additionally, the call to .index is a linear list lookup; if the list has more than a handful of elements, it might be faster to represent that structure as a dict. It also looks like get_lineage has a try/except; is it actually triggered regularly?
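A rough sketch of the two suggestions, to make the comparison concrete. Everything here is hypothetical (the data, `count_try`, `count_get`, `lineage`, `position`); it is not woltka's actual code, only an illustration of try/except versus `dict.get` when most lookups miss, and of `list.index` versus a precomputed dict of positions.

```python
import timeit

# Hypothetical data: 100 known taxa, 1000 queries, so 90% of lookups miss.
rankdic = {f"t{i}": "genus" for i in range(100)}
queries = [f"t{i}" for i in range(1000)]


def count_try(rank):
    # Relies on KeyError for missing keys; costly when misses are common.
    n = 0
    for q in queries:
        try:
            if rankdic[q] == rank:
                n += 1
        except KeyError:
            pass
    return n


def count_get(rank):
    # dict.get returns None for missing keys, so no exception is raised.
    n = 0
    for q in queries:
        if rankdic.get(q) == rank:
            n += 1
    return n


# Hypothetical lineage list and the equivalent name -> position dict.
lineage = [f"n{i}" for i in range(500)]
position = {name: i for i, name in enumerate(lineage)}

# Timings vary by machine and Python version, so none are quoted here.
for label, stmt in [
    ("try/except", lambda: count_try("genus")),
    ("dict.get", lambda: count_get("genus")),
    ("list.index", lambda: lineage.index("n499")),
    ("dict lookup", lambda: position["n499"]),
]:
    print(label, timeit.timeit(stmt, number=200))
```

Two caveats: `rankdic.get(this) == rank` only matches the try/except version if `rank` is never `None`, and when the exception is rare the try/except form is usually fine, so it is worth checking how often the except branch actually runs before refactoring.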

wasade, Jun 21 '20 18:06

@wasade Thank you for this valuable advice! I will look into each suggestion and see what I can do to improve the performance.

qiyunzhu, Jun 21 '20 21:06