kevlar
kevlar copied to clipboard
Profiling results
At the moment kevlar is painfully slow. We've been philosophizing for a while now whether this was more likely due to the poor cache locality of khmer's Count-Min Sketch implementation or to slow Fastq parsing/handling, or some other factor or combination of factors. It's time to evaluate this empirically.
I profiled both kevlar count
and kevlar novel
on the new simulated data set I created in #83. It's small enough for kevlar to process in several minutes, but large enough to observe meaningful numbers in the profile. For kevlar novel
, I computed k-mer counts from scratch, not using precomputed counts.
Here are the top 25 functions, sorted by time spent, for each.
# kevlar count
ncalls tottime percall cumtime percall filename:lineno(function)
51921279 61.111 0.000 61.111 0.000 {method 'get' of '_khmer.KHashtable ' objects}
36053659 46.578 0.000 46.578 0.000 {method 'add' of '_khmer.KHashtable ' objects}
3 38.588 12.863 187.651 62.550 counting.py:22(load_sample_seqfile)
887492 29.514 0.000 29.514 0.000 {method 'get_kmers' of '_khmer.KHashtable ' objects}
887495 5.401 0.000 5.401 0.000 __init__.py:97(multi_file_iter_khmer) 887492 2.150 0.000 2.150 0.000 {method 'split' of '_sre.SRE_Pattern' objects}
1774984 1.703 0.000 6.031 0.000 __init__.py:103(clean_subseqs)
887819 1.123 0.000 1.157 0.000 re.py:278(_compile)
887492 0.905 0.000 4.177 0.000 re.py:195(split)
3 0.323 0.108 0.323 0.108 {method 'save' of '_khmer.KHashtable ' objects}
82/54 0.154 0.002 0.209 0.004 {built-in method _imp.create_dynamic}
896021/895669 0.153 0.000 0.153 0.000 {built-in method builtins.len}
257 0.101 0.000 0.101 0.000 {built-in method __new__ of type object at 0x10ffa3080}
522 0.050 0.000 0.050 0.000 {built-in method marshal.loads}
3 0.027 0.009 0.027 0.009 {method 'read' of '_io.BufferedReader' objects}
803/1 0.021 0.000 188.297 188.297 {built-in method builtins.exec}
2761 0.019 0.000 0.019 0.000 {built-in method posix.stat}
613/609 0.016 0.000 0.028 0.000 {built-in method builtins.__build_class__}
243 0.015 0.000 0.015 0.000 {built-in method builtins.compile}
1190 0.015 0.000 0.064 0.000 <frozen importlib._bootstrap_external>:1215(find_spec)
522 0.012 0.000 0.018 0.000 <frozen importlib._bootstrap_external>:816(get_data)
2 0.008 0.004 0.021 0.010 machar.py:116(_do_init)
6226 0.007 0.000 0.019 0.000 <frozen importlib._bootstrap_external>:50(_path_join)
6226 0.007 0.000 0.010 0.000 <frozen importlib._bootstrap_external>:52(<listcomp>)
704 0.007 0.000 0.087 0.000 <frozen importlib._bootstrap>:879(_find_spec)
# kevlar novel
ncalls tottime percall cumtime percall filename:lineno(function)
67446100 80.216 0.000 80.216 0.000 {method 'add' of '_khmer.KHashtable ' objects}
1183526 36.393 0.000 36.393 0.000 {method 'get_kmers' of '_khmer.KHashtable ' objects}
29633082 34.668 0.000 34.668 0.000 {method 'get' of '_khmer.KHashtable ' objects}
3 25.096 8.365 142.916 47.639 ./kevlar/counting.py:22(load_sample_seqfile)
22496672 18.973 0.000 54.691 0.000 ./kevlar/novel.py:23(kmer_is_interesting)
1 14.966 14.966 234.252 234.252 ./kevlar/novel.py:42(main)
887495 5.090 0.000 5.090 0.000 ./kevlar/__init__.py:97(multi_file_iter_khmer)
296035 3.894 0.000 10.359 0.000 /Users/standage/Projects/kevlar/env/lib/python3.5/site-packages/screed/fastq.py:14(fastq_iter)
887492 1.877 0.000 1.877 0.000 {method 'split' of '_sre.SRE_Pattern' objects}
1774984 1.387 0.000 4.978 0.000 ./kevlar/__init__.py:103(clean_subseqs)
1184137 1.336 0.000 1.980 0.000 {method 'readline' of '_io.BufferedReader' objects}
7784325 1.175 0.000 1.175 0.000 {method 'append' of 'list' objects}
1183855 1.153 0.000 1.187 0.000 /Users/standage/Projects/kevlar/env/lib/python3.5/re.py:278(_compile)
1184137 0.998 0.000 3.905 0.000 /usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/gzip.py:370(readline)
887492 0.753 0.000 3.462 0.000 /Users/standage/Projects/kevlar/env/lib/python3.5/re.py:195(split)
1184138 0.635 0.000 0.927 0.000 /usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/_compression.py:12(_check_not_closed)
296037 0.544 0.000 0.544 0.000 {method 'search' of '_sre.SRE_Pattern' objects}
1184137 0.544 0.000 0.968 0.000 /Users/standage/Projects/kevlar/env/lib/python3.5/site-packages/screed/utils.py:4(to_str)
296034 0.506 0.000 0.625 0.000 /Users/standage/Projects/kevlar/env/lib/python3.5/site-packages/screed/screedRecord.py:23(__init__)
1184179 0.424 0.000 0.424 0.000 {method 'decode' of 'bytes' objects}
1186054 0.366 0.000 0.366 0.000 {method 'startswith' of 'str' objects}
2429318/2428966 0.359 0.000 0.359 0.000 {built-in method builtins.len}
8852 0.317 0.000 0.317 0.000 {method 'decompress' of 'zlib.Decompress' objects}
1184140 0.292 0.000 0.292 0.000 /usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/gzip.py:296(closed)
296037 0.289 0.000 1.154 0.000 /Users/standage/Projects/kevlar/env/lib/python3.5/re.py:170(search)
In both cases, the add
and get
functions for incrementing and querying k-mer counts from the Count-Min sketch dominate the runtime, with the get_kmers
and re.split
functions as heavy hitters as well. The latter two have to do with the fact that the khmer's bulk loading functions don't support the kind of preprocessing we need for kevlar, so I'm doing it in Python. This incurs overhead in sending the data from C++ to Python objects, and then doing the processing in Python.
As far as priorities, I'm not sure there's much we can do about the Count-Min sketch implementation. We could try to implement buffering (collect N≈1e4 add operations before actually incrementing the tables) but honestly once the CQF-based counter is integrated we may already get much better performance from that. For the sequence loading, I'll continue to lobby for better support for multiple pre-processing strategies with khmer bulk Fastq loading code.