Measuring performance with and without insertion

Open standage opened this issue 8 years ago • 2 comments

Hi all, I'm trying to evaluate the performance of countgraph/counttable with and without actually inserting into the storage. To this end, I just opened branch bench/hash-no-insert, where I've modified the code so that when add is called on a k-mer, the k-mer is hashed but not inserted. I'd like to get a feel for how much of the performance is driven by cache effects versus other factors.

I've posted a small benchmark script here, but the results for this new branch are indistinguishable from those for the master branch. I suspect I'm missing something obvious. Any first-blush ideas?
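The linked script isn't reproduced in this thread, so here is a minimal sketch of a timing harness of that general kind. Everything in it is an assumption for illustration: `time_loads`, `DummyTable`, and `load_kmers` are hypothetical names, and the dummy table only stands in for a khmer table so the sketch runs without khmer installed.

```python
import statistics
import time


def time_loads(make_table, load, iterations=3):
    """Time load(table) on a fresh table per iteration; return (min, max, mean)."""
    elapsed = []
    for i in range(iterations):
        print('...iteration', i + 1)
        table = make_table()           # fresh table so runs don't share state
        start = time.perf_counter()
        load(table)
        elapsed.append(time.perf_counter() - start)
    return min(elapsed), max(elapsed), statistics.mean(elapsed)


class DummyTable:
    """Hypothetical stand-in for a khmer table (not khmer's API)."""

    def __init__(self):
        self.counts = {}

    def add(self, kmer):
        self.counts[kmer] = self.counts.get(kmer, 0) + 1


def load_kmers(table, seq='ACGTACGTTGCA' * 1000, k=5):
    """Feed every k-mer of a synthetic sequence to table.add."""
    for i in range(len(seq) - k + 1):
        table.add(seq[i:i + k])


if __name__ == '__main__':
    print(DummyTable, *time_loads(DummyTable, load_kmers))
```

To compare branches, one would swap in a factory that builds a khmer Counttable or Countgraph and a loader that reads the FASTQ file, then run the same harness on each branch.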

standage, Jul 31 '17 19:07

Ok, I made some changes to storage.hh and it now behaves as expected: quicker runtime, all queries return 0.

$ # Branch "master"
$ python eval.py full.fq
...iteration 1
...iteration 2
...iteration 3
<class 'khmer.Counttable'> 21.292420864105225 22.92366623878479 22.203290780385334
<class 'khmer.Countgraph'> 9.10276198387146 11.320425033569336 9.87782096862793
$
$ # Branch "bench/hash-no-insert"
$ python eval.py full.fq
...iteration 1
...iteration 2
...iteration 3
<class 'khmer.Counttable'> 12.966645956039429 13.423659086227417 13.239591677983602
<class 'khmer.Countgraph'> 2.539900302886963 2.7193331718444824 2.630006790161133
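As a rough reading of these numbers, the time spent on insertion can be estimated by subtracting the no-insert runtime from the master runtime. The sketch below assumes the third number in each row is the mean over the three iterations, which the output doesn't state explicitly; `insertion_cost` is a hypothetical helper.

```python
# Third number from each row of the output above, in seconds.
# Assumption: this column is the mean over the three iterations.
mean_seconds = {
    'Counttable': {'master': 22.203290780385334, 'no_insert': 13.239591677983602},
    'Countgraph': {'master': 9.87782096862793, 'no_insert': 2.630006790161133},
}


def insertion_cost(times):
    """Runtime attributable to insertion: master minus hash-only."""
    return times['master'] - times['no_insert']


for name, times in mean_seconds.items():
    cost = insertion_cost(times)
    share = cost / times['master']
    print(f'{name}: {cost:.2f}s on insertion ({share:.0%} of master runtime)')

# Gap between the two classes when nothing is inserted.
no_insert_gap = (mean_seconds['Counttable']['no_insert']
                 - mean_seconds['Countgraph']['no_insert'])
print(f'no-insert gap: {no_insert_gap:.2f}s')
```

Under that assumption, insertion accounts for roughly 9.0s of the Counttable run and 7.2s of the Countgraph run, while about 10.6s of the difference between the two classes remains even when nothing is inserted.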

standage, Aug 01 '17 18:08

Nice.

Can we conclude from this that parsing the input data takes <2.5s? I think so.

The difference between table and graph is down to just the different hash function, right?

betatim, Aug 09 '17 05:08