hh-suite icon indicating copy to clipboard operation
hh-suite copied to clipboard

Use std::sort instead of QSortInt

Open fuji8 opened this issue 2 years ago • 4 comments

I profiled the T1050 with the parameters run by AlphaFold. (Only -cpu is changed). I used Score-P as the profiling tool and got the following results.

pr

From this image, we can see that QSortInt in mergeHitsToQuery is taking a long time. With fast enough storage, my hhblits for this condition is about 2100sec, and QSortInt accounts for about 40%.

Instead of this QsortInt, use std::sort.

I ran hhblits installed by conda and using std::sort under the same conditions as before. In order to avoid I/O effects, I analyze the difference in execution time between the logs that contain this change, instead of the overall execution time. (From https://github.com/soedinglab/hh-suite/blob/ac765987bd0daceb093a41a8e2850887dad5835f/src/hhblits.cpp#L1028-L1030 to https://github.com/soedinglab/hh-suite/blob/b100bb05fe6b920a7cbe58256eef72954d9088bf/src/hhhmm.cpp#L2337)

  conda use std::sort
iteration 1 1232(sec) 477(sec)
iteration 2 631(sec) 374(sec)
iteration 3 306(sec) 232(sec)

This reduced the execution time. I also ran it using the parallelization policy, but the results were not significantly different from std::sort.

This change is due to the different stability of sort, so the execution results may not truly match.

fuji8 avatar Jan 21 '22 01:01 fuji8

Cool, thank you!

We have implemented a similar fix in MMseqs2's version of the same code, but haven’t backported it: https://github.com/soedinglab/MMseqs2/blob/d89fcecf9911a99c45ed81c1c0e5054743debc64/src/alignment/MsaFilter.cpp#L212

Could you repeat the benchmark with a stable sort?

milot-mirdita avatar Jan 21 '22 02:01 milot-mirdita

Thank you for the reply.

I changed the sort to stable_sort and ran it 3 times on hpc in the following environment.

  • cpu: 28 cores
  • RAM: 235GB
  1 2 3
iteration 1 488(sec) 484(sec) 484(sec)
iteration 2 374(sec) 371(sec) 376(sec)
iteration 3 223(sec) 225(sec) 218(sec)

Because of the large memory, the computational complexity is probably Nlog(N).

fuji8 avatar Jan 21 '22 07:01 fuji8

@fuji8 This looks great! Thank you for the PR. Would it be possible to avoid the lambda expression in the sort?

martin-steinegger avatar Jan 23 '22 18:01 martin-steinegger

I apologize for the delay in response.

I rewrote the code to be almost equivalent without using the lambda expression. I ran it only once, just to be sure.

  no lambda
iteration 1 500(sec)
iteration 2 385(sec)
iteration 3 232(sec)

fuji8 avatar Jan 31 '22 22:01 fuji8